# RTVI-CV Runbook Deploy, operate, debug, and tear down the **Real Time Video Intelligence CV (RTVI-CV)** microservice. The agent collects credentials once, detects the platform, downloads NGC resources, configures the pipeline, launches the container, applies all required config edits, and starts the app. > **Service**: `rtvi-cv` (`metropolis_perception_app`) > **Image**: `nvcr.io//:` — user-supplied at deploy time > **REST port**: `9000` (stream add/remove, health, metrics) > **Hardware**: x86/aarch64 dGPU (T4, A100, L40, H100, B200, RTX), SBSA > (Spark, Grace-Hopper), Jetson (Thor, Orin, Xavier) --- ## Use Cases | Use case | Model | Inference engine | Notes | |---------------------|------------------------------------|---------------------------|----------------------------------------------------| | `warehouse-2d` | RT-DETR + NvDCF tracker | `nvinfer` (PGIE) | 7 classes, DS auto-builds engine, skill caches it. | | `warehouse-3d` | Sparse4D (multi-camera BEV) | `videotemplate` plugin | 6 classes, requires `LD_PRELOAD` + setup script. | | `smartcity-rtdetr` | TrafficCamNet RT-DETR | `nvinfer` | 5 classes, ITS use case. | | `smartcity-gdino` | Grounding DINO (open-vocab) | `nvinferserver` (Triton) | Prompt-based detection; engine cached as `.plan`. | --- ## Prerequisites - Docker Engine ≥ 20.10 with `docker` CLI (`docker --version` to verify). - NVIDIA Container Toolkit installed (`nvidia-smi` works inside containers). - ≥ 30 GB free disk for images + models + videos (warehouse-3d / GDINO may need 50+ GB). - Free port `9000` (REST API); `9092` only if Kafka sink is enabled. - Outbound network to `nvcr.io`, `ngc.nvidia.com`, `api.ngc.nvidia.com`. - NGC account + API key from (only needed if downloading NGC models/videos; not for local-only assets). - For `eglsink` display output: X11 server with `$DISPLAY` set on the host. For full secret, mount, env-var, and GPU-selection detail see `environment.md`. --- ## Quick Start ```text deploy rtvicv warehouse 2d with 4 streams and display the output use this docker image use this ngc resource use this model with these videos ``` Anything you omit, the skill asks for via `AskQuestion` or auto-detects from the host (platform, existing containers, cached resources). At minimum you can say `deploy rtvi-cv warehouse 2d` and answer prompts one by one. ### Placeholders | Placeholder | Description | |-------------------------------|---------------------------------------------------------------------------------------------------------| | `` | Full RTVI-CV docker image, e.g. `nvcr.io//:` (use the `-sbsa-` tag variant for SBSA). | | `` | Warehouse NGC resource (`//:`) — used by warehouse-2d and warehouse-3d. | | `` | Smart-city videos NGC resource — shared by smartcity-rtdetr and smartcity-gdino. | | `` | TrafficCamNet RT-DETR NGC model reference. | | `` | Grounding DINO NGC model reference. | | `` | Warehouse 2D RT-DETR ONNX filename (resolved inside the NGC app-data resource). | | `` | Named video set inside the warehouse NGC resource (e.g. a 4-camera test set). | | `` / `` | Host paths to override the NGC defaults with local files / directories. | | `` | Batch size / max stream count. | ### What you can specify inline Anything *not* pinned by the user query is asked via the Step 2 `AskQuestion` (no silent defaults — see the "What gets asked" section below). | Param | Phrases that pin a value (skip the question) | |------------------|---------------------------------------------------------------------------------------------------------| | Use case | `warehouse 2d`, `warehouse-3d`, `smartcity rtdetr`, `smartcity gdino`, `sparse4d` | | Batch / streams | `with 4 streams`, `batch 8`, `4 cameras` | | Stream mode | `dynamic` / `via rest` / `add via api` / `live add` → dynamic. Otherwise unpinned → ASK (default `static`). | | Input source | `from rtsp ` → RTSP. Otherwise unpinned → ASK. | | Output sink | `save the output in a file` → filedump; `display`/`on screen` → eglsink; `benchmark` → fakesink. Otherwise ASK. | | Docker image | `use this docker image ` | | NGC resource | `use this ngc resource ` | | Model override | `use this model `, `use this gdino `, `use this rtdetr model ` | | Video override | `with these videos ` | --- ## Mode Selection — DEPLOY vs TEARDOWN | Mode | Trigger phrases | Goes to | |--------------|-----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------| | **DEPLOY** | `deploy`, `run`, `launch`, `start`, `set up`, `bring up` | Step 0 → Steps 1–6 below. | | **TEARDOWN** | `stop`, `tear down`, `shutdown`, `kill`, `cleanup`, `remove container` | `teardown-flow.md`. | If the user's intent is clearly deploy or teardown, do not ask — proceed directly to the matching flow. Only ask via `AskQuestion` when ambiguous. --- ## Deployment Workflow (high-level map) The skill renders progress as a 10-task list (see `task-list.md`) and walks through six logical steps. Each step links to the file with the detailed bash and decision logic. | Step | What it does | Reference | |------|-----------------------------------------------------------|-------------------------------------------------------------------------| | 0 | Build the full task list upfront via `TodoWrite`. | `task-list.md` | | 1 | Confirm use case, platform, image, NGC creds, resources. | `usecases.md`, `platforms.md`, `ngc-setup.md`, `resource-plan.md` | | 2 | Pipeline configuration: batch size, streams, sink. | `pipeline-config.md` | | 3 | Launch (or reuse / restart / parallel) the container. | `container-reuse.md` | | 4 | Apply config inside the container (path sub, batch, sink, sources, engine cache lookup). | `apply-config.md` | | 5 | Start the perception app + capture metrics + write log. | `start-app.md` | | 6 | Next steps: stream lifecycle, metrics, REST examples. | `next-steps.md` | **Teardown** is a separate 5-step flow (discover → select → method → cleanup scope → execute) — see `teardown-flow.md`. NGC credentials are always preserved. ### Startup contract — first response after invocation Before *any* tool call, the agent's first reply is a **single terse line** acknowledging the use case + values pinned by the query, immediately followed by the `TodoWrite` / `TaskCreate` plan call(s). No bash, no file reads beyond the skill manifest, no narration of upcoming steps, **no narration of internal tool loading** (see Forbidden list below). ``` ✔ warehouse-2d, batch=4, sink=eglsink (from query) [TodoWrite fires here — renders the 5-task widget, OR 5 successive TaskCreate calls if TaskCreate is the available tool] ``` **Forbidden in the first reply:** - "Let me start by loading the task list, detecting the platform, and resolving YAML defaults." — narrating future work re-states what the plan tool is about to render. - **"I need to load TodoWrite (a deferred tool…)"**, "Loading TaskCreate…", "Calling ToolSearch for the planning tool…", or any other reference to tool resolution, deferred tools, or `ToolSearch`. The agent loads tools silently — the user only ever sees the `✔` summary line followed by the widget. If `TodoWrite` / `TaskCreate` happens to be a deferred tool in the current runtime, the `ToolSearch` call to load its schema is **silent** — no chat text accompanies it. - `bash load_defaults.sh` / `uname -m` / `nvidia-smi` / any other tool **before** the planning tool call. Platform detect and YAML defaults are part of task 1/5 and run AFTER the todo list is on screen. (Internal labels for these substeps — `1.b` / `1.c` — are model-facing only; never print them to the user.) - Multi-paragraph preambles. One sentence + the widget. **Required ordering at startup (no exceptions):** ``` 1. one-line ✔ summary of what's pinned from the query 2. Plan tool call — either: • TodoWrite (merge:false) — full 5-task list, task 1/5 = in_progress • OR 5 successive TaskCreate calls (one per task) when TaskCreate is the available planning tool (see task-list.md § "Initial TaskCreate calls") 3. Pre-completion of inferred tasks — either: • TodoWrite (merge:true) for any task fully answered by the query • OR TaskUpdate (status:"completed") for each pre-completed task 4. Task 1/5 begins — first bash runs here (load_defaults.sh, platform detect) ``` If the use case isn't in the query, step 1 above becomes the *only* place the agent asks for it, before the plan tool call. Everything else stays the same. ### Step ordering invariants — DO NOT skip ahead The skill MUST run steps in the order below. The most common temptation is to peek at the engine cache (or container state) before the user has finished picking targets — this is forbidden because the cache key depends on the chosen model. ```dot digraph step_order { rankdir=LR; "platform + YAML defaults" [shape=box]; "Step 1 AskQuestion\n(image, NGC resource, model, videos)" [shape=box]; "Step 2 AskQuestion\n(batch, stream_mode, input, sink)" [shape=box]; "Step 3 launch / reuse" [shape=box]; "Step 4 apply-config\n(includes engine-cache lookup)" [shape=box, color=red]; "Step 5 start-app" [shape=box]; "platform + YAML defaults" -> "Step 1 AskQuestion\n(image, NGC resource, model, videos)"; "Step 1 AskQuestion\n(image, NGC resource, model, videos)" -> "Step 2 AskQuestion\n(batch, stream_mode, input, sink)"; "Step 2 AskQuestion\n(batch, stream_mode, input, sink)" -> "Step 3 launch / reuse"; "Step 3 launch / reuse" -> "Step 4 apply-config\n(includes engine-cache lookup)"; "Step 4 apply-config\n(includes engine-cache lookup)" -> "Step 5 start-app"; } ``` **Hard rules:** 1. **Step 1 `AskQuestion` fires BEFORE Step 2 `AskQuestion`.** Never merge them. Even when Step 1 has YAML defaults for every parameter, the user still picks each one in Step 1's question (defaults appear as the "Recommended" option). Don't skip ahead to Step 2 just because Step 1 "could" be auto-answered. 2. **No engine-cache lookup before Step 4.** The cache filename is `_b.{engine,plan}`. Until Step 1 returns the model choice (NGC default or custom override) AND Step 2 returns the batch size, the cache key is unknown — any earlier lookup is wasted work that a custom ONNX or a different batch invalidates. 3. **No reuse-vs-restart decision before Step 3.** Container reuse is only safe once Step 1 has confirmed the image and Step 2 has confirmed the pipeline parameters that decide whether the existing mounts are compatible. 4. **No bash discovery beyond Step 1.b.** Steps 1.b (platform detect via `uname -m` + `nvidia-smi`) and 1.c (load YAML defaults via `load_defaults.sh`) are the only allowed pre-AskQuestion bash. Everything else — `docker image inspect`, `docker ps`, listing engine cache, listing resources for *cosmetic* summaries — must wait until the user has answered Step 1's `AskQuestion`. `docker ps` for the container-reuse decision belongs in Step 3, not Step 1. If the agent finds itself thinking "let me just check X to make the next question prettier", that's the smell — stop, ask first, look later. If the agent finds itself thinking "the YAML defaults are good enough, let me skip Step 1's question", that's the same smell. Ask anyway. 5. **Step 6 `AskQuestion` is the final step of every successful deploy.** Right after the "Perception Application — Results" box, fire the Step 6 menu from `next-steps.md` § "11.c". Do NOT replace it with a free-form "Next steps:" bullet list. The menu is the user's handle on the running deploy. --- ## Minimal-Interaction Contract The skill asks the user **once** for everything it cannot figure out on its own, then runs to completion with progress prints but no further prompts. ### One-shot intake Before any work begins, the skill builds a single consolidated `AskQuestion` block covering: - Docker image ref (if not in the query). - Resource source type + refs / local paths (if not in the query). - NGC API key + org (only if `NEEDS_NGC=1` AND `~/.ngc/config` is missing). - Pipeline settings: batch size, stream mode, input type, output sink. - `stream_add_delay` (defaults to 5 s; use 10–20 s for Jetson). - Optional filename / dir hints (otherwise the skill auto-discovers). If a later step needs a value that wasn't collected upfront, the skill applies the default from the table below and keeps going. Only hard errors re-prompt the user. ### What gets asked vs. applied silently **The skill drives TWO `AskQuestion` rounds in fixed order: Step 1 BEFORE Step 2. YAML defaults from `deploy-defaults.yml` appear inside each question as the "(Recommended)" option — they are NEVER applied silently.** **Step 1 (deploy targets) MUST drive an `AskQuestion` with exactly three parameters when the user query did not pin them — even if YAML defaults exist for the use case:** | Step 1 parameter | Pin via query phrase | If unpinned | |------------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------| | `docker_image` | `use this docker image ` | **ASK** — present the YAML-default tag for the detected platform as "(Recommended)". | | `model` | `use this model ` / `use this rtdetr ...` / `use this gdino ...` | **ASK** — present the YAML-default ONNX with its NGC ref + in-resource path inline; offer "Custom local ONNX" alternative. | | `videos` | `with these videos ` | **ASK** — present the YAML-default video set with its NGC ref + in-resource path inline; offer "Custom local directory" alt. | **No separate "NGC resource" question.** The NGC ref lives inside the Model and Videos options. If the user picks "Custom local …" for an asset, no NGC ref is needed for that asset. **[`deploy-defaults.yml`](../assets/deploy-defaults.yml) is the SINGLE SOURCE OF TRUTH for every default value the skill suggests** — docker image tag, NGC resource ref, ONNX basename, in-resource path, video set name, in-image config paths. The agent reads these via `scripts/load_defaults.sh ` (emits `DEFAULT_IMAGE`, `DEFAULT_GPU_ID`, `DEFAULT_MODEL_NGC_REF`, `DEFAULT_MODEL_PATH`, `DEFAULT_VIDEOS_NGC_REF`, `DEFAULT_VIDEOS_PATH`, …) and substitutes the resolved values into the question options. **Never hardcode a tag, NGC ref, ONNX name, path, or GPU index inside SKILL.md, references, or scripts** — editing the YAML must be sufficient to change the suggested defaults. **Strictness contract for the "Recommended (default)" option:** - When the user picks the Recommended option for `docker_image`, the value used downstream MUST equal `$DEFAULT_IMAGE` from `load_defaults.sh`. No hardcoded fallback in any script may substitute. - When the user picks the Recommended option for `model`, the path used by `fetch_resources.sh`, `apply_config.sh`, `setup_gdino.sh`, `setup_sparse4d.sh`, and `prelaunch_nvinfer_engine.sh` MUST equal `$DEFAULT_MODEL_PATH`. The agent passes it explicitly via `--onnx "$DEFAULT_MODEL_PATH"` whenever those scripts accept the flag. - When the user picks the Recommended option for `videos`, the directory passed downstream MUST equal `$DEFAULT_VIDEOS_PATH` (host side) or its in-container equivalent. The agent passes it explicitly via `--videos "$DEFAULT_VIDEOS_PATH_CONTAINER"` whenever a script accepts the flag. - `main_config`, `pgie_config`, `sparse4d_config` are **never** user- overridable — they're paths baked into the container image and read straight from `usecases..<*>_config` in the YAML. - The `docker run` command MUST pass `--gpus '"device=$DEFAULT_GPU_ID"'` (resolved from `runtime.gpu_id` in the YAML, default `0`). If the user query inline-pins a different device — e.g. "run on gpu 1", "use gpu 2" — the agent passes that value instead, but does NOT mutate the YAML. `--gpus all` is only used when the user explicitly asks for it. **User-supplied custom values:** - If the user picks "Custom local ONNX", "Custom local directory", or "Use a different docker image", the agent uses the user-supplied value in place of the YAML default for that asset. The other two assets keep their YAML defaults unless individually overridden. - If the user query already pinned a value (e.g. `use this docker image `), the agent skips that question and uses the pinned value as if the user had picked "Custom". Each Model / Videos option's sub-line shows the resolved ` / ` so the user reads the source of truth without a separate question. Example shape (concrete values are filled at runtime from the YAML): ``` Which model ONNX should I use? ❯ 1. (Recommended) From NGC: Path: 2. Use a custom local ONNX Provide a host path to a different ONNX file ``` **Step 3 (container) MUST drive an `AskQuestion` whenever an existing container is detected on the same image — never silently auto-decide:** | Detection result | AskQuestion fires? | |-----------------------------------------------|---------------------------------------------------------------------------------------------------| | No existing container on this image | No — proceed straight to `docker run` (only one path). | | Existing container, all required mounts match | **YES** — present `Reuse / Restart / New parallel container` (full JSON in `container-reuse.md`). | | Existing container, mounts incompatible | **YES** — present `Restart with correct mounts / New parallel container` (reuse hidden). | The user always picks. The skill never auto-picks "reuse" just because mounts match — that information is presented in the question's description text and lets the user decide. **Step 2 (pipeline configuration) MUST drive an `AskQuestion` for these four parameters when the user query did not pin them, AFTER Step 1 has finished:** | Step 2 parameter | Pin via query phrase | If unpinned | |------------------|-------------------------------------------------------------------------------------|--------------------------------------------------------------------------| | `batch_size` | `with N streams`, `batch N`, `N cameras` | **ASK** — see `pipeline-config.md` (4-question `AskQuestion` block). | | `stream_mode` | `dynamic` / `via rest` / `add via api` / `live add` → dynamic | **ASK** — `static` appears as the "(Recommended)" option (default). | | `input_type` | `from rtsp ` → rtsp | **ASK** — `filesrc` appears as the "(Recommended)" option. | | `output_sink` | `display`/`on screen` → eglsink; `save to file` → filedump; `benchmark` → fakesink | **ASK** — `fakesink` appears as the "(Recommended)" option. | **Strict ordering rules:** 1. **Step 1 questions fire before Step 2 questions.** Never merge them into a single `AskQuestion` block, and never ask Step 2 first because Step 1 "has YAML defaults available." 2. Only mark a step `completed` AFTER the user's values are confirmed (either pinned by the query or returned from `AskQuestion`). **Never** mark a step done from a partially-specified query. 3. The presence of a YAML default does NOT permit skipping the `AskQuestion`. The default is the "Recommended" option *inside* the question — the user still chooses. **Truly silent defaults (applied without asking, but announced before use):** | Setting | Default | Why | |-----------------------|----------------------------------------------------------|--------------------------------------------------| | `platform` | auto-detected via `uname -m` + `nvidia-smi` | Deterministic — confirmation is pointless. | | `stream_add_delay` | `20` s (dynamic mode only) | Stable on all platforms; user rarely overrides. | | Docker launch confirm | skipped — print the command inline before it runs | User already approved by invoking the skill. | | Engine cache miss | announce loudly + heartbeat every 30 s during build | User must know they're waiting on a build. | | Engine cache hit | confirm via grep on `deserialize cuda engine` | Closes the loop after DS actually loads it. | | Single-candidate scan | auto-use; print `✔ : ` | One option is not a question. | ### When the skill MUST prompt anyway 1. Multi-candidate scan with no matching hint — picker is unavoidable. 2. Hard error (zero candidates, auth failure, arch mismatch). 3. Destructive action about to run (cleanup wipe, force rebuild ≥ 10 min). 4. Credentials missing on first run — API key must come from the user. Everything else runs without asking. See `ux-conventions.md` for the full visibility / `AskQuestion` contract. ### User-facing announcements — never include substep notation The user only knows the 5 todos in the widget (`1/5. Prepare deploy`, `2/5. Finalize pipeline`, `3/5. Launch container`, `4/5. Apply configuration`, `5/5. Start app`). The internal substep labels — `1.b`, `1.c`, `1.g`, `4.a`, `4.f`, `5.b.2`, `T3` etc. — exist only in SKILL.md / references for the agent's own bookkeeping. **They MUST NOT appear in any user-facing line** (`→`, `✔`, `?`, `⚠`, `✖`, headings, or box titles). **Required form:** describe the action directly, optionally prefixed with the user-visible top-level step name only. | ❌ Forbidden (model-facing labels leaking into UI) | ✅ Required (action-first, no internal labels) | |----------------------------------------------------|------------------------------------------------| | `→ Step 1.b/1.c: detect platform + load YAML defaults` | `→ Detect platform + load YAML defaults` (preferred) or `→ Step 1: detect platform + load YAML defaults` | | `✔ 4.a: assets resolved` | `✔ Assets resolved (model + videos)` | | `→ Step 4.f: Engine cache lookup (parallel)` | `→ Engine cache lookup (parallel)` | | `→ 5.b.2 — launch app + poll ready` | `→ Launch app + poll readiness` | | `✔ T3: stop method chosen` | `✔ Stop method: graceful (docker stop)` | | Box title `Step 1.b — Detect platform` | Box title `Detect platform` | **Exception:** `TodoWrite` task labels keep their `N/5.` prefix (`1/5. Prepare deploy …`) — that's the user-visible numbering scheme, not an internal substep. Per-step exit boxes and `→`/`✔` lines drop the prefix entirely (the box title is plain action language). ### Bash batching rule **Every group of sequential bash commands that requires no user decision between them MUST be combined into a single bash tool call.** Each separate call is a permission round-trip; batch with newlines, `&&`, or `docker exec ... bash -c "cmd1 && cmd2 && cmd3"`. Only split when a real conditional branch needs the output of the first call. **Per-step bash budget — canonical "one call per step" map:** A complete DEPLOY runs in **~7 bash calls total**. Each row below is one call. If the agent finds itself making more than these, it's violating the batching rule. **Boxes are NEVER bash calls.** Container, Apply configuration, Perception Application — Plan / Results, Metrics & FPS, and Liveness boxes are all rendered as literal text in the assistant's reply, not via `python3` / `cat <)"` then echo platform + defaults | Combine `uname -m`, `nvidia-smi`, `load_defaults.sh` — one bash call. (Internal substep labels for the parts of this — `1.b` platform / `1.c` YAML — are agent-only, never printed.) | | 2 | 1.g | `bash scripts/fetch_resources.sh ` (with overrides via env vars on the same line) | Script handles NGC creds check, `chmod 600`, download, extract, scan, resolve. **Do not pre-check or pre-chmod `~/.ngc/config`** — `fetch_resources.sh` does it. | | 3 | 3 | `docker ps --filter ... --format ...` + `docker inspect --format ...` + reuse decision logic in one shell block | Step 3.0 detect + 3.1 mount-compatibility check share state; one call. | | 4 | 3 | `docker run ... ` (or `docker start ` for reuse / `docker stop && docker run ...` for restart) | One call to land the container. Skipped entirely on reuse. | | 5 | 4 | `bash $SKILL_DIR/scripts/apply_in_container.sh --container --usecase --batch --sink --stream-mode --onnx ... --videos ...` | **Host-side wrapper** — internally does `docker exec rm -rf /tmp/scripts` + `docker cp scripts :/tmp/` + `docker exec chmod -R +x` + `docker exec apply_config.sh ...`. ONE permission prompt for all of Step 4. | | 6 | 5 | `bash $SKILL_DIR/scripts/start_app_in_container.sh --container --usecase --batch --sink --stream-mode --onnx ... --videos ... [--image ... --ngc ... --platform ... --docker-cmd ...]` | **Host-side wrapper** — internally does refresh scripts + X11 pre-flight (eglsink) + `write_deployment_log.sh` + `run_app_and_wait.sh --log "$LOG"`. ONE permission prompt for all of Step 5. The wrapper passes the metadata args (`--image / --ngc / --platform / --docker-cmd`) to `write_deployment_log.sh` so the log header is fully populated. | | 7 | 5.c | `curl -s http://localhost:9000/api/v1/ready` + `curl -s http://localhost:9000/api/v1/metrics` | Final verification — both REST checks in one call. (RTVI-CV exposes `/live`, `/ready`, `/startup`, `/metrics` — there is **no** `/health` endpoint.) | **Anti-patterns that cost extra calls (DON'T do these):** - ❌ **Any pre-check / pre-flight on `~/.ngc/config` before `fetch_resources.sh`** — including: - `if [ -f ~/.ngc/config ]; then echo "✔ NGC creds present"; fi` - `ls -la ~/.ngc/config` - `chmod 0600 ~/.ngc/config` - `grep '^apikey' ~/.ngc/config` `fetch_resources.sh` does ALL of this itself: checks file existence, validates the `apikey` line, enforces `chmod 600`, and exits with a clear `✖ NGC credentials missing at ~/.ngc/config — set up first, then re-run.` (rc=5) if anything's wrong. Just run `bash $SKILL_DIR/scripts/fetch_resources.sh ` directly — one bash call covers the creds gate, NGC download, extract, scan, and path resolution. - ❌ `bash