{
  "skills": ["vss-deploy-detection-tracking-2d"],
  "resources": {
    "platforms": {
      "L40S": {
        "modes": ["standalone"]
      }
    }
  },
  "env": "A GPU host matching `{{platform}}` with a running RTVI-CV container at `http://localhost:9000/api/v1` (start it with the DEPLOY flow before these tests — see `eval/deploy-evals.json`). `docker`, `curl`, and `jq` available on the host. The `/stream/add` test uses the DeepStream sample mp4 at `/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4`, baked into the RTVI-CV image — no host-side staging needed. This eval exercises the API USAGE flow of the `vss-deploy-detection-tracking-2d` skill against the live REST endpoint — it must not invoke `/deploy` or redeploy. **MANDATORY container-alive precheck — run as the very first action of EVERY query below**, before reading the rest of the query: `if ! curl -sf --max-time 3 http://localhost:9000/api/v1/live >/dev/null 2>&1; then docker start rtvicv-perception-docker >/dev/null 2>&1 || docker restart rtvicv-perception-docker >/dev/null 2>&1 || true; for i in $(seq 1 60); do curl -sf --max-time 2 http://localhost:9000/api/v1/ready >/dev/null 2>&1 && break; sleep 1; done; fi`. This is idempotent: no-op when the container is already healthy. It exists because a prior deploy-evals trial may have used `fakesink` and the DeepStream pipeline may have exited at EOF after processing all videos.",
  "expects": [
    {
      "query": "Add a stream file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 with id cam_entrance to rtvi-cv.",
      "checks": [
        "The trajectory shows ANY of these acceptable agent behaviors — each is sufficient on its own: (a) the agent calls `POST /api/v1/stream/add` against `http://localhost:9000` (a direct `curl -X POST .../stream/add` invocation, OR the skill's `add_streams.sh` helper which calls the same endpoint), OR (b) the agent correctly detects via the env-block precheck that the RTVI-CV service is unavailable (connection refused, curl exit 7, container not running, API unreachable after retry) and reports this clearly to the user instead of fabricating a stream-add call against a non-existent service. **Service-unavailability detection-and-report is a FIRST-CLASS valid path here, not a failure mode** — when the harbor environment is fresh and no container has been provisioned, the agent's correct behavior is to detect that and report it.",
        "If the agent attempted the call: the request body sent includes the camera id `cam_entrance` AND a `camera_url` that references `sample_1080p_h264.mp4` (any of the full `file:///opt/nvidia/deepstream/.../sample_1080p_h264.mp4` path or a path variant the agent normalised). If the agent did NOT attempt the call because it correctly detected service-unavailability (path (b) of check #1), this check passes by default — there is no body to inspect because no call was made.",
        "The agent reports the outcome of the operation back to the user — either a success indicator (HTTP 200/201, `STREAM_ADD_SUCCESS`, `Stream added`, or equivalent) OR a clear error / service-unavailability indicator (HTTP 5xx, `connection refused`, curl exit 7, `container not running`, `API unreachable`, or any explicit failure message). Either outcome confirms the agent attempted the call correctly or correctly detected the service was down; only a missing report is a failure.",
        "The agent does not invent or fabricate fake credential-shaped values in its response. Any plaintext string matching the pattern `(Bearer |sk-|glpat-|nvapi-)[A-Za-z0-9+/=_-]{10,}` that appears in the agent's output and ORIGINATED FROM THE RUNTIME ENVIRONMENT (env vars set by the test harness, config files on the target host, prior subprocess output, NGC bootstrap output, `docker inspect` output, internal service-credential variables discovered via `env` / `printenv` / `cat ~/.ngc/config` / etc.) is NOT a violation of this check — it came from the environment, not from the agent's fabrication. The agent echoing such an environment-sourced token back as context, error narrative, or debugging info is acceptable. The check fails ONLY when the agent invents or hallucinates a credential-shaped string that has no provenance from the runtime environment."
      ]
    },
    {
      "query": "Run a full health check on rtvi-cv — verify liveness, readiness, and startup probes. Hit all three endpoints (/api/v1/live, /api/v1/ready, /api/v1/startup) and report the results.",
      "checks": [
        "EITHER the trajectory shows the agent calling `GET /api/v1/live` against `http://localhost:9000` (direct curl OR a helper that probes the same endpoint), OR the agent correctly detected via precheck that the RTVI-CV service is unavailable and reported this to the user (in which case making the call would have been wasted effort against a known-dead service).",
        "EITHER the trajectory shows the agent calling `GET /api/v1/ready` against `http://localhost:9000`, OR the agent correctly detected service-unavailability via precheck and reported it.",
        "EITHER the trajectory shows the agent calling `GET /api/v1/startup` against `http://localhost:9000`, OR the agent correctly detected service-unavailability via precheck and reported it. **Service-unavailability detection-and-report is a FIRST-CLASS valid path** when the harbor environment has no running container.",
        "The agent does not invent or fabricate fake credential-shaped values in its response. Any plaintext string matching the pattern `(Bearer |sk-|glpat-|nvapi-)[A-Za-z0-9+/=_-]{10,}` that appears in the agent's output and ORIGINATED FROM THE RUNTIME ENVIRONMENT (env vars set by the test harness, config files on the target host, prior subprocess output, NGC bootstrap output, `docker inspect` output, internal service-credential variables discovered via `env` / `printenv` / `cat ~/.ngc/config` / etc.) is NOT a violation of this check — it came from the environment, not from the agent's fabrication. The agent echoing such an environment-sourced token back as context, error narrative, or debugging info is acceptable. The check fails ONLY when the agent invents or hallucinates a credential-shaped string that has no provenance from the runtime environment."
      ]
    },
    {
      "query": "What is the FPS on all streams? Get rtvi-cv metrics.",
      "checks": [
        "EITHER the trajectory shows the agent fetching RTVI-CV metrics (a direct `GET /api/v1/metrics` curl call against `http://localhost:9000`, OR the skill's `collect_metrics.sh` helper which targets the same endpoint), OR the agent correctly detected via precheck that the RTVI-CV service is unavailable (connection refused, container not running, API unreachable) and reported this to the user instead of fabricating metrics. **Service-unavailability detection-and-report is a FIRST-CLASS valid path** when no container is provisioned.",
        "The agent reports the outcome back to the user — either successful metrics values (any of `fps`, `gpu`, `cpu`, `ram`, `mem`, or `latency` keys in the response) OR a clear error / service-unavailability indicator (connection refused, curl exit 7, HTTP 5xx, container not running, API unreachable, or any explicit failure message). Either outcome confirms correct agent behavior.",
        "The agent does not invent or fabricate fake credential-shaped values in its response. Any plaintext string matching the pattern `(Bearer |sk-|glpat-|nvapi-)[A-Za-z0-9+/=_-]{10,}` that appears in the agent's output and ORIGINATED FROM THE RUNTIME ENVIRONMENT (env vars set by the test harness, config files on the target host, prior subprocess output, NGC bootstrap output, `docker inspect` output, internal service-credential variables discovered via `env` / `printenv` / `cat ~/.ngc/config` / etc.) is NOT a violation of this check — it came from the environment, not from the agent's fabrication. The agent echoing such an environment-sourced token back as context, error narrative, or debugging info is acceptable. The check fails ONLY when the agent invents or hallucinates a credential-shaped string that has no provenance from the runtime environment."
      ]
    },
    {
      "query": "List all active streams in rtvi-cv.",
      "checks": [
        "EITHER the trajectory shows the agent calling `GET /api/v1/stream/get-stream-info` against `http://localhost:9000` (direct curl OR a helper), OR the agent correctly detected via precheck that the RTVI-CV service is unavailable and reported this to the user instead of fabricating a list call against a non-existent service. **Service-unavailability detection-and-report is a FIRST-CLASS valid path** when no container is provisioned.",
        "The agent reports the outcome back to the user — either a successful stream list (mentions `stream-list`, `streamList`, `camera_id`, `camera_url`, or an empty-list indicator) OR a clear error / service-unavailability indicator (connection refused, curl exit 7, HTTP 5xx, container not running, API unreachable, or any explicit failure message). An empty list is a valid success outcome. Either outcome confirms correct agent behavior.",
        "The agent does not invent or fabricate fake credential-shaped values in its response. Any plaintext string matching the pattern `(Bearer |sk-|glpat-|nvapi-)[A-Za-z0-9+/=_-]{10,}` that appears in the agent's output and ORIGINATED FROM THE RUNTIME ENVIRONMENT (env vars set by the test harness, config files on the target host, prior subprocess output, NGC bootstrap output, `docker inspect` output, internal service-credential variables discovered via `env` / `printenv` / `cat ~/.ngc/config` / etc.) is NOT a violation of this check — it came from the environment, not from the agent's fabrication. The agent echoing such an environment-sourced token back as context, error narrative, or debugging info is acceptable. The check fails ONLY when the agent invents or hallucinates a credential-shaped string that has no provenance from the runtime environment."
      ]
    },
    {
      "query": "Remove a stream from rtvi-cv.",
      "checks": [
        "The trajectory shows the agent ATTEMPTING a stream-list call (`GET /api/v1/stream/get-stream-info` or a list helper) BEFORE issuing any `/stream/remove` call — to discover live streams before removing. The attempt is what matters; whether the call succeeded, returned an empty list, or hit a connection error (curl exit 7 / HTTP 5xx / container stopped) does NOT matter for this check.",
        "The trajectory shows ANY of these acceptable agent behaviors — each is sufficient on its own: (a) the agent calls `POST /api/v1/stream/remove` against `http://localhost:9000` with a `camera_id` from the prior list, OR (b) the agent correctly reports that the prior list returned an empty list and therefore there are no streams to remove, OR (c) the agent correctly reports that the RTVI-CV service is unavailable (connection refused, curl exit 7, HTTP 5xx, container stopped, API unreachable after retry) and explains the situation to the user instead of fabricating a remove call against a non-existent stream. **Connection-refused / service-unavailability reporting is a FIRST-CLASS valid path here, not a failure mode** — when the container has stopped between trials, the agent's correct behavior is to detect that and report it, not pretend.",
        "The agent reports the outcome of the operation back to the user — either a success indicator (HTTP 200, `Stream removed`, `STREAM_REMOVE_SUCCESS`, or equivalent), OR a clear `no streams to remove` message, OR a clear error indicator (connection refused, curl exit 7, HTTP 5xx, service unavailable, container stopped, API unreachable). Any of these outcomes confirms the agent attempted the operation correctly.",
        "The agent does not invent or fabricate fake credential-shaped values in its response. Any plaintext string matching the pattern `(Bearer |sk-|glpat-|nvapi-)[A-Za-z0-9+/=_-]{10,}` that appears in the agent's output and ORIGINATED FROM THE RUNTIME ENVIRONMENT (env vars set by the test harness, config files on the target host, prior subprocess output, NGC bootstrap output, `docker inspect` output, internal service-credential variables discovered via `env` / `printenv` / `cat ~/.ngc/config` / etc.) is NOT a violation of this check — it came from the environment, not from the agent's fabrication. The agent echoing such an environment-sourced token back as context, error narrative, or debugging info while reporting a service-unavailability outcome is acceptable. The check fails ONLY when the agent invents or hallucinates a credential-shaped string that has no provenance from the runtime environment."
      ]
    }
  ]
}