# Skills Eval Benchmark

Generated: 2026-05-29 19:01:32 UTC
Specs: 1

---

## Skill Eval — `skills/vss-deploy-video-embedding/evals/standalone_deploy.json`

Head: `8e2c958` · 1 platform · standalone RT-Embed (`rtvi-embed`) bring-up — no `/vss-deploy-profile` prerequisite
First started: `2026-05-29T18:46:54Z` · Last finished: `2026-05-29T19:01:32Z` · Total: `14m 37s`

| Platform | Result | Reward | Duration | Turns | Prompt tok | Cached tok | Trace |
|---|---|---|---|---|---|---|---|
| L40S | ✅ 1.0 (9/9) | 1.0 | 14m 37s | 35 | 2.4k | 713.5k | trace |

All 9 checks passed. The agent brought up `rtvi-embed` standalone from the `bp_developer_search_2d` Compose profile on host port 8017, without invoking `/vss-deploy-profile` or `scripts/dev-profile.sh`. It staged a writable `VSS_DATA_DIR` with `data_log/vst/clip_storage` pre-created, disabled Kafka / Redis error messages (no broker or Redis peer was started), and left `start_period: 1200s` unshortened. It then waited out the first-boot Cosmos-Embed1-448p download and Triton model-repo build until `/v1/ready` returned 200 and `/v1/models` reported `cosmos-embed1-448p`. The `vss-rtvi-embed` container was confirmed running, and no credentials were fabricated or echoed.
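The readiness wait described above can be approximated with a small polling loop. This is a minimal sketch, not the agent's actual procedure. It assumes the service is reachable at `http://localhost:8017` (the host port from this report), that the 1200-second deadline simply mirrors the Compose `start_period`, and that the model id appears somewhere in the `/v1/models` response body, since the response schema is not shown here. The names `http_get` and `wait_until_ready` are hypothetical helpers.

```python
import time
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8017"   # assumption: host port taken from this report
MODEL_ID = "cosmos-embed1-448p"      # model id reported by /v1/models in this run


def http_get(url, timeout=5.0):
    """Return (status, body); (None, "") means the service was unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:
        # Non-2xx responses (e.g. 503 while the model repo is still building).
        return exc.code, ""
    except OSError:
        # Connection refused, timeout, DNS failure, and similar.
        return None, ""


def wait_until_ready(base_url=BASE_URL, model_id=MODEL_ID,
                     deadline_s=1200, interval_s=10,
                     get=http_get, sleep=time.sleep):
    """Poll /v1/ready until it returns 200, then look for model_id in /v1/models.

    Returns True only if the service became ready within deadline_s and the
    model id was found in the /v1/models body (substring match, an assumption).
    """
    waited = 0
    while waited <= deadline_s:
        status, _ = get(f"{base_url}/v1/ready")
        if status == 200:
            _, body = get(f"{base_url}/v1/models")
            return model_id in body
        sleep(interval_s)
        waited += interval_s
    return False
```

Calling `wait_until_ready()` with the defaults blocks for up to roughly 20 minutes, which is why the deadline is kept at least as long as the unshortened `start_period`: a shorter timeout would likely give up during the first-boot model download.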

<sub>Generated by the skills-eval agent. Trial datasets/results live in the workflow artifact at `skills-eval-results-pr-836-26655435373.tar.gz`.</sub>

---
