# Evaluation Deployment Context

Use this pack when `eval/model_eval` needs deployment guidance for hosted
endpoints or Megatron checkpoints. The checked-in deployment config covers
Megatron checkpoints only, so an HF checkpoint needs either an existing hosted
endpoint or a conversion path before that config can be used.

## Product Contract

- Prefer an existing OpenAI-compatible endpoint when the user provides one.
- If deployment is part of the eval step, use the checked-in config and env
  profile. Do not fabricate a deployment service in skill guidance.
- Keep deployment and evaluation artifacts separate from training outputs.

## Artifact Routing

| Input artifact | Deployment path |
|---|---|
| `checkpoint_hf` | Existing hosted endpoint, or convert to a supported deployment format first |
| `checkpoint_megatron` | `eval/model_eval/config/default.yaml` Megatron deployment path |
| LoRA adapter | Merge or load the adapter with the base model before evaluation, depending on the supported serving path |
| Existing URL | Skip deployment and configure the evaluator against the URL |
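
The routing table can be read as a small dispatch function. The sketch below is illustrative only: the `Artifact` enum, the `deployment_path` helper, and the returned strings are hypothetical names, not part of `eval/model_eval`. It also folds in the Product Contract rule that a user-provided endpoint takes precedence.

```python
from enum import Enum
from typing import Optional

# Path taken from the routing table above.
MEGATRON_CONFIG = "eval/model_eval/config/default.yaml"


class Artifact(Enum):
    """Hypothetical labels for the input artifacts in the routing table."""
    CHECKPOINT_HF = "checkpoint_hf"
    CHECKPOINT_MEGATRON = "checkpoint_megatron"
    LORA_ADAPTER = "lora_adapter"
    EXISTING_URL = "existing_url"


def deployment_path(artifact: Artifact, endpoint_url: Optional[str] = None) -> str:
    """Describe the deployment route for an artifact (illustrative only)."""
    # An existing endpoint wins whenever the user provides one.
    if endpoint_url:
        return f"skip deployment; evaluate against {endpoint_url}"
    if artifact is Artifact.EXISTING_URL:
        raise ValueError("an existing-URL artifact needs endpoint_url")
    if artifact is Artifact.CHECKPOINT_MEGATRON:
        return f"deploy with {MEGATRON_CONFIG}"
    if artifact is Artifact.CHECKPOINT_HF:
        return "convert to a supported deployment format first"
    if artifact is Artifact.LORA_ADAPTER:
        return "merge or load the adapter with the base model first"
    raise ValueError(f"unknown artifact: {artifact!r}")
```

Keeping the endpoint check first means an HF checkpoint with a hosted endpoint never reaches the conversion branch.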

## Endpoint Rules

- Chat/instruction benchmarks need a chat-compatible endpoint.
- Logprob benchmarks need completions/logprobs support and a matching tokenizer.
- Keep model name, URL, and API-key env var explicit in config or CLI
  overrides.
- Do not print resolved secret values.
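
One way to keep these three values explicit without leaking the key is to store only the name of the env var and resolve it at call time. This is a minimal sketch using the standard library; `EndpointConfig`, its field names, and `describe` are hypothetical and do not reflect the evaluator's actual config schema.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointConfig:
    """Hypothetical endpoint settings; field names are assumptions."""
    model_name: str
    url: str
    api_key_env: str  # name of the env var, never the key itself

    def resolve_api_key(self) -> str:
        """Read the key from the environment, failing loudly when absent."""
        key = os.environ.get(self.api_key_env)
        if not key:
            raise RuntimeError(
                f"missing_auth: set {self.api_key_env} in the runtime"
            )
        return key

    def describe(self) -> str:
        """Log-safe summary: reports whether the key is set, not its value."""
        state = "set" if os.environ.get(self.api_key_env) else "unset"
        return (
            f"model={self.model_name} url={self.url} "
            f"api_key_env={self.api_key_env} ({state})"
        )
```

Because the dataclass holds only the variable name, its default `repr` is also safe to log.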

## Remote Rules

- For Lepton/Slurm/DGX Cloud, select the deployment/eval profile from the env TOML.
- Verify mounted checkpoint paths exist inside the runtime container.
- Use a dry run or a limited benchmark to validate launch wiring only.
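
The mount check is only meaningful when it runs inside the runtime container, because host paths and container paths can differ. A minimal sketch, assuming the script is executed in the container as a preflight step; `missing_mounts` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable, List


def missing_mounts(paths: Iterable[str]) -> List[str]:
    """Return the paths that are not visible from the current process.

    Run this inside the container so the result reflects the mounts the
    deployment will actually see.
    """
    return [p for p in paths if not Path(p).exists()]
```

An empty result means every listed path is visible; anything else is worth fixing before launching even a dry run.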

## Failure Modes

- `bad_megatron_checkpoint_path`: point at the concrete `iter_*` checkpoint.
- `endpoint_not_ready`: health-check the service before starting evaluation.
- `missing_auth`: set the endpoint API key env var in the runtime.
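
The first two failure modes can be caught before evaluation starts. The sketch below uses only the Python standard library; the helper names are hypothetical, and the health URL is an assumption, since the actual readiness route depends on the serving stack.

```python
import fnmatch
import time
import urllib.error
import urllib.request
from pathlib import Path
from typing import Callable


def is_concrete_checkpoint(path: str) -> bool:
    """True when the final path component matches iter_* (name check only)."""
    return fnmatch.fnmatch(Path(path).name, "iter_*")


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Probe a health URL; any connection or HTTP error counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

A caller might run `wait_until_ready(lambda: http_ok(health_url))` and report `endpoint_not_ready` when it returns `False`, where `health_url` is whatever readiness route the deployed service exposes.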