# AutoModel SFT And PEFT Context

Use this pack when configuring `sft/automodel` or `peft/automodel`.

## Product Contract

- AutoModel is the HF-native path. It consumes chat-format `training_jsonl` and produces an HF checkpoint for full SFT or a LoRA adapter for PEFT.
- Do not feed packed Parquet to AutoModel. Packed Parquet is for Megatron-Bridge SFT/PEFT.
- Prefer YAML overrides against the existing step configs. Do not write a new training script unless the repo runner cannot express the request.

## When To Pick AutoModel

| User constraint | Decision |
|---|---|
| HF checkpoint output is required | Prefer AutoModel |
| 1-4 GPU iteration or smaller model | Prefer AutoModel |
| Non-Nemotron or custom HF model | Prefer AutoModel unless a Megatron-Bridge recipe exists |
| Large distributed Megatron checkpoint output | Prefer Megatron-Bridge |
| Adapter-only tuning on HF data | `peft/automodel` |

## Required Inputs

- `model.pretrained_model_name_or_path`: HF id or local HF checkpoint path.
- `dataset.path_or_dataset_id`: chat-format JSONL or dataset id.
- Output directory for checkpoints/adapters.
- Tokenizer/chat-template expectations if they differ from the model defaults.

## SFT Rules

- Use full SFT only when memory is sufficient and a full HF checkpoint is the desired artifact.
- Keep batch size, max sequence length, gradient accumulation, and precision explicit in the config for reproducibility.
- If the dataset does not already have OpenAI-style `messages`, add a data-prep step before AutoModel rather than changing the trainer.

## PEFT Rules

- Record the exact base model with the adapter; `convert/merge_lora` needs the same base checkpoint and tokenizer.
- Start with modest LoRA rank and alpha for smoke runs. Raise rank only when the task needs more capacity.
- Treat adapter eval and merged-checkpoint eval as separate validation points.

## Failure Modes

- `packed_parquet_used_with_automodel`: use source JSONL or switch to `sft/megatron_bridge`.
- `chat_template_missing`: use a tokenizer with chat-template support or normalize the dataset.
- `oom`: reduce sequence length/batch size, switch to LoRA, or choose a smaller model.
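
Some of these failures can be caught before a run is launched. The sketch below is one possible pre-flight check for a local dataset file, not part of the repo runner: the function name `preflight_dataset` and the codes `invalid_json`, `missing_messages`, and `malformed_message` are hypothetical, and treating a `.parquet` suffix as "packed Parquet" is an assumption made for illustration. It only verifies that sampled rows carry an OpenAI-style `messages` list whose entries have `role` and `content` keys.

```python
import json
from pathlib import Path
from typing import List


def preflight_dataset(path: str, max_rows: int = 100) -> List[str]:
    """Return problem codes found in a local dataset file (empty list if none).

    Hypothetical helper: only the first `max_rows` lines are sampled.
    """
    p = Path(path)

    # Assumption: a .parquet suffix means packed Parquet was passed by mistake.
    if p.suffix == ".parquet":
        return ["packed_parquet_used_with_automodel"]

    problems: List[str] = []
    with p.open(encoding="utf-8") as fh:
        for index, raw in enumerate(fh):
            if index >= max_rows:
                break
            raw = raw.strip()
            if not raw:
                continue  # skip blank lines

            try:
                row = json.loads(raw)
            except json.JSONDecodeError:
                problems.append(f"line {index + 1}: invalid_json")
                continue

            # Chat-format rows are expected to carry a non-empty `messages` list.
            messages = row.get("messages") if isinstance(row, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append(f"line {index + 1}: missing_messages")
                continue

            # Each message should be an object with `role` and `content`.
            for message in messages:
                if not (
                    isinstance(message, dict)
                    and "role" in message
                    and "content" in message
                ):
                    problems.append(f"line {index + 1}: malformed_message")
                    break

    return problems
```

A non-empty result with `missing_messages` entries suggests adding a data-prep step before AutoModel, as described under SFT Rules, rather than changing the trainer.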