# Megatron-Bridge SFT And PEFT Context

Use this pack for:

- `sft/megatron_bridge`: full or LoRA SFT on packed Parquet.
- `peft/megatron_bridge`: LoRA adapter training on packed Parquet.

## Product Contract

- These steps consume `packed_parquet` from `data_prep/sft_packing`.
- `sft/megatron_bridge` produces a Megatron checkpoint.
- `peft/megatron_bridge` produces a LoRA adapter. When a Hugging Face (HF)
  deployment artifact is required, plan a separate merge and conversion step.
- Prefer YAML overrides on existing configs. Do not fork Megatron-Bridge scripts
  for normal SFT/PEFT.
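The following minimal sketch shows the override-first rule. It layers dotted-key overrides onto a nested dict that stands in for a loaded YAML config. `apply_overrides` is a hypothetical helper, not a Megatron-Bridge API, and the paths are placeholders. The upstream config is left untouched, and only the changed keys are stated.

```python
from copy import deepcopy
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of config with dotted-key overrides set."""
    result = deepcopy(config)  # never mutate the upstream config
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            # Create missing intermediate sections; existing ones are reused.
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


base_config = {
    "dataset": {"packed_sequence_specs": {"packed_train_data_path": None}},
    "checkpoint": {"pretrained_checkpoint": None},
}
tuned = apply_overrides(base_config, {
    "dataset.packed_sequence_specs.packed_train_data_path": "/data/packed/train/*.parquet",
    "checkpoint.pretrained_checkpoint": "/ckpt/megatron_base",
})
print(tuned["checkpoint"]["pretrained_checkpoint"])  # /ckpt/megatron_base
```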

## Required Wiring

- `dataset.packed_sequence_specs.packed_train_data_path`: training Parquet glob.
- Packed validation/test data paths, whenever the config schedules validation.
- `seq_length` equal to the data-prep `pack_size`.
- `checkpoint.pretrained_checkpoint` when adapting a Megatron base.
- Distinct output directories for base checkpoint, adapter, and final merged
  artifact.
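The wiring list above can be derived mechanically from the data-prep outputs. The sketch below rests on three assumptions:

- `build_wiring_overrides` is a hypothetical helper, not part of Megatron-Bridge.
- Validation/test paths are omitted for brevity.
- Writing `seq_length` as a top-level key is a guess. Its real location depends on the config in use.

```python
from pathlib import Path
from typing import Dict, Optional


def build_wiring_overrides(
    train_glob: str,
    pack_size: int,
    base_checkpoint: Optional[str],
    output_dirs: Dict[str, str],
) -> Dict[str, object]:
    """Hypothetical helper: derive required wiring from data-prep outputs."""
    # Base checkpoint, adapter, and merged artifact must not share a directory.
    resolved = [Path(p).resolve() for p in output_dirs.values()]
    if len(set(resolved)) != len(resolved):
        raise ValueError(f"output directories must be distinct: {output_dirs}")

    overrides: Dict[str, object] = {
        "dataset.packed_sequence_specs.packed_train_data_path": train_glob,
        # Assumed key location; must equal data_prep/sft_packing pack_size.
        "seq_length": pack_size,
    }
    # Only set when adapting an existing Megatron base checkpoint.
    if base_checkpoint is not None:
        overrides["checkpoint.pretrained_checkpoint"] = base_checkpoint
    return overrides


wiring = build_wiring_overrides(
    train_glob="/data/packed/train/*.parquet",
    pack_size=4096,
    base_checkpoint="/ckpt/megatron_base",
    output_dirs={"base": "/out/base", "adapter": "/out/adapter", "merged": "/out/merged"},
)
print(wiring["seq_length"])  # 4096
```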

## Backend Choice

| Need | Step |
|---|---|
| Full large-scale SFT with Megatron checkpoint output | `sft/megatron_bridge` |
| Adapter tuning on Megatron checkpoint | `peft/megatron_bridge` |
| HF-native checkpoint with JSONL data | `sft/automodel` or `peft/automodel` |
| Memory is too tight for full SFT | PEFT/LoRA first |
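As a sketch, the table can be read as a small decision function. `choose_step` and its string labels (`"megatron"`, `"hf"`, `"packed_parquet"`, `"jsonl"`) are hypothetical. The function also simplifies the table: it routes every adapter request to a `peft/*` step, even though `sft/megatron_bridge` can also run LoRA SFT.

```python
def choose_step(checkpoint_format: str, data_format: str, want_adapter: bool, full_sft_fits: bool) -> str:
    """Hypothetical helper: encode the backend-choice table as a function."""
    # Tight memory means trying PEFT/LoRA first, even if full SFT was the goal.
    use_adapter = want_adapter or not full_sft_fits
    if checkpoint_format == "megatron" and data_format == "packed_parquet":
        return "peft/megatron_bridge" if use_adapter else "sft/megatron_bridge"
    if checkpoint_format == "hf" and data_format == "jsonl":
        return "peft/automodel" if use_adapter else "sft/automodel"
    raise ValueError(f"no step mapped for checkpoint={checkpoint_format!r}, data={data_format!r}")


print(choose_step("megatron", "packed_parquet", want_adapter=False, full_sft_fits=False))
# peft/megatron_bridge
```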

## Config Rules

- Start with micro batch size 1 for new shapes.
- Keep global batch size divisible by micro batch size × data-parallel size.
- Use `load_hf_weights=false` when starting from a Megatron checkpoint.
- For reliable adapter checkpoints, prefer plain synchronous saves over async
  saves or saves that include optimizer state, unless the user explicitly needs
  them.
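Megatron-style training requires the global batch size to be a multiple of micro batch size times data-parallel size. The quotient is the number of gradient-accumulation steps. The sketch below assumes a dense model, where data-parallel size is world size divided by the product of tensor, pipeline, and context parallel sizes. `batch_plan` is a hypothetical helper.

```python
from typing import Tuple


def batch_plan(
    world_size: int,
    tensor_parallel: int,
    pipeline_parallel: int,
    context_parallel: int,
    micro_batch_size: int,
    global_batch_size: int,
) -> Tuple[int, int]:
    """Hypothetical helper: return (data_parallel_size, grad_accumulation_steps)."""
    model_parallel = tensor_parallel * pipeline_parallel * context_parallel
    if world_size % model_parallel:
        raise ValueError(f"world size {world_size} not divisible by TP*PP*CP={model_parallel}")
    data_parallel = world_size // model_parallel
    # Samples consumed per forward/backward pass across all data-parallel ranks.
    per_pass = micro_batch_size * data_parallel
    if global_batch_size % per_pass:
        raise ValueError(f"global batch {global_batch_size} not divisible by micro*DP={per_pass}")
    return data_parallel, global_batch_size // per_pass


print(batch_plan(16, 2, 2, 1, micro_batch_size=1, global_batch_size=32))  # (4, 8)
```

For example, 16 GPUs with TP=2, PP=2, and CP=1 give a data-parallel size of 4. Micro batch 1 and global batch 32 then need 8 accumulation steps.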

## Failure Modes

- `missing_packed_data`: run `data_prep/sft_packing`.
- `sequence_length_mismatch`: repack data or align `seq_length`.
- `missing_base_checkpoint`: set the Megatron base checkpoint for PEFT.
- `oom`: lower the micro batch size, enable activation recomputation, increase
  parallelism, or switch to LoRA.
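The first three failure modes can be caught before launch. This is a minimal preflight sketch, and `preflight` is a hypothetical helper. It assumes a local filesystem, and it only checks two things: that the training glob matches files and that the base checkpoint path exists. `oom` is left out because it only shows up at runtime.

```python
import glob
import os
from typing import List, Optional


def preflight(
    step: str,
    train_glob: str,
    seq_length: int,
    pack_size: int,
    base_checkpoint: Optional[str],
) -> List[str]:
    """Hypothetical helper: return the failure codes detectable before launch."""
    codes: List[str] = []
    # missing_packed_data: the training glob must match at least one file.
    if not glob.glob(train_glob, recursive=True):
        codes.append("missing_packed_data")
    # sequence_length_mismatch: seq_length must equal the packing pack_size.
    if seq_length != pack_size:
        codes.append("sequence_length_mismatch")
    # missing_base_checkpoint: PEFT needs an existing Megatron base checkpoint.
    if step == "peft/megatron_bridge" and not (base_checkpoint and os.path.exists(base_checkpoint)):
        codes.append("missing_base_checkpoint")
    return codes


print(preflight("peft/megatron_bridge", "/nonexistent/*.parquet", 2048, 4096, None))
```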
