AdamW + cosine-with-warmup schedule, PAD-ignoring cross-entropy, per-epoch
CSV logging, best-val-loss checkpointing, early stopping (patience=5).
Same script handles both pre-training and fine-tuning via --init-from.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- src/dataset.py: ChordDataset wrapping .pt files with pad/truncate
- scripts/prepare_data.py: tokenize .chord to .pt with train/val/holdout
split, logs token length stats and style/function distributions
- src/external_converters/mcgill_to_chord.py: rewrite parser for real
McGill v2 format (2-column annotation, each bar in its own pipe group,
interval bass notation e.g. /5 and /b3)
- .gitignore: exclude data/processed/train, val, holdout subdirectories
- tests: 37 new tests for ChordDataset and converter (260 total, all pass)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>