README:
- processed/ tree now shows mcgill/ and user/ subdirs
- --style user -> --style H1K0 in quick-start prefix example
- pretrained.report.txt and finetuned.report.txt added to artifact tables
architecture.md (-> v1.1):
- remove stale music21 fallback mention from chord_parser section
- fix ChordDataset description: sequences are loaded on demand, not
eagerly; remove the non-existent make_dataloader from the public interface
- fix train function name: train_model -> train
- update logging description: report goes to .report.txt, not stdout
- note that scripts use max_seq_len=256 (the longest sequence is 195 tokens)
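For reference, here is a minimal sketch of on-demand loading combined with max_seq_len padding. It is written in plain Python so it runs without PyTorch. The class name LazyChordDataset, the one-JSON-file-per-sequence layout, and pad id 0 are assumptions for illustration. The real ChordDataset's storage format is not described in this commit. Because the longest sequence is 195 tokens, max_seq_len=256 leaves 61 tokens of headroom, so the truncation step never changes the current data.

```python
import json
from pathlib import Path

PAD_ID = 0         # hypothetical padding token id
MAX_SEQ_LEN = 256  # value the scripts use


class LazyChordDataset:
    """Hypothetical stand-in for ChordDataset: stores only file paths at
    construction time and reads each sequence from disk when indexed."""

    def __init__(self, data_dir, max_seq_len=MAX_SEQ_LEN, pad_id=PAD_ID):
        # Only the directory listing happens here; no sequence is read yet.
        self.paths = sorted(Path(data_dir).glob("*.json"))
        self.max_seq_len = max_seq_len
        self.pad_id = pad_id

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Assumed layout: each file holds a JSON list of integer token ids.
        tokens = json.loads(self.paths[idx].read_text())
        tokens = tokens[: self.max_seq_len]  # no-op for sequences <= 256 tokens
        padding = [self.pad_id] * (self.max_seq_len - len(tokens))
        return tokens + padding
```

Constructing the dataset only lists files. Each file is read inside __getitem__, so memory use in this sketch does not grow with corpus size.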
requirements.md (-> v1.1):
- FT-12: update from a single unified script to the pretrain.py + train.py pair
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- scripts/run_pretrain.py -> scripts/pretrain.py: pre-trains on the McGill
corpus (data/processed/mcgill/) and saves checkpoints/pretrained.pt.
- scripts/train.py: rewritten as a high-level fine-tuning wrapper; loads
pretrained.pt, trains on data/processed/user/, and saves finetuned.pt.
Both scripts include a training-time estimate, a loss-curve plot, a per-epoch
report, and a --skip-training flag.
- README: updated section 7 to reflect the new script names and the separate
data directories.
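This is a hypothetical sketch of how the two scripts could share one skeleton. It covers a stage description (data directory, optional starting checkpoint, output checkpoint, and report file), the --skip-training flag, a simple timing estimate, and per-epoch report lines. Several details are assumptions, not the scripts' actual code: the Stage class, the --epochs option, the checkpoints/ location of finetuned.pt, and the estimate formula (seconds per batch * batches per epoch * epochs).

```python
import argparse
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """Hypothetical description of one training stage."""
    name: str
    data_dir: str
    init_checkpoint: Optional[str]  # None: start from freshly initialised weights
    out_checkpoint: str

    @property
    def report_path(self) -> str:
        # checkpoints/pretrained.pt -> checkpoints/pretrained.report.txt
        out = Path(self.out_checkpoint)
        return str(out.with_name(out.stem + ".report.txt"))


# Paths from this commit message; the directory of finetuned.pt is assumed.
PRETRAIN = Stage("pretrain", "data/processed/mcgill/", None, "checkpoints/pretrained.pt")
FINETUNE = Stage("finetune", "data/processed/user/", "checkpoints/pretrained.pt",
                 "checkpoints/finetuned.pt")


def estimate_seconds(sec_per_batch: float, batches_per_epoch: int, epochs: int) -> float:
    """Naive wall-clock estimate: a measured per-batch time scaled to the full run."""
    return sec_per_batch * batches_per_epoch * epochs


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="hypothetical pretrain/fine-tune skeleton")
    parser.add_argument("--skip-training", action="store_true",
                        help="do not run the training loop")
    parser.add_argument("--epochs", type=int, default=10)  # hypothetical option
    return parser.parse_args(argv)


def report_lines(epoch_losses: List[float]) -> List[str]:
    """One line per epoch, written to the stage's .report.txt instead of stdout."""
    return [f"epoch {i}: loss {loss:.4f}" for i, loss in enumerate(epoch_losses, start=1)]
```

In this sketch, FINETUNE.init_checkpoint is PRETRAIN.out_checkpoint. That ordering constraint ties the pair together: train.py has nothing to load until pretrain.py has written pretrained.pt.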
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>