Add 5/4, 7/4, 7/8, 9/8 to _VALID_TIMES and VOCAB (TIME_* tokens).
Vocab size grows from 81 to 85 tokens. _parse_metre in the McGill
converter assigns subdivision=8 to 7/8 and 9/8. Spec bumped to v2.2.
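A minimal sketch of the metre handling described above, assuming the subdivision is simply the denominator of the time signature. Only the four metres added in this commit are listed; `NEW_TIMES` and `parse_metre` are illustrative names, not the converter's actual `_VALID_TIMES` or `_parse_metre`.

```python
# Illustrative only: the four metres added in this commit.
NEW_TIMES = frozenset({"5/4", "7/4", "7/8", "9/8"})


def parse_metre(text: str) -> tuple[int, int]:
    """Return (beats_per_bar, subdivision) for a metre string such as '7/8'.

    Assumption: the subdivision equals the denominator, so 7/8 and 9/8
    yield 8 while 5/4 and 7/4 yield 4.
    """
    metre = text.strip()
    if metre not in NEW_TIMES:
        raise ValueError(f"unsupported metre: {text!r}")
    beats, subdivision = metre.split("/")
    return int(beats), int(subdivision)
```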
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- STYLE_user renamed to STYLE_H1K0 in VOCAB (author's personal tag)
- Style field now accepts any [A-Za-z][A-Za-z0-9_]* identifier in .chord files
- Unknown styles fall back to STYLE_other at tokenization time with a log warning
- Test fixtures updated to style: other; closed _VALID_STYLES frozenset dropped
- Spec bumped to v2.1: documents open style field, fallback behaviour, and §5.7
guide on registering a new style token
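The open style field and its fallback can be sketched roughly as follows. The identifier pattern is the one quoted above; `KNOWN_STYLE_TOKENS`, `is_valid_style` and `style_token` are hypothetical names, and the set of known tokens is an illustrative subset rather than the real VOCAB.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Pattern from the commit message: a letter, then letters, digits or underscores.
STYLE_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Illustrative subset only; the real VOCAB holds more STYLE_* tokens.
KNOWN_STYLE_TOKENS = {"STYLE_other", "STYLE_H1K0"}


def is_valid_style(name: str) -> bool:
    """Check the style value accepted in a .chord header."""
    return STYLE_RE.fullmatch(name) is not None


def style_token(name: str) -> str:
    """Map a style name to its token, falling back to STYLE_other."""
    token = f"STYLE_{name}"
    if token in KNOWN_STYLE_TOKENS:
        return token
    logger.warning("unknown style %r, falling back to STYLE_other", name)
    return "STYLE_other"
```

Keeping validation (syntax) separate from tokenization (vocabulary lookup) means a .chord file with a new style still parses, and only the token choice degrades.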
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- src/dataset.py: ChordDataset wrapping .pt files with pad/truncate
- scripts/prepare_data.py: tokenizes .chord files to .pt with a train/val/holdout
split and logs token-length stats and style/function distributions
- src/external_converters/mcgill_to_chord.py: rewrite parser for real
McGill v2 format (2-column annotation, each bar in its own pipe group,
interval bass notation e.g. /5 and /b3)
- .gitignore: exclude data/processed/train, val, holdout subdirectories
- tests: 37 new tests for ChordDataset and converter (260 total, all pass)
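The pad/truncate step in ChordDataset might look roughly like this on a plain list of token ids. This is a sketch under assumptions: right-padding, a pad id of 0, and the helper name `pad_or_truncate` are not taken from the source, and the real dataset presumably operates on tensors loaded from the .pt files.

```python
def pad_or_truncate(token_ids: list[int], max_len: int, pad_id: int = 0) -> list[int]:
    """Return a copy of token_ids with exactly max_len entries.

    Longer sequences are cut at max_len; shorter ones are right-padded
    with pad_id (both choices are assumptions for this sketch).
    """
    if max_len < 0:
        raise ValueError("max_len must be non-negative")
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))
```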
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds src/external_converters/mcgill_to_chord.py with two public functions:
- convert_song(song_dir, output_dir) — converts one salami_chords.txt to
per-section .chord files (4–16 bars each, style=other)
- convert_dataset(dataset_dir, output_dir) — batch converts all songs
Key decisions:
- Harte qualities mapped to our 18-quality vocabulary; hdim7 → m7b5,
parenthetical alterations (e.g. 7(b9)) handled via regex
- Bar duration estimated from median non-trivial chord duration
- Mode (major/minor) inferred from tonic chord quality distribution
- Sections with <4 or >16 bars are skipped with a logged reason
- Unrecognized Harte chords skip the whole section (no silent corruption)
48 new tests in tests/test_mcgill_converter.py; total suite 223 passed.
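As an illustration of the bar-duration estimate, here is a minimal sketch that takes the median of chord durations above a cutoff. The commit does not say what counts as "non-trivial", so the `min_duration` threshold and the function name `estimate_bar_duration` are assumptions.

```python
from statistics import median


def estimate_bar_duration(durations: list[float], min_duration: float = 0.25) -> float:
    """Estimate seconds per bar as the median of non-trivial chord durations.

    Assumption: a duration is "non-trivial" when it is at least
    min_duration seconds; the converter's actual rule may differ.
    """
    non_trivial = [d for d in durations if d >= min_duration]
    if not non_trivial:
        raise ValueError("no non-trivial chord durations to estimate from")
    return median(non_trivial)
```

The median is less sensitive than the mean to a single long held chord or a run of very short passing chords.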
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
chord_file_to_midi() parses the period in the user's original key (no
transposition), accumulates held-chord segments, then writes two pretty_midi
tracks: chords with root anchored at octave 4 (MIDI 60–71 + intervals) and
bass at octave 2 (MIDI 36–47). Extension notes are added as a fifth voice
at their standard interval above the root. Tempo is parameterised; the CLI
wrapper (python -m src.midi_export) supports --tempo BPM.
10 tests cover: file creation, parseability, instrument count and names,
chord/bass note counts for a 4-chord C-major fixture (14 chord + 4 bass),
octave placement assertions, and tempo affecting total duration.
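The octave placement can be sketched without pretty_midi, using the C4 = 60 convention implied by the ranges above (octave 4 is MIDI 60–71, octave 2 is MIDI 36–47). The interval table is a three-quality illustration, not the project's 18-quality vocabulary, and the function names are hypothetical.

```python
PITCH_CLASS = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}

# Illustrative subset of chord qualities as semitone offsets from the root.
INTERVALS = {"maj": (0, 4, 7), "m": (0, 3, 7), "7": (0, 4, 7, 10)}


def midi_number(pitch_class: int, octave: int) -> int:
    """MIDI note number with C4 = 60."""
    return 12 * (octave + 1) + pitch_class


def chord_pitches(root: str, quality: str) -> list[int]:
    """Chord tones with the root anchored in octave 4 (MIDI 60-71)."""
    base = midi_number(PITCH_CLASS[root], 4)
    return [base + interval for interval in INTERVALS[quality]]


def bass_pitch(root: str) -> int:
    """Bass note in octave 2 (MIDI 36-47)."""
    return midi_number(PITCH_CLASS[root], 2)
```

Because only the root is anchored, upper chord tones can exceed MIDI 71, which matches the "60–71 + intervals" wording.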
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds VOCAB (81 tokens), TOKEN_TO_ID, and ID_TO_TOKEN per spec §5.2.
tokenize_period() transposes to C/Am then emits BOS + metadata tokens +
per-bar chord/HOLD/NC tokens + BAR + EOS. detokenize_to_period() is the
exact inverse, returning a ChordPeriod in canonical key. The m(add9)
quality maps to QUAL_m_add9 in the vocab (parentheses not valid in token
names) via _qual_token/_token_qual helpers.
36 new tests cover vocabulary integrity, token sequence structure,
and full round-trip fidelity for all four valid fixture files.
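A rough sketch of the quality/token name mapping follows. Only the m(add9) ↔ QUAL_m_add9 pair is confirmed by this commit; the assumption that every other quality maps by adding a QUAL_ prefix, and the public names `qual_token` and `token_qual`, are illustrative stand-ins for the private helpers.

```python
# Qualities whose spelling cannot be used verbatim in a token name.
_SPECIAL = {"m(add9)": "m_add9"}
_SPECIAL_INV = {token_part: quality for quality, token_part in _SPECIAL.items()}

_PREFIX = "QUAL_"


def qual_token(quality: str) -> str:
    """Map a chord quality to its vocabulary token name."""
    return _PREFIX + _SPECIAL.get(quality, quality)


def token_qual(token: str) -> str:
    """Inverse of qual_token: recover the chord quality from a token name."""
    if not token.startswith(_PREFIX):
        raise ValueError(f"not a quality token: {token!r}")
    name = token[len(_PREFIX):]
    return _SPECIAL_INV.get(name, name)
```

An explicit lookup table keeps the inverse unambiguous, which a generic "replace parentheses with underscores" rule would not.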
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
src/tokenizer.py:
- parse_chord_file(Path) → ChordPeriod: reads header + bar body, strips //
comments, validates bar position counts and chord symbols, raises
ChordFormatError with filename and bar number on any violation.
- transpose_to_canonical(ChordPeriod) → ChordPeriod: shifts all chord roots
and bass notes by the semitone offset that maps the key to C major / A minor; fast-path
returns the original object when shift == 0.
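The canonical-key shift can be illustrated on bare note names. This sketch assumes an upward shift in the range 0–11 and sharp-only output spelling; the real function works on ChordPeriod objects and may spell notes differently. `canonical_shift` and `transpose_roots` are hypothetical names.

```python
PITCH_CLASS = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}
SHARP_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def canonical_shift(tonic: str, mode: str) -> int:
    """Semitones (0-11, upward) that move the key to C major or A minor."""
    if mode not in ("major", "minor"):
        raise ValueError(f"unknown mode: {mode!r}")
    target = 0 if mode == "major" else 9  # C for major keys, A for minor keys
    return (target - PITCH_CLASS[tonic]) % 12


def transpose_roots(roots: list[str], tonic: str, mode: str) -> list[str]:
    """Transpose note names to the canonical key (sharp spelling assumed)."""
    shift = canonical_shift(tonic, mode)
    if shift == 0:
        return roots  # fast path: already canonical, return the same object
    return [SHARP_NAMES[(PITCH_CLASS[name] + shift) % 12] for name in roots]
```

For a slash chord, the same shift would be applied to both the root and the bass note.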
tests/test_chord_file_parser.py: 39 tests covering parsing of 4 valid fixtures
(C major, F# major, B minor, G# minor), error messages for 2 invalid
fixtures, and transposition correctness including slash chord root+bass.
tests/fixtures/: 6 .chord fixture files (4 valid, 2 invalid).
requirements.txt: pinned to the latest stable versions
(torch 2.12.0, music21 10.1.0, pretty_midi 0.2.11, matplotlib 3.10.9,
numpy 2.4.6, pandas 3.0.3, pytest 9.0.3); Python >= 3.11 noted.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>