A sequence containing only metadata tokens followed by EOS used to silently
return a ChordPeriod with bars=[], which would later crash or produce an empty
.chord file. Detokenization now raises immediately with a descriptive message.
Added a test that failed before the fix and passes after it.
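A minimal sketch of the guard, for illustration only. The ChordPeriod
stand-in, the require_bars helper, the ValueError type, and the message
wording are all assumptions; the real check lives inside the detokenizer
and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ChordPeriod:
    # Hypothetical stand-in: the real class carries more fields than this.
    bars: list = field(default_factory=list)


def require_bars(period: ChordPeriod, tokens: list[str]) -> ChordPeriod:
    """Reject a decoded period that contains no bars (assumed behaviour)."""
    if not period.bars:
        # Fail here rather than later when writing an empty .chord file.
        raise ValueError(
            f"token sequence has no chord content ({len(tokens)} tokens, "
            "metadata only); refusing to build an empty ChordPeriod"
        )
    return period
```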
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bar boundaries are now implicit: the detokenizer counts positions per bar
using TIME × SUB, and the generator permits EOS only at bar boundaries.
Removing the deterministic BAR token reduces vocab size from 85 to 84 and
lets the model focus on meaningful predictions.
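The position-counting idea can be sketched roughly as follows. This is not
the code in src/tokenizer.py: split_into_bars and eos_allowed are
hypothetical names, positions_per_bar is passed in directly instead of being
derived from TIME × SUB, and the input is assumed to be already grouped into
one symbol per position.

```python
def split_into_bars(positions: list[str], positions_per_bar: int) -> list[list[str]]:
    """Group a flat list of per-position symbols into bars by counting."""
    if positions_per_bar <= 0:
        raise ValueError("positions_per_bar must be positive")
    if len(positions) % positions_per_bar != 0:
        # A partial final bar means the sequence ended off a bar boundary.
        raise ValueError(
            f"{len(positions)} positions do not fill whole bars "
            f"of {positions_per_bar}"
        )
    return [
        positions[i:i + positions_per_bar]
        for i in range(0, len(positions), positions_per_bar)
    ]


def eos_allowed(positions_emitted: int, positions_per_bar: int) -> bool:
    """Allow EOS only on a bar boundary.

    Requiring at least one full bar is an extra assumption of this sketch.
    """
    return positions_emitted > 0 and positions_emitted % positions_per_bar == 0
```

A generator could use a check like eos_allowed to mask the EOS logit at
every step that falls inside a bar.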
- src/tokenizer.py: drop BAR from VOCAB (85→84); replace BAR-based
detokenize_to_period with position-counting logic; add write_chord_file;
fix _tokens_to_symbol for add9/m(add9) qualities
- tests/test_tokenizer.py: update vocab-size assertions to 84; update the
structural token test; remove bar-count test; add test_no_bar_token_in_vocab
- docs/chord_format_spec.md: bump to v2.3; document BAR removal in §5.2,
§5.3, §5.4, §5.5, §5.6, §6.2, and changelog
- CLAUDE.md: remove stale BAR reference, update vocab size to 84
- scripts/pretrain.py: raise max_seq_len 256→320 to cover regenerated
McGill data (mean=83, max=283 tokens with BAR-free tokenizer)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add 5/4, 7/4, 7/8, 9/8 to _VALID_TIMES and VOCAB (TIME_* tokens).
Vocab size grows from 81 to 85 tokens. _parse_metre in the McGill
converter assigns subdivision=8 to 7/8 and 9/8. Spec bumped to v2.2.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- STYLE_user renamed to STYLE_H1K0 in VOCAB (author's personal tag)
- Style field now accepts any [A-Za-z][A-Za-z0-9_]* identifier in .chord files
- Unknown styles fall back to STYLE_other at tokenization time with a log warning
- Test fixtures updated to style: other; drop closed _VALID_STYLES frozenset
- Spec bumped to v2.1: documents open style field, fallback behaviour, and §5.7
guide on registering a new style token
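A rough sketch of the open style field and its fallback, under stated
assumptions: style_token and KNOWN_STYLE_TOKENS are hypothetical names, the
known-token set is an illustrative subset, and raising ValueError on a
malformed identifier is a guess at how invalid input is handled.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Identifier pattern quoted in the commit message above.
_STYLE_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Illustrative subset only; the real VOCAB registers more style tokens.
KNOWN_STYLE_TOKENS = frozenset({"STYLE_H1K0", "STYLE_other"})


def style_token(style: str) -> str:
    """Map a .chord style value to a vocab token, falling back to STYLE_other."""
    if not _STYLE_RE.fullmatch(style):
        # Assumption: malformed identifiers are rejected outright.
        raise ValueError(f"invalid style identifier: {style!r}")
    token = f"STYLE_{style}"
    if token not in KNOWN_STYLE_TOKENS:
        logger.warning("unknown style %r; falling back to STYLE_other", style)
        return "STYLE_other"
    return token
```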
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds VOCAB (81 tokens), TOKEN_TO_ID, and ID_TO_TOKEN per spec §5.2.
tokenize_period() transposes to C/Am then emits BOS + metadata tokens +
per-bar chord/HOLD/NC tokens + BAR + EOS. detokenize_to_period() is the
exact inverse, returning a ChordPeriod in canonical key. The m(add9)
quality maps to QUAL_m_add9 in the vocab (parentheses are not valid in token
names) via _qual_token/_token_qual helpers.
36 new tests cover vocabulary integrity, token sequence structure,
and full round-trip fidelity for all four valid fixture files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>