Runnable end-to-end report combining narrative, code, and inline figures:
data and .chord format, transformer working principle, two-stage training
curves, perplexity (3.58 -> 2.15), distribution-shift plot with a reading
legend, qualitative examples, and a generation demo. Written in a
first-person student voice.
- CLAUDE.md: report is now a Jupyter notebook; GOST formatting dropped
- requirements.txt: add nbconvert + ipykernel (optional, for the notebook)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Single-page form wrapping src.generate.generate_period: pick model, mode,
key, style, function, time, sampling params and optional prefix; returns
the chord grid plus downloadable .chord and .mid files. Russian usage
instructions are embedded on the same page.
Auto-length output is capped at 16 bars (the period maximum) so a model
that never emits EOS can't run away into dozens of NC/hold bars.
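
A minimal sketch of the cap, not the code in src.generate. It assumes one
token per grid position; take_until_eos_or_cap, MAX_AUTO_BARS and the "EOS"
string are hypothetical names used only for illustration.

```python
from itertools import repeat

MAX_AUTO_BARS = 16  # period maximum named in this commit message


def take_until_eos_or_cap(token_stream, positions_per_bar,
                          eos="EOS", max_bars=MAX_AUTO_BARS):
    """Collect tokens until EOS appears or max_bars worth of positions is reached.

    Assumes every token fills exactly one grid position (an illustration
    assumption, not a statement about the real tokenizer).
    """
    budget = max_bars * positions_per_bar
    out = []
    for tok in token_stream:
        # Stop on EOS, or as soon as the position budget is used up.
        if tok == eos or len(out) >= budget:
            break
        out.append(tok)
    return out


# A stand-in "model" that never emits EOS still terminates at the cap.
runaway = take_until_eos_or_cap(repeat("NC"), positions_per_bar=8)
# A well-behaved stream stops at EOS and drops everything after it.
short = take_until_eos_or_cap(["C", "hold", "EOS", "G"], positions_per_bar=8)
```

The cap is expressed in positions rather than bars so the loop needs no
knowledge of where bar boundaries fall.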
Added at the author's explicit request. The web UI was previously out of
scope, so CLAUDE.md and README are updated accordingly. Choices for
style/function/time are derived from VOCAB so the form can't drift from
the tokenizer.
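
One way to derive form choices from the vocabulary, as a sketch only. The
real VOCAB layout is not shown in this message, so the prefixed token names
below (STYLE_, FUNC_, TIME_) and the choices_for helper are assumptions.

```python
# Hypothetical stand-in for the tokenizer vocabulary.
VOCAB = [
    "PAD", "EOS",
    "STYLE_jazz", "STYLE_pop",
    "FUNC_verse", "FUNC_chorus",
    "TIME_3/4", "TIME_4/4",
]


def choices_for(prefix, vocab=VOCAB):
    """Return the option labels for one form field, in vocabulary order."""
    return [tok[len(prefix):] for tok in vocab if tok.startswith(prefix)]


style_choices = choices_for("STYLE_")
function_choices = choices_for("FUNC_")
time_choices = choices_for("TIME_")
```

Because the lists are computed from the vocabulary, adding or removing a
token changes the form without a second edit.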
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bar boundaries are now implicit: the detokenizer counts positions per bar
using TIME × SUB, and the generator gates EOS to bar boundaries only.
Removing the deterministic BAR token reduces vocab size from 85 to 84, so
the model no longer has to predict a token fully determined by position.
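
The position-counting idea, sketched under assumptions: TIME is read as
beats per bar and SUB as subdivisions per beat, and each list element is one
grid position. split_into_bars and eos_allowed are hypothetical names, not
the functions in src/tokenizer.py.

```python
def split_into_bars(position_tokens, time, sub):
    """Group a flat list of position tokens into bars of time * sub positions."""
    per_bar = time * sub
    return [position_tokens[i:i + per_bar]
            for i in range(0, len(position_tokens), per_bar)]


def eos_allowed(positions_emitted, time, sub):
    """Allow EOS only when the sequence ends exactly on a bar boundary.

    Rejecting EOS at position 0 (an empty period) is an assumption made
    for this sketch.
    """
    return positions_emitted > 0 and positions_emitted % (time * sub) == 0


bars = split_into_bars(list(range(8)), time=2, sub=2)
```

In a sampler, a gate like eos_allowed would typically be applied by masking
the EOS logit whenever it returns False.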
- src/tokenizer.py: drop BAR from VOCAB (85→84); replace BAR-based
detokenize_to_period with position-counting logic; add write_chord_file;
fix _tokens_to_symbol for add9/m(add9) qualities
- tests/test_tokenizer.py: update vocab-size assertions to 84 and the
structural token test; remove bar-count test; add
test_no_bar_token_in_vocab
- docs/chord_format_spec.md: bump to v2.3; document BAR removal in §5.2,
§5.3, §5.4, §5.5, §5.6, §6.2, and changelog
- CLAUDE.md: remove stale BAR reference, update vocab size to 84
- scripts/pretrain.py: raise max_seq_len 256→320 to cover regenerated
McGill data (mean=83, max=283 tokens with BAR-free tokenizer)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>