CLAUDE.md
This file gives Claude Code persistent context for the project. Read it before any non-trivial task.
Project overview
Name. hamori (Japanese ハモリ, "harmonization" in the sense of vocal harmony — adding a second voice to a melodic line). The name reflects the project's core idea: the model proposes harmonic ideas to complement a composer's existing intent, rather than writing music from scratch.
Goal. Train a small autoregressive transformer to generate harmonic periods (4–16 bar chord progressions) in the author's compositional style. Coursework deliverable for an ML class at RTU MIREA; also intended as a working creative tool.
Unit of generation. A single closed harmonic phrase (a "period"), not a full song.
Pipeline.
- Hand-transcribe own compositions from REAPER DAW projects into .chord text files.
- Parse .chord → factorized token sequences.
- Pre-train on a public corpus (McGill Billboard or similar).
- Fine-tune on the author's own corpus.
- Sample new periods conditioned on mode / time / style / function / optional chord prefix.
- Detokenize back to .chord + export to MIDI for use in REAPER.
Hard deadline. Less than one month, ~50 hours of work budget.
Tech stack
- Python 3.11+
- PyTorch (no Lightning unless complexity demands it — keep training loops readable)
- music21 for chord symbol parsing (music21.harmony.ChordSymbol)
- pretty_midi for MIDI generation
- pytest for unit tests
- matplotlib for plots in the report
- NumPy, pandas as standard
- Optional: Google Colab for training if local hardware is insufficient. Model is small enough that CPU is viable.
Avoid heavy abstractions. This is coursework, not a production system. Prefer simple imperative scripts over framework-style code.
Repository layout
hamori/
├── CLAUDE.md ← this file
├── README.md
├── requirements.txt
├── docs/
│   └── chord_format_spec.md ← authoritative format specification
├── data/
│   ├── raw_user/ ← hand-transcribed .chord files (own corpus)
│   ├── raw_external/ ← public corpora (McGill Billboard etc.)
│   ├── processed/ ← tokenized .pt files ready for training
│   └── holdout/ ← held-out periods for evaluation
├── src/
│   ├── __init__.py
│   ├── tokenizer.py ← .chord ↔ token sequences
│   ├── chord_parser.py ← chord symbol → (root, qual, ext, bass)
│   ├── midi_export.py ← .chord → MIDI for sanity check & user output
│   ├── dataset.py ← PyTorch Dataset over tokenized files
│   ├── model.py ← small transformer
│   ├── train.py ← pre-train and fine-tune entry points
│   ├── generate.py ← inference / sampling
│   ├── evaluate.py ← perplexity + distribution metrics
│   └── external_converters/
│       └── mcgill_to_chord.py ← convert McGill Billboard to .chord
├── tests/
│   ├── test_chord_parser.py
│   ├── test_tokenizer.py
│   ├── test_midi_export.py
│   └── fixtures/
│       └── *.chord
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_training.ipynb
│   └── 03_evaluation.ipynb
└── checkpoints/
    ├── pretrained.pt
    └── finetuned.pt
The .chord format
The authoritative specification is in docs/chord_format_spec.md. Always read it before modifying anything that touches the format or the tokenizer. Critical points summarized here for context only — if anything conflicts, the spec wins.
- One file = one harmonic period (4–16 bars).
- Header lines start with #, list title, key, time, subdivision, style, optional function.
- Body: bars separated by |, exactly subdivision positions per bar (for 4/4), positions separated by single spaces.
- A position holds: chord symbol, . (hold previous), NC (no chord), or ? (unknown).
- Chord symbols: <root><quality?><extension?>(/<bass>)?. 18 qualities, 7 extensions, slash inversions are mandatory and meaningful.
- Tokenization: each new chord becomes exactly 4 tokens (ROOT_x, QUAL_x, EXT_x, BASS_x). Hold = HOLD. Bar boundaries are not tokens; the detokenizer reconstructs them by counting positions (TIME×SUB). Plus metadata tokens at the start.
- Keys are normalized. Before tokenization, the entire period is transposed: majors → C major, minors → A minor. The model never sees absolute keys. The vocabulary contains MODE_major / MODE_minor but no KEY_x tokens.
- Vocabulary size: 84 tokens.
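The key normalization step above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's tokenizer: the names normalization_shift and transpose_chord, the regular expression, and the enharmonic spelling table are all hypothetical, and the quality/extension part of the symbol is passed through untouched because its inventory is defined only in docs/chord_format_spec.md.

```python
import re

# Pitch classes of the natural notes; accidentals are applied in note_to_pc.
NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
# Assumed output spelling for each pitch class (the real spec may differ).
PC_TO_NOTE = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

# <root><anything except a slash>(/<bass>)? ; quality and extension stay opaque here.
CHORD_RE = re.compile(r"^([A-G][#b]?)([^/]*)(?:/([A-G][#b]?))?$")


def note_to_pc(note: str) -> int:
    """Convert a note name such as 'F#' or 'Bb' to a pitch class 0-11."""
    pc = NOTE_TO_PC[note[0]]
    if note.endswith("#"):
        pc += 1
    elif note.endswith("b"):
        pc -= 1
    return pc % 12


def normalization_shift(tonic: str, mode: str) -> int:
    """Semitones to add so that major keys land on C and minor keys on A."""
    if mode not in ("major", "minor"):
        raise ValueError(f"unknown mode: {mode!r}")
    target = 0 if mode == "major" else 9
    return (target - note_to_pc(tonic)) % 12


def transpose_chord(symbol: str, shift: int) -> str:
    """Transpose root and optional slash bass; special positions (., NC, ?) are the caller's job."""
    match = CHORD_RE.match(symbol)
    if match is None:
        raise ValueError(f"unparseable chord symbol: {symbol!r}")
    root, rest, bass = match.groups()
    out = PC_TO_NOTE[(note_to_pc(root) + shift) % 12] + rest
    if bass is not None:
        out += "/" + PC_TO_NOTE[(note_to_pc(bass) + shift) % 12]
    return out
```

For a period in D major the shift is 10 semitones, so A7/C# becomes G7/B; for E minor the shift is 5, so Em becomes Am. Applying the negated shift (modulo 12) after generation moves the output back to the requested key.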
Model
A small autoregressive transformer:
- Layers: 2–4
- d_model: 128–256
- Heads: 4–8
- FFN dim: 4 × d_model
- Context length: 512 tokens (more than enough for any single period)
- Tied input/output embeddings
- Standard causal mask, next-token prediction with cross-entropy
- AdamW, cosine schedule, warmup ~5% of steps
- Dropout 0.1–0.2
Pre-training uses the full public corpus. Fine-tuning uses the own corpus with a smaller learning rate (e.g. 1e-5 vs 1e-4 for pre-training) and few epochs (5–15) to avoid catastrophic forgetting of harmonic regularities learned during pre-training.
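The cosine schedule with roughly 5% warmup can be written as a small pure function. The sketch below is one common formulation (linear warmup, then cosine decay to zero); the function name lr_at_step and the decay-to-zero floor are assumptions, not something the project fixes.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float, warmup_frac: float = 0.05) -> float:
    """Linear warmup to peak_lr over warmup_frac of training, then cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Ramp from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Example learning rates taken from the text: 1e-4 for pre-training, 1e-5 for fine-tuning.
pretrain_lrs = [lr_at_step(s, 1000, 1e-4) for s in range(1000)]
finetune_lrs = [lr_at_step(s, 200, 1e-5) for s in range(200)]
```

With peak_lr=1.0 the same function returns a multiplicative factor, which is the form torch.optim.lr_scheduler.LambdaLR expects for its lr_lambda argument.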
Inference
- Top-p sampling (nucleus, p ≈ 0.9) with temperature ≈ 1.0 as defaults. Tunable.
- No beam search: it tends to produce repetitive, low-diversity output on open-ended generative tasks like this.
- Generation is conditioned by feeding the BOS + metadata tokens explicitly, then optionally a chord prefix from the user.
- After generation, transpose from C/Am to the user's requested key.
- Output: both a .chord file and a MIDI file.
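As an illustration of the sampling defaults above, here is a minimal standard-library sketch of nucleus (top-p) sampling over one step's logits. The function name sample_top_p and its signature are hypothetical; the project's generate.py would more likely operate on PyTorch tensors.

```python
import math
import random
from typing import Optional, Sequence


def sample_top_p(logits: Sequence[float], p: float = 0.9, temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Sample a token id from the smallest set of tokens whose probability mass reaches p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()

    # Temperature-scaled, numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Walk token ids from most to least likely until the nucleus holds mass >= p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for idx in order:
        nucleus.append(idx)
        mass += probs[idx]
        if mass >= p:
            break

    # random.choices normalizes the weights, so no explicit renormalization is needed.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]
```

Passing a seeded random.Random instance keeps generation reproducible, in line with the --seed convention.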
Evaluation
For the report:
- Perplexity on the holdout set, comparing pre-trained baseline vs fine-tuned.
- Distribution shift plots — histograms over chord qualities, extension presence, inversion frequency, root motion intervals — showing how fine-tuning moves the distribution toward the author's corpus.
- Qualitative cherry-picked generations — 3 examples with the same seed/prefix, generated by baseline vs fine-tuned, rendered to MIDI.
No formal blind listening test (out of scope for the deadline).
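Two of the metrics above reduce to a few lines of arithmetic. The sketch below assumes per-token negative log-likelihoods in natural log and chord roots given as pitch classes 0-11; the function names perplexity and root_motion_histogram are hypothetical and not taken from evaluate.py.

```python
import math
from collections import Counter


def perplexity(token_nlls: list[float]) -> float:
    """Exponential of the mean per-token negative log-likelihood (natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def root_motion_histogram(root_pcs: list[int]) -> dict[int, float]:
    """Relative frequency of ascending root intervals (0-11 semitones) between consecutive chords."""
    intervals = [(b - a) % 12 for a, b in zip(root_pcs, root_pcs[1:])]
    counts = Counter(intervals)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {interval: n / total for interval, n in sorted(counts.items())}
```

Computing the histogram separately for baseline generations, fine-tuned generations, and the author's corpus gives three distributions that can be plotted side by side with matplotlib.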
Working language
- Code, identifiers, code comments, log messages, commit messages: English.
- User-facing output, the academic report, and the README user guide: Russian (per university requirements). The report is delivered as a runnable Jupyter notebook (notebooks/report.ipynb), not a GOST-formatted document; strict GOST formatting was explicitly dropped by the author (this is a student project, not formal coursework).
- Conversations with the developer (the author): Russian.
When generating commit messages or code comments, write in English. When generating the report or any user-facing text, write in Russian.
Code style & conventions
- Type hints on all public functions.
- Docstrings: one-line summary + Args/Returns. Keep them concise.
- No print() in library code; use the logging module. CLI scripts may use print.
- Constants in UPPER_SNAKE_CASE at module top.
- Vocabulary, token IDs, and label maps live in src/tokenizer.py as module-level constants.
- Random seeds: every training / generation script accepts a --seed flag and sets torch.manual_seed, numpy.random.seed, random.seed.
- Reproducibility is more important than performance. If a choice is between "fast" and "deterministic", choose deterministic.
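The seeding convention could look roughly like this. It is a sketch, not the project's code: set_seed and build_parser are hypothetical names, and the optional imports exist only so the snippet also runs where NumPy or PyTorch is not installed.

```python
import argparse
import random


def set_seed(seed: int) -> None:
    """Seed every random number generator the scripts rely on."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; only expected outside the project environment
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed; same caveat as above


def build_parser() -> argparse.ArgumentParser:
    """Argument parser for a hypothetical training or generation script."""
    parser = argparse.ArgumentParser(description="Hypothetical hamori entry point.")
    parser.add_argument("--seed", type=int, default=0, help="random seed for reproducibility")
    return parser
```

A script would call set_seed(build_parser().parse_args().seed) before touching any data or model code. Where determinism matters more than speed, torch.use_deterministic_algorithms(True) is a further option to consider.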
Testing policy
- Every parser/tokenizer/MIDI module has unit tests.
- Tests use small .chord fixtures in tests/fixtures/.
- Round-trip property: tokenize(parse(file)) followed by detokenize(...) must reproduce the chord sequence (up to canonical normalization).
- Don't bother unit-testing the training loop or the model. Test the data path.
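A toy version of the round-trip property, written as a pytest-style test, might look like the following. It is a deliberately simplified stand-in for src/tokenizer.py: chords arrive already parsed as (root, quality, extension, bass) tuples, None stands for a held position, metadata tokens and NC / ? are left out, and the labels maj, min, and none are placeholders rather than the spec's vocabulary.

```python
from typing import Optional

Chord = tuple[str, str, str, str]  # (root, quality, extension, bass), already parsed
HOLD = "HOLD"


def tokenize(bars: list[list[Optional[Chord]]]) -> list[str]:
    """Flatten bars into tokens: 4 tokens per chord, one HOLD per held position, no bar tokens."""
    tokens: list[str] = []
    for bar in bars:
        for position in bar:
            if position is None:
                tokens.append(HOLD)
            else:
                root, qual, ext, bass = position
                tokens += [f"ROOT_{root}", f"QUAL_{qual}", f"EXT_{ext}", f"BASS_{bass}"]
    return tokens


def detokenize(tokens: list[str], positions_per_bar: int) -> list[list[Optional[Chord]]]:
    """Rebuild positions, then recover bar boundaries purely by counting positions."""
    positions: list[Optional[Chord]] = []
    i = 0
    while i < len(tokens):
        if tokens[i] == HOLD:
            positions.append(None)
            i += 1
            continue
        group = tokens[i:i + 4]
        if [t.split("_", 1)[0] for t in group] != ["ROOT", "QUAL", "EXT", "BASS"]:
            raise ValueError(f"malformed chord group at token {i}: {group}")
        positions.append(tuple(t.split("_", 1)[1] for t in group))
        i += 4
    if len(positions) % positions_per_bar:
        raise ValueError("token stream does not fill a whole number of bars")
    return [positions[j:j + positions_per_bar]
            for j in range(0, len(positions), positions_per_bar)]


def test_round_trip() -> None:
    bars = [
        [("C", "maj", "none", "C"), None, ("G", "7", "none", "B"), None],
        [("A", "min", "7", "A"), None, None, None],
    ]
    assert detokenize(tokenize(bars), positions_per_bar=4) == bars
```

The real test would load a fixture from tests/fixtures/ and compare after canonical normalization instead of using literal tuples.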
Things to never do
- Do not change the .chord format without first updating docs/chord_format_spec.md and bumping its version number. The format is the contract between the human-readable data and the model; changing one side silently breaks everything.
- Do not modify files in data/holdout/ or use them during training. Holdout is held out.
- Do not add new model architectures "to compare" unless explicitly asked. One model, done well, beats four half-done.
- Do not implement bells and whistles (real-time audio synthesis, beam search, voicing models). They are explicitly out of scope. (Exception: a minimal Gradio web UI in app.py was added at the author's explicit request. Maintain it, but do not expand it into a full editor/DAW.)
- Do not silently round or coerce unrecognized chord symbols. If a chord can't be parsed, raise an error with the file name, bar number, and position. Silent corruption of training data is the worst failure mode here.
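The fail-loudly rule for unrecognized chord symbols could be implemented along these lines. The exception name ChordParseError, the helper check_positions, the 1-based bar and position numbering, and the is_valid_chord callback are all assumptions made for illustration.

```python
from typing import Callable

SPECIAL_POSITIONS = {".", "NC", "?"}  # hold, no chord, unknown (from the format summary)


class ChordParseError(ValueError):
    """Raised when a position holds a symbol the parser does not recognize."""

    def __init__(self, path: str, bar: int, position: int, symbol: str) -> None:
        self.path, self.bar, self.position, self.symbol = path, bar, position, symbol
        super().__init__(
            f"{path}: bar {bar}, position {position}: unrecognized chord symbol {symbol!r}"
        )


def check_positions(path: str, bars: list[list[str]],
                    is_valid_chord: Callable[[str], bool]) -> None:
    """Raise on the first unrecognized symbol instead of coercing or skipping it."""
    for bar_no, bar in enumerate(bars, start=1):
        for pos_no, symbol in enumerate(bar, start=1):
            if symbol in SPECIAL_POSITIONS or is_valid_chord(symbol):
                continue
            raise ChordParseError(path, bar_no, pos_no, symbol)
```

Subclassing ValueError keeps the error catchable by generic handlers while still carrying the file name, bar, and position as attributes for tests to assert on.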
Things to always do
- When asked to add a feature, first identify which module it belongs in (src/...) and whether it requires a spec change. State this before writing code.
- When the user describes a bug, write a failing test first, then fix.
- When dependencies change, update requirements.txt.
- When adding a CLI script, include --help output and a usage example in the script's docstring.
- When generating a long answer, ask whether a CLI flag or a config file is preferred for new parameters. Default to CLI flags for simplicity.
Out-of-scope (explicit non-goals for this deliverable)
- Melody generation
- Voice leading / voicing inside chords above the bass
- Rhythmic patterns inside a held chord
- Arrangement, timbre, dynamics
- Web interface / GUI beyond the minimal Gradio form in app.py (added by explicit request; no notation editor, playback, or version history)
- Real-time MIDI integration with REAPER
- Modulation handling inside a single period
- J-Pop fine-tuning experiment (future work after coursework deadline)
If the user asks for any of these, remind them it's out of scope and ask whether to proceed anyway or defer.