docs: add README, architecture, glossary, requirements; update CLAUDE.md

Add four Russian-language project documents:
- README.md: user-facing guide (install, quick start, data prep, training,
  evaluation, limitations)
- docs/architecture.md v1.0: system architecture, data flow diagrams,
  module interfaces, 7 architectural decision records, extension points
- docs/glossary.md v1.0: musical, ML, and project-specific term definitions
- docs/requirements.md v1.0: functional/non-functional requirements,
  acceptance criteria, four use-case scenarios

Update CLAUDE.md with project name etymology (hamori / ハモリ) and rename
repo root reference from chord-gen to hamori. Refine chord_format_spec.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 11:00:21 +03:00
parent 9929209bcf
commit 75fa07bf6c
6 changed files with 2312 additions and 5 deletions
CLAUDE.md +13 -2
@@ -4,11 +4,20 @@ This file gives Claude Code persistent context for the project. Read it before a
## Project overview
**Goal.** Train a small autoregressive transformer to generate harmonic periods (4–16 bar chord progressions) in the author's compositional style. Coursework deliverable for an ML class at RTU MIREA; also intended as a working creative tool.
**Name.** _hamori_ (Japanese ハモリ, "harmonization" in the sense of vocal
harmony — adding a second voice to a melodic line). The name reflects the
project's core idea: the model proposes harmonic ideas to complement a
composer's existing intent, rather than writing music from scratch.
**Goal.** Train a small autoregressive transformer to generate harmonic
periods (4–16 bar chord progressions) in the author's compositional style.
Coursework deliverable for an ML class at RTU MIREA; also intended as a
working creative tool.
**Unit of generation.** A single closed harmonic phrase (a "period"), not a full song.
**Pipeline.**
1. Hand-transcribe own compositions from REAPER DAW projects into `.chord` text files.
2. Parse `.chord` → factorized token sequences.
3. Pre-train on a public corpus (McGill Billboard or similar).
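A minimal sketch of step 2. It assumes that "factorized" means each chord is split into separate root, quality and bass tokens. The regex grammar, the `ROOT_`/`QUAL_`/`BASS_` token names, the `<BOS>`/`<EOS>` markers and the helpers `factorize` and `tokenize_period` are all hypothetical. The real `.chord` syntax is defined in `docs/chord_format_spec.md` and may differ.

```python
import re

# Hypothetical chord-symbol grammar (NOT the real .chord spec):
# root letter with optional accidental, free-form quality, optional "/bass".
CHORD_RE = re.compile(
    r"^(?P<root>[A-G][#b]?)(?P<quality>[^/]*)(?:/(?P<bass>[A-G][#b]?))?$"
)


def factorize(symbol: str) -> list[str]:
    """Split one chord symbol into root, quality and bass tokens."""
    m = CHORD_RE.match(symbol)
    if m is None:
        raise ValueError(f"unparseable chord symbol: {symbol!r}")
    root = m.group("root")
    quality = m.group("quality") or "maj"  # bare root letter = major triad
    bass = m.group("bass") or root  # no slash bass = root position
    return [f"ROOT_{root}", f"QUAL_{quality}", f"BASS_{bass}"]


def tokenize_period(symbols: list[str]) -> list[str]:
    """Flatten one period (a list of chord symbols) into one token sequence."""
    tokens = ["<BOS>"]
    for symbol in symbols:
        tokens.extend(factorize(symbol))
    tokens.append("<EOS>")
    return tokens
```

For example, `tokenize_period(["C", "Am/C"])` gives `["<BOS>", "ROOT_C", "QUAL_maj", "BASS_C", "ROOT_A", "QUAL_m", "BASS_C", "<EOS>"]`. One usual motivation for factorizing is vocabulary size. The vocabulary then grows with roots + qualities + basses rather than with every combination, so the model can share statistics across, say, all `maj7` chords.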
@@ -34,7 +43,7 @@ Avoid heavy abstractions. This is coursework, not a production system. Prefer si
## Repository layout
```
chord-gen/
hamori/
├── CLAUDE.md ← this file
├── README.md
├── requirements.txt
@@ -88,6 +97,7 @@ The authoritative specification is in `docs/chord_format_spec.md`. **Always read
## Model
A small autoregressive transformer:
- Layers: 2–4
- d_model: 128–256
- Heads: 4–8
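A model this small can be sized by hand. The sketch below reads the listed ranges as 2–4 layers, d_model 128–256 and 4–8 heads. It checks that d_model splits evenly across heads and gives a rough parameter count. It assumes a standard transformer block with a 4× feed-forward expansion and token embeddings tied to the output layer, and it ignores biases, LayerNorm and positional embeddings. `ModelConfig`, `approx_params` and the vocabulary size of 200 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    n_layers: int
    d_model: int
    n_heads: int
    vocab_size: int  # hypothetical; depends on the real token vocabulary
    ffn_mult: int = 4  # assumed feed-forward expansion factor

    def __post_init__(self) -> None:
        # Multi-head attention splits d_model evenly across the heads.
        if self.d_model % self.n_heads != 0:
            raise ValueError(
                f"d_model={self.d_model} is not divisible by n_heads={self.n_heads}"
            )

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_heads

    def approx_params(self) -> int:
        """Rough count ignoring biases, LayerNorm and positional embeddings."""
        d = self.d_model
        attn = 4 * d * d  # Q, K, V and output projections
        ffn = 2 * self.ffn_mult * d * d  # up- and down-projection
        embed = self.vocab_size * d  # token embedding, assumed tied to output head
        return self.n_layers * (attn + ffn) + embed


smallest = ModelConfig(n_layers=2, d_model=128, n_heads=4, vocab_size=200)
largest = ModelConfig(n_layers=4, d_model=256, n_heads=8, vocab_size=200)
```

Under these assumptions, `smallest.approx_params()` is 418816 and `largest.approx_params()` is 3196928, so the whole range stays between roughly 0.4M and 3.2M parameters. Both ends give `head_dim == 32`.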
@@ -111,6 +121,7 @@ Pre-training uses the full public corpus. Fine-tuning uses the own corpus with a
## Evaluation
For the report:
1. **Perplexity** on the holdout set, comparing pre-trained baseline vs fine-tuned.
2. **Distribution shift plots** — histograms over chord qualities, extension presence, inversion frequency, root motion intervals — showing how fine-tuning moves the distribution toward the author's corpus.
3. **Qualitative cherry-picked generations** — 3 examples with the same seed/prefix, generated by baseline vs fine-tuned, rendered to MIDI.
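Metrics 1 and 2 each reduce to a few lines. The sketch below makes three assumptions:

- The model's natural-log probability for each actual next token on the holdout set has already been collected in a flat list.
- Chord roots are available as note names.
- Root motion is measured as ascending semitones mod 12, so a descending fifth and an ascending fourth both count as 5.

`perplexity`, `root_motion_histogram` and `PITCH_CLASS` are hypothetical helpers, not part of the project.

```python
import math
from collections import Counter

# Note name -> pitch class (enharmonic spellings share a value).
PITCH_CLASS = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}


def perplexity(token_log_probs: list[float]) -> float:
    """exp(mean negative log-likelihood) over all predicted holdout tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def root_motion_histogram(roots: list[str]) -> Counter:
    """Count ascending root-motion intervals (semitones mod 12) between consecutive chords."""
    pcs = [PITCH_CLASS[r] for r in roots]
    return Counter((b - a) % 12 for a, b in zip(pcs, pcs[1:]))
```

A model that is uniform over V tokens has perplexity exactly V, which is a useful sanity ceiling. Lower holdout perplexity for the fine-tuned model than for the baseline is the expected direction. For C–F–G–C, the root-motion histogram is `{5: 2, 2: 1}`.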