hamori

Author	SHA1	Message	Date
H1K0	c56397df54	docs: add Russian project report notebook (notebooks/report.ipynb) Runnable end-to-end report combining narrative, code, and inline figures: data and .chord format, transformer working principle, two-stage training curves, perplexity (3.58 -> 2.15), distribution-shift plot with a reading legend, qualitative examples, and a generation demo. Written in a first-person student voice. - CLAUDE.md: report is now a Jupyter notebook; GOST formatting dropped - requirements.txt: add nbconvert + ipykernel (optional, for the notebook) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 17:07:00 +03:00
H1K0	c147c47acb	feat: add minimal Gradio web UI (app.py) Single-page form wrapping src.generate.generate_period: pick model, mode, key, style, function, time, sampling params and optional prefix; returns the chord grid plus downloadable .chord and .mid files. Russian usage instructions are embedded on the same page. Auto-length output is capped at 16 bars (the period maximum) so a model that never emits EOS can't run away into dozens of NC/hold bars. Added per the author's explicit request — web UI was previously out of scope; updated CLAUDE.md and README accordingly. Choices for style/function/time are derived from VOCAB so the form can't drift from the tokenizer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 15:38:01 +03:00
H1K0	4aead2ea20	feat: remove BAR token; bump spec to v2.3; fix max_seq_len Bar boundaries are now implicit — the detokenizer counts positions per bar using TIME × SUB, and the generator gates EOS to bar boundaries only. Removing the deterministic BAR token reduces vocab size from 85 to 84 and lets the model focus on meaningful predictions. - src/tokenizer.py: drop BAR from VOCAB (85→84); replace BAR-based detokenize_to_period with position-counting logic; add write_chord_file; fix _tokens_to_symbol for add9/m(add9) qualities - tests/test_tokenizer.py: update vocab-size assertions to 84, structural token test, remove bar-count test, add test_no_bar_token_in_vocab - docs/chord_format_spec.md: bump to v2.3; document BAR removal in §5.2, §5.3, §5.4, §5.5, §5.6, §6.2, and changelog - CLAUDE.md: remove stale BAR reference, update vocab size to 84 - scripts/pretrain.py: raise max_seq_len 256→320 to cover regenerated McGill data (mean=83, max=283 tokens with BAR-free tokenizer) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 13:56:34 +03:00
H1K0	75fa07bf6c	docs: add README, architecture, glossary, requirements; update CLAUDE.md Add four Russian-language project documents: - README.md: user-facing guide (install, quick start, data prep, training, evaluation, limitations) - docs/architecture.md v1.0: system architecture, data flow diagrams, module interfaces, 7 architectural decision records, extension points - docs/glossary.md v1.0: musical, ML, and project-specific term definitions - docs/requirements.md v1.0: functional/non-functional requirements, acceptance criteria, four use-case scenarios Update CLAUDE.md with project name etymology (hamori / ハモリ) and rename repo root reference from chord-gen to hamori. Refine chord_format_spec.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 11:00:21 +03:00
H1K0	8672c10f78	chore: initialize project scaffold Add .gitignore (excludes .claude/, venv, checkpoints, processed data, external corpora), .gitattributes (LF normalization, binary markers), full directory tree with .gitkeep placeholders, and src __init__ stubs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 10:28:17 +03:00

5 Commits
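
Commit 4aead2ea20 makes bar boundaries implicit (positions are counted per bar using TIME × SUB, and EOS is only allowed on a bar boundary), and commit c147c47acb caps auto-length output at 16 bars. The sketch below illustrates that idea only; it is not the code in src/tokenizer.py or src/generate.py. It assumes one token per grid position, that TIME is beats per bar and SUB is subdivisions per beat, and the names `split_into_bars`, `eos_allowed`, and `max_positions` are hypothetical.

```python
from typing import List, Sequence

MAX_BARS = 16  # period maximum stated in commit c147c47acb


def positions_per_bar(time: int, sub: int) -> int:
    """Grid positions in one bar (assumed: TIME beats x SUB subdivisions)."""
    if time <= 0 or sub <= 0:
        raise ValueError("time and sub must be positive")
    return time * sub


def split_into_bars(tokens: Sequence[str], time: int, sub: int) -> List[List[str]]:
    """Group a flat token stream into bars by counting positions.

    Assumes exactly one token per grid position (hypothetical simplification);
    a trailing partial bar is returned as-is.
    """
    step = positions_per_bar(time, sub)
    return [list(tokens[i:i + step]) for i in range(0, len(tokens), step)]


def eos_allowed(num_positions: int, time: int, sub: int) -> bool:
    """EOS may only be emitted when the positions so far fill whole bars."""
    return num_positions % positions_per_bar(time, sub) == 0


def max_positions(time: int, sub: int, max_bars: int = MAX_BARS) -> int:
    """Hard cap on generated positions when no EOS is ever emitted."""
    return max_bars * positions_per_bar(time, sub)
```

With TIME = 4 and SUB = 1, a stream of eight position tokens splits into two bars of four, EOS is permitted after position 8 but not after position 6, and generation would stop after at most 64 positions.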