hamori

Author	SHA1	Message	Date
H1K0	4aead2ea20	feat: remove BAR token; bump spec to v2.3; fix max_seq_len. Bar boundaries are now implicit: the detokenizer counts positions per bar using TIME × SUB, and the generator gates EOS to bar boundaries only. Removing the deterministic BAR token reduces the vocab size from 85 to 84 and stops the model from spending capacity on a token whose position is fully determined by the metre. - src/tokenizer.py: drop BAR from VOCAB (85→84); replace BAR-based detokenize_to_period with position-counting logic; add write_chord_file; fix _tokens_to_symbol for add9/m(add9) qualities - tests/test_tokenizer.py: update vocab-size assertions to 84 and the structural token test; remove the bar-count test; add test_no_bar_token_in_vocab - docs/chord_format_spec.md: bump to v2.3; document BAR removal in §5.2, §5.3, §5.4, §5.5, §5.6, §6.2, and the changelog - CLAUDE.md: remove stale BAR reference, update vocab size to 84 - scripts/pretrain.py: raise max_seq_len 256→320 to cover regenerated McGill data (mean=83, max=283 tokens with the BAR-free tokenizer) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 13:56:34 +03:00
H1K0	0682ccc140	docs: update README, architecture, requirements (v1.1). README: - processed/ tree now shows mcgill/ and user/ subdirs - --style user -> --style H1K0 in quick-start prefix example - pretrained.report.txt and finetuned.report.txt added to artifact tables. architecture.md (-> v1.1): - remove stale music21 fallback mention from chord_parser section - fix ChordDataset: on-demand loading, not eager; remove non-existent make_dataloader from public interface - fix train function name: train_model -> train - update logging description: report goes to .report.txt, not stdout - note that scripts use max_seq_len=256 (sequences top out at 195 tokens). requirements.md (-> v1.1): - FT-12: replace the unified script with the pretrain.py + train.py pair Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 12:46:09 +03:00
H1K0	555205b7d2	docs: update vocab size (81→85), spec version (2.0→2.2), style tag (user→H1K0) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 03:23:55 +03:00
H1K0	f0352015cf	docs: simplify §8 filename convention to snake_case_title-function.chord. Replace the YYYY_NNN_kebab-case scheme with title_in_snake_case-function.chord. Snake_case makes the title double-click-selectable; the dash unambiguously separates the title from the optional function suffix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 00:51:22 +03:00
H1K0	3cd9c29d9f	feat: extend time signature support to 9 metres (add 5/4, 7/4, 7/8, 9/8). Add 5/4, 7/4, 7/8, 9/8 to _VALID_TIMES and VOCAB (TIME_* tokens). Vocab size grows from 81 to 85 tokens. _parse_metre in the McGill converter assigns subdivision=8 to 7/8 and 9/8. Spec bumped to v2.2. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 00:37:05 +03:00
H1K0	4fd8ece170	refactor: replace fixed STYLE_user with open-ended style tag system. - STYLE_user renamed to STYLE_H1K0 in VOCAB (author's personal tag) - Style field now accepts any [A-Za-z][A-Za-z0-9_]* identifier in .chord files - Unknown styles fall back to STYLE_other at tokenization time with a logged warning - Test fixtures updated to style: other; drop the closed _VALID_STYLES frozenset - Spec bumped to v2.1: documents the open style field, fallback behaviour, and a §5.7 guide on registering a new style token Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-20 00:29:52 +03:00
H1K0	355341dab9	docs: revise glossary to v1.1, fixing factual errors and improving ML explanations for beginners. Six factual corrections: slash chord definition (inversions vs on-chords), logits described as unnormalized, #11 = eleventh not fourth, duplicate Polukadentsiya sentence removed, pre-training LR corrected to 1e-4, unverified PAD index claim removed. All ML term explanations rewritten with analogies accessible to a near-beginner. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 15:39:25 +03:00
H1K0	75fa07bf6c	docs: add README, architecture, glossary, requirements; update CLAUDE.md. Add four Russian-language project documents: - README.md: user-facing guide (install, quick start, data prep, training, evaluation, limitations) - docs/architecture.md v1.0: system architecture, data flow diagrams, module interfaces, 7 architectural decision records, extension points - docs/glossary.md v1.0: musical, ML, and project-specific term definitions - docs/requirements.md v1.0: functional/non-functional requirements, acceptance criteria, four use-case scenarios. Update CLAUDE.md with project name etymology (hamori / ハモリ) and rename repo root reference from chord-gen to hamori. Refine chord_format_spec.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 11:00:21 +03:00
H1K0	9929209bcf	docs: update chord format spec to v2.0. Revise and expand the authoritative .chord format specification: version bump 1.0 → 2.0, clarified period scope, updated tokenization rules, grammar tables, and key normalization details. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 10:41:57 +03:00
H1K0	967257a555	docs: add chord format specification. Authoritative .chord file format spec covering header fields, body syntax, chord symbol grammar, tokenization rules, and key normalization. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 10:40:39 +03:00

10 Commits
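Commit 4fd8ece170 describes the open style field: any identifier matching [A-Za-z][A-Za-z0-9_]* is accepted in a .chord file, and unknown styles fall back to STYLE_other with a logged warning. The following is a minimal Python sketch of that rule, not the project's tokenizer code. The function name style_token, the KNOWN_STYLES set (only H1K0 and other, the two style tags named in this log), and raising ValueError for a malformed identifier are all assumptions.

```python
import logging
import re
from typing import FrozenSet

logger = logging.getLogger(__name__)

# Identifier grammar for the style field, as stated in commit 4fd8ece170.
STYLE_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Hypothetical subset of registered style tokens; the real VOCAB may hold more.
KNOWN_STYLES: FrozenSet[str] = frozenset({"H1K0", "other"})


def style_token(style: str, known: FrozenSet[str] = KNOWN_STYLES) -> str:
    """Map a .chord style field to a STYLE_* token (hypothetical helper)."""
    # Assumption: a value that violates the identifier grammar is an error.
    if not STYLE_RE.fullmatch(style):
        raise ValueError(f"invalid style identifier: {style!r}")
    # Well-formed but unregistered styles degrade to STYLE_other with a warning.
    if style not in known:
        logger.warning("unknown style %r, falling back to STYLE_other", style)
        return "STYLE_other"
    return f"STYLE_{style}"
```

With this design a new personal tag can appear in .chord files before it is registered in the vocabulary; it simply tokenizes as STYLE_other until a dedicated token is added.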
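Commit f0352015cf settles on title_in_snake_case-function.chord, where the dash separates the title from an optional function suffix. Below is a sketch of parsing such names. It assumes both parts consist of lowercase ASCII letters, digits, and underscores, which the commit does not spell out; parse_chord_filename and ChordName are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the title is lowercase snake_case and the function suffix uses
# [a-z0-9_]; the single dash is the only separator between them.
FILENAME_RE = re.compile(
    r"(?P<title>[a-z0-9]+(?:_[a-z0-9]+)*)(?:-(?P<function>[a-z0-9_]+))?\.chord"
)


class ChordName(NamedTuple):
    title: str
    function: Optional[str]  # None when the file has no function suffix


def parse_chord_filename(name: str) -> ChordName:
    """Split a .chord filename into its title and optional function suffix."""
    m = FILENAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a valid .chord filename: {name!r}")
    return ChordName(m["title"], m["function"])
```

Because underscores never act as separators, a filename with more than one dash is rejected under these assumptions rather than being split ambiguously.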
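Commit 4aead2ea20 makes bar boundaries implicit: the detokenizer counts positions per bar (derived from TIME × SUB), and the generator allows EOS only at a bar boundary. The sketch below illustrates both ideas under stated assumptions. The body has already been reduced to one entry per position, the positions-per-bar count is passed in rather than computed from the TIME and SUB tokens, and split_into_bars and gate_eos are hypothetical helpers. EOS gating is shown as ordinary logit masking, where the EOS logit is set to negative infinity before sampling.

```python
import math
from typing import List, Sequence


def split_into_bars(slots: Sequence[str], per_bar: int) -> List[List[str]]:
    """Group a flat sequence of per-position entries into bars.

    Assumption: per_bar is the positions-per-bar count that the detokenizer
    derives from the TIME and SUB header tokens.
    """
    if per_bar <= 0:
        raise ValueError("per_bar must be positive")
    if len(slots) % per_bar != 0:
        raise ValueError("sequence does not end on a bar boundary")
    return [list(slots[i:i + per_bar]) for i in range(0, len(slots), per_bar)]


def gate_eos(logits: Sequence[float], eos_id: int, pos: int, per_bar: int) -> List[float]:
    """Return a copy of logits with EOS masked unless pos is a bar boundary.

    pos is the number of positions generated so far; a -inf logit gives EOS
    zero probability after softmax.
    """
    gated = list(logits)
    if pos % per_bar != 0:
        gated[eos_id] = -math.inf
    return gated
```

In this sketch, EOS can only be sampled once pos is a multiple of per_bar, so every generated body ends on a whole bar. That is why split_into_bars can treat a trailing partial bar as an error instead of guessing where it ends.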