84ba7b4743
- src/dataset.py: ChordDataset wrapping .pt files with pad/truncate
- scripts/prepare_data.py: tokenize .chord to .pt with train/val/holdout split, logs token length stats and style/function distributions
- src/external_converters/mcgill_to_chord.py: rewrite parser for real McGill v2 format (2-column annotation, each bar in its own pipe group, interval bass notation e.g. /5 and /b3)
- .gitignore: exclude data/processed/train, val, holdout subdirectories
- tests: 37 new tests for ChordDataset and converter (260 total, all pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
65 lines
736 B
Plaintext
# Python
__pycache__/
*.py[cod]
*.pyo
*.pyd
.Python
*.egg-info/
dist/
build/
*.egg
.eggs/

# Virtual environments
.venv/
venv/
env/
ENV/

# pytest
.pytest_cache/
.cache/
htmlcov/
.coverage
coverage.xml

# Jupyter
.ipynb_checkpoints/
*.ipynb_checkpoints

# Model checkpoints (large binaries; commit only intentionally)
checkpoints/*.pt
checkpoints/*.pth
checkpoints/*.ckpt

# Processed data (reproducible from source)
data/processed/*.pt
data/processed/*.pkl
data/processed/train/
data/processed/val/
data/processed/holdout/

# External corpora (download separately; too large for git)
data/raw_external/

# OS
.DS_Store
Thumbs.db
desktop.ini

# IDEs
.idea/
*.swp
*.swo

# Claude Code
.claude/

# Logs
*.log
logs/

# Misc
*.tmp
*.bak