hamori/.gitignore
H1K0 84ba7b4743 feat: add dataset, prepare_data pipeline and fix McGill converter
- src/dataset.py: ChordDataset wrapping .pt files with pad/truncate
- scripts/prepare_data.py: tokenize .chord to .pt with train/val/holdout
  split, logs token length stats and style/function distributions
- src/external_converters/mcgill_to_chord.py: rewrite parser for real
  McGill v2 format (2-column annotation, each bar in its own pipe group,
  interval bass notation e.g. /5 and /b3)
- .gitignore: exclude data/processed/train, val, holdout subdirectories
- tests: 37 new tests for ChordDataset and converter (260 total, all pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 18:09:46 +03:00


# Python
__pycache__/
*.py[cod]
.Python
*.egg-info/
dist/
build/
*.egg
.eggs/
# Virtual environments
.venv/
venv/
env/
ENV/
# pytest
.pytest_cache/
.cache/
htmlcov/
.coverage
coverage.xml
# Jupyter
.ipynb_checkpoints/
*.ipynb_checkpoints
# Model checkpoints (large binaries — commit only intentionally)
checkpoints/*.pt
checkpoints/*.pth
checkpoints/*.ckpt
# Processed data (reproducible from source)
data/processed/*.pt
data/processed/*.pkl
data/processed/train/
data/processed/val/
data/processed/holdout/
# External corpora (download separately; too large for git)
data/raw_external/
# OS
.DS_Store
Thumbs.db
desktop.ini
# IDEs
.idea/
*.swp
*.swo
# Claude Code
.claude/
# Logs
*.log
logs/
# Misc
*.tmp
*.bak