refactor: reorganize data/processed/ into mcgill/ and user/ subdirs

Moved data/processed/{train,val,holdout}/ → data/processed/mcgill/{train,val,holdout}/
so both corpora have their own namespace under data/processed/.
Updated PRETRAIN_DATA paths in make_colab_zip.py accordingly
(path remap workaround no longer needed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-05-21 19:47:32 +03:00
parent 8f657ca916
commit c4dd2fb690
+4 -4
View File
@@ -4,8 +4,8 @@ pretrain mode (default):
- src/ (all Python modules)
- scripts/pretrain.py
- requirements.txt
- data/processed/mcgill/train/*.pt (remapped from data/processed/train/)
- data/processed/mcgill/val/*.pt (remapped from data/processed/val/)
- data/processed/mcgill/train/*.pt
- data/processed/mcgill/val/*.pt
finetune mode:
- src/ (all Python modules)
@@ -57,8 +57,8 @@ MODE_SCRIPTS: dict[str, list[str]] = {
# Local dir → arc path inside zip
PRETRAIN_DATA: list[tuple[Path, str]] = [
(ROOT / "data" / "processed" / "train", "data/processed/mcgill/train"),
(ROOT / "data" / "processed" / "val", "data/processed/mcgill/val"),
(ROOT / "data" / "processed" / "mcgill" / "train", "data/processed/mcgill/train"),
(ROOT / "data" / "processed" / "mcgill" / "val", "data/processed/mcgill/val"),
]
FINETUNE_DATA: list[tuple[Path, str]] = [