val loss 1.19 → 0.77, val perplexity 3.29 → 2.15.
Best epoch 20, early stop at epoch 30 (patience=10).
Improves on the previous lr=1e-5 run (best val ppl 2.15 vs. 2.22).
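
The loss and perplexity figures above are consistent with perplexity
being exp(mean per-token cross-entropy). A minimal sketch of that
check, assuming the loss is reported in nats (natural log); the helper
name `perplexity` is hypothetical and not taken from the training code:

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_nll_nats)


# Losses are used as reported (two decimals), so the results agree with
# the reported perplexities only to within that rounding.
for loss in (1.19, 0.77):
    print(f"val loss {loss:.2f} -> val ppl ~{perplexity(loss):.2f}")
```

If the loss were logged in bits instead, the base would be 2 rather
than e, and the numbers would not line up.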
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
val loss 1.24 → 0.80, val perplexity 3.47 → 2.22.
Best epoch 50 (no early stop); convergence epoch 30.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Re-run pre-training results with the corrected 84-token vocabulary and
max_seq_len=320. The previous checkpoint was trained on stale data with
BAR tokens and a corrupted tokenizer.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>