65c3f6bf7c
Sanity run: McGill corpus, max_seq_len=256, batch_size=32, lr=3e-4, seed=42.

Epoch 1: train=1.2603 val=0.6403 ppl=1.90
Epoch 2: train=0.5979 val=0.5809 ppl=1.79

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
135 B
| epoch | train_loss | val_loss | val_ppl | lr | elapsed_s |
|---|---|---|---|---|---|
| 1 | 1.260304 | 0.640298 | 1.90 | 1.671608e-04 | 240.4 |
| 2 | 0.597872 | 0.580944 | 1.79 | 0.000000e+00 | 239.6 |
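
The logged `val_ppl` values are consistent with perplexity computed as `exp(val_loss)`, assuming `val_loss` is a mean cross-entropy in nats. The training code is not shown here, so that relationship is an assumption. A minimal Python sketch to check it against the logged values follows; the `perplexity` helper and the `val_losses` dict are illustrative names, not part of the repository.

```python
import math

# (epoch -> val_loss) pairs copied from the logged metrics.
val_losses = {1: 0.640298, 2: 0.580944}


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean cross-entropy loss, assumed to be in nats."""
    return math.exp(loss_nats)


for epoch, loss in val_losses.items():
    print(f"epoch {epoch}: val_loss={loss:.6f} val_ppl={perplexity(loss):.2f}")
# epoch 1: val_loss=0.640298 val_ppl=1.90
# epoch 2: val_loss=0.580944 val_ppl=1.79
```

Under this assumption, the epoch 2 validation loss rounds to a perplexity of 1.79.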