8a73394df9
Re-run pre-training with the corrected 84-token vocabulary and max_seq_len=320, and update the results. The previous checkpoint was trained on stale data containing BAR tokens, using a corrupted tokenizer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2.2 KiB
| epoch | train_loss | val_loss | val_ppl | lr | elapsed_s |
|---|---|---|---|---|---|
| 1 | 2.043105 | 0.860380 | 2.36 | 2.205000e-04 | 13.1 |
| 2 | 0.682436 | 0.587271 | 1.80 | 2.998721e-04 | 11.9 |
| 3 | 0.567941 | 0.544875 | 1.72 | 2.991598e-04 | 12.0 |
| 4 | 0.529446 | 0.512912 | 1.67 | 2.978255e-04 | 12.5 |
| 5 | 0.505409 | 0.490817 | 1.63 | 2.958747e-04 | 12.4 |
| 6 | 0.484891 | 0.471718 | 1.60 | 2.933156e-04 | 12.5 |
| 7 | 0.467122 | 0.456903 | 1.58 | 2.901587e-04 | 12.7 |
| 8 | 0.450230 | 0.442813 | 1.56 | 2.864174e-04 | 12.9 |
| 9 | 0.435896 | 0.428490 | 1.53 | 2.821072e-04 | 13.1 |
| 10 | 0.425630 | 0.420062 | 1.52 | 2.772460e-04 | 13.1 |
| 11 | 0.414810 | 0.411151 | 1.51 | 2.718542e-04 | 12.9 |
| 12 | 0.405492 | 0.409687 | 1.51 | 2.659542e-04 | 12.9 |
| 13 | 0.396882 | 0.391923 | 1.48 | 2.595706e-04 | 12.9 |
| 14 | 0.387616 | 0.387274 | 1.47 | 2.527301e-04 | 12.8 |
| 15 | 0.379135 | 0.385116 | 1.47 | 2.454612e-04 | 12.9 |
| 16 | 0.371748 | 0.374518 | 1.45 | 2.377941e-04 | 13.0 |
| 17 | 0.364497 | 0.367260 | 1.44 | 2.297610e-04 | 12.9 |
| 18 | 0.357427 | 0.364524 | 1.44 | 2.213952e-04 | 12.9 |
| 19 | 0.350312 | 0.358540 | 1.43 | 2.127316e-04 | 12.9 |
| 20 | 0.342951 | 0.349801 | 1.42 | 2.038065e-04 | 12.9 |
| 21 | 0.337651 | 0.343782 | 1.41 | 1.946569e-04 | 12.8 |
| 22 | 0.330809 | 0.337008 | 1.40 | 1.853211e-04 | 12.8 |
| 23 | 0.324771 | 0.332336 | 1.39 | 1.758381e-04 | 12.8 |
| 24 | 0.319391 | 0.324907 | 1.38 | 1.662472e-04 | 12.8 |
| 25 | 0.314073 | 0.321501 | 1.38 | 1.565886e-04 | 12.9 |
| 26 | 0.309813 | 0.317718 | 1.37 | 1.469026e-04 | 12.8 |
| 27 | 0.304261 | 0.313438 | 1.37 | 1.372294e-04 | 12.9 |
| 28 | 0.299998 | 0.310763 | 1.36 | 1.276095e-04 | 12.9 |
| 29 | 0.295039 | 0.307241 | 1.36 | 1.180830e-04 | 12.9 |
| 30 | 0.290108 | 0.303446 | 1.35 | 1.086896e-04 | 12.8 |
| 31 | 0.288020 | 0.302041 | 1.35 | 9.946846e-05 | 12.8 |
| 32 | 0.283507 | 0.299317 | 1.35 | 9.045806e-05 | 12.8 |
| 33 | 0.280522 | 0.294816 | 1.34 | 8.169597e-05 | 12.8 |
| 34 | 0.275877 | 0.291919 | 1.34 | 7.321873e-05 | 12.9 |
| 35 | 0.273687 | 0.288819 | 1.33 | 6.506170e-05 | 12.8 |
| 36 | 0.270566 | 0.287831 | 1.33 | 5.725888e-05 | 13.0 |
| 37 | 0.267893 | 0.286515 | 1.33 | 4.984283e-05 | 13.0 |
| 38 | 0.265996 | 0.284756 | 1.33 | 4.284447e-05 | 13.0 |
| 39 | 0.264527 | 0.283663 | 1.33 | 3.629298e-05 | 13.0 |
| 40 | 0.262261 | 0.282717 | 1.33 | 3.021569e-05 | 12.9 |
| 41 | 0.260812 | 0.282175 | 1.33 | 2.463794e-05 | 12.8 |
| 42 | 0.258872 | 0.280704 | 1.32 | 1.958300e-05 | 12.8 |
| 43 | 0.257864 | 0.280204 | 1.32 | 1.507193e-05 | 12.8 |
| 44 | 0.256770 | 0.279358 | 1.32 | 1.112356e-05 | 12.8 |
| 45 | 0.254942 | 0.279263 | 1.32 | 7.754357e-06 | 13.0 |
| 46 | 0.255560 | 0.278873 | 1.32 | 4.978363e-06 | 12.8 |
| 47 | 0.255011 | 0.278650 | 1.32 | 2.807158e-06 | 12.9 |
| 48 | 0.254304 | 0.278583 | 1.32 | 1.249797e-06 | 12.8 |
| 49 | 0.252442 | 0.278481 | 1.32 | 3.127754e-07 | 12.8 |
| 50 | 0.253867 | 0.278494 | 1.32 | 0.000000e+00 | 12.8 |
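The log can be sanity-checked mechanically. val_ppl should equal exp(val_loss) up to its two-decimal rounding, and the lr column rises during the first epoch, peaks near 3e-4 and decays to exactly 0 at epoch 50, which looks like a linear warmup followed by cosine decay. The sketch below tests both properties on a few rows copied from the table. The schedule parameters (PEAK_LR = 3e-4, STEPS_PER_EPOCH = 147, WARMUP_STEPS = 200) are hypothetical: they were inferred by fitting the logged lr values, which they appear to reproduce to within rounding, and are not taken from the actual training config. The helper names (lr_at_step, load_log, check_log, best_epoch) are likewise illustrative.

```python
import csv
import io
import math

# HYPOTHETICAL schedule parameters, inferred by fitting the logged lr column.
# They are not stated in the source and may differ from the real training config.
PEAK_LR = 3e-4
STEPS_PER_EPOCH = 147
WARMUP_STEPS = 200
TOTAL_EPOCHS = 50

# Rows copied verbatim from the training log above.
SAMPLE = """epoch,train_loss,val_loss,val_ppl,lr,elapsed_s
1,2.043105,0.860380,2.36,2.205000e-04,13.1
2,0.682436,0.587271,1.80,2.998721e-04,11.9
25,0.314073,0.321501,1.38,1.565886e-04,12.9
49,0.252442,0.278481,1.32,3.127754e-07,12.8
50,0.253867,0.278494,1.32,0.000000e+00,12.8
"""


def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed schedule)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (total_steps - WARMUP_STEPS)
    return max(0.0, PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress)))


def load_log(text: str) -> list[dict]:
    """Parse the CSV log into dicts with float values."""
    return [{k: float(v) for k, v in rec.items()} for rec in csv.DictReader(io.StringIO(text))]


def check_log(rows: list[dict], ppl_tol: float = 0.005, lr_rel_tol: float = 1e-4) -> list[str]:
    """Return mismatch messages; an empty list means the rows are self-consistent."""
    total_steps = STEPS_PER_EPOCH * TOTAL_EPOCHS
    problems = []
    for r in rows:
        epoch = int(r["epoch"])
        # val_ppl is logged to 2 decimals, so allow half a unit in the last place.
        if abs(math.exp(r["val_loss"]) - r["val_ppl"]) > ppl_tol + 1e-9:
            problems.append(f"epoch {epoch}: val_ppl != exp(val_loss)")
        # Assumes lr is logged after the last optimizer step of each epoch.
        expected_lr = lr_at_step(epoch * STEPS_PER_EPOCH, total_steps)
        if not math.isclose(r["lr"], expected_lr, rel_tol=lr_rel_tol, abs_tol=1e-12):
            problems.append(f"epoch {epoch}: lr {r['lr']:.6e} vs schedule {expected_lr:.6e}")
    return problems


def best_epoch(rows: list[dict]) -> int:
    """Epoch with the lowest validation loss."""
    return int(min(rows, key=lambda r: r["val_loss"])["epoch"])


if __name__ == "__main__":
    rows = load_log(SAMPLE)
    print(check_log(rows) or "log is consistent")
    print("best epoch:", best_epoch(rows))
```

Applied to the full table, best_epoch would select epoch 49 (val_loss 0.278481), marginally ahead of the final epoch 50 (0.278494).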