data: update pretrained checkpoint results (BAR-free tokenizer)

Regenerate the pre-training results with the corrected 84-token vocabulary
and max_seq_len=320. The previous checkpoint was trained on stale data that
still contained BAR tokens and used a corrupted tokenizer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -3,65 +3,65 @@
 PRE-TRAINING REPORT
 ====================================================
 Total epochs run      : 50
-Best epoch (val loss) : 48
+Best epoch (val loss) : 49
 Convergence epoch     : 42 (val ≤ best+1 %)
-Best val loss         : 0.2542
-Best val perplexity   : 1.29
-Final train loss      : 0.2371
-Unique parameters     : 1,384,128
+Best val loss         : 0.2785
+Best val perplexity   : 1.32
+Final train loss      : 0.2539
+Unique parameters     : 1,396,416
 Checkpoint            : checkpoints/pretrained.pt
 Log CSV               : checkpoints/pretrained.log.csv
 ====================================================
 
 epoch    train      val     ppl         lr
 ----- -------- -------- ------- ----------
-    1   2.0319   0.8082    2.24   2.20e-04
-    2   0.6414   0.5509    1.73   3.00e-04
-    3   0.5239   0.4964    1.64   2.99e-04
-    4   0.4857   0.4720    1.60   2.98e-04
-    5   0.4642   0.4475    1.56   2.96e-04
-    6   0.4460   0.4348    1.54   2.93e-04
-    7   0.4320   0.4170    1.52   2.90e-04
-    8   0.4177   0.4097    1.51   2.86e-04
-    9   0.4056   0.3969    1.49   2.82e-04
-   10   0.3948   0.3910    1.48   2.77e-04
-   11   0.3846   0.3788    1.46   2.72e-04
-   12   0.3762   0.3707    1.45   2.66e-04
-   13   0.3667   0.3642    1.44   2.60e-04
-   14   0.3589   0.3532    1.42   2.53e-04
-   15   0.3512   0.3455    1.41   2.45e-04
-   16   0.3445   0.3431    1.41   2.38e-04
-   17   0.3375   0.3367    1.40   2.30e-04
-   18   0.3314   0.3323    1.39   2.21e-04
-   19   0.3256   0.3229    1.38   2.13e-04
-   20   0.3185   0.3193    1.38   2.04e-04
-   21   0.3138   0.3150    1.37   1.95e-04
-   22   0.3072   0.3112    1.37   1.85e-04
-   23   0.3025   0.3034    1.35   1.76e-04
-   24   0.2971   0.3030    1.35   1.66e-04
-   25   0.2927   0.2928    1.34   1.57e-04
-   26   0.2871   0.2899    1.34   1.47e-04
-   27   0.2825   0.2893    1.34   1.37e-04
-   28   0.2783   0.2863    1.33   1.28e-04
-   29   0.2748   0.2824    1.33   1.18e-04
-   30   0.2703   0.2783    1.32   1.09e-04
-   31   0.2670   0.2750    1.32   9.95e-05
-   32   0.2631   0.2718    1.31   9.05e-05
-   33   0.2606   0.2691    1.31   8.17e-05
-   34   0.2578   0.2691    1.31   7.32e-05
-   35   0.2540   0.2667    1.31   6.51e-05
-   36   0.2518   0.2650    1.30   5.73e-05
-   37   0.2498   0.2630    1.30   4.98e-05
-   38   0.2472   0.2601    1.30   4.28e-05
-   39   0.2456   0.2587    1.30   3.63e-05
-   40   0.2432   0.2584    1.29   3.02e-05
-   41   0.2421   0.2572    1.29   2.46e-05
-   42   0.2409   0.2567    1.29   1.96e-05
-   43   0.2398   0.2560    1.29   1.51e-05
-   44   0.2387   0.2553    1.29   1.11e-05
-   45   0.2381   0.2550    1.29   7.75e-06
-   46   0.2372   0.2550    1.29   4.98e-06
-   47   0.2365   0.2546    1.29   2.81e-06
-   48   0.2363   0.2542    1.29   1.25e-06 ←
-   49   0.2359   0.2543    1.29   3.13e-07
-   50   0.2371   0.2543    1.29   0.00e+00
+    1   2.0431   0.8604    2.36   2.20e-04
+    2   0.6824   0.5873    1.80   3.00e-04
+    3   0.5679   0.5449    1.72   2.99e-04
+    4   0.5294   0.5129    1.67   2.98e-04
+    5   0.5054   0.4908    1.63   2.96e-04
+    6   0.4849   0.4717    1.60   2.93e-04
+    7   0.4671   0.4569    1.58   2.90e-04
+    8   0.4502   0.4428    1.56   2.86e-04
+    9   0.4359   0.4285    1.53   2.82e-04
+   10   0.4256   0.4201    1.52   2.77e-04
+   11   0.4148   0.4112    1.51   2.72e-04
+   12   0.4055   0.4097    1.51   2.66e-04
+   13   0.3969   0.3919    1.48   2.60e-04
+   14   0.3876   0.3873    1.47   2.53e-04
+   15   0.3791   0.3851    1.47   2.45e-04
+   16   0.3717   0.3745    1.45   2.38e-04
+   17   0.3645   0.3673    1.44   2.30e-04
+   18   0.3574   0.3645    1.44   2.21e-04
+   19   0.3503   0.3585    1.43   2.13e-04
+   20   0.3430   0.3498    1.42   2.04e-04
+   21   0.3377   0.3438    1.41   1.95e-04
+   22   0.3308   0.3370    1.40   1.85e-04
+   23   0.3248   0.3323    1.39   1.76e-04
+   24   0.3194   0.3249    1.38   1.66e-04
+   25   0.3141   0.3215    1.38   1.57e-04
+   26   0.3098   0.3177    1.37   1.47e-04
+   27   0.3043   0.3134    1.37   1.37e-04
+   28   0.3000   0.3108    1.36   1.28e-04
+   29   0.2950   0.3072    1.36   1.18e-04
+   30   0.2901   0.3034    1.35   1.09e-04
+   31   0.2880   0.3020    1.35   9.95e-05
+   32   0.2835   0.2993    1.35   9.05e-05
+   33   0.2805   0.2948    1.34   8.17e-05
+   34   0.2759   0.2919    1.34   7.32e-05
+   35   0.2737   0.2888    1.33   6.51e-05
+   36   0.2706   0.2878    1.33   5.73e-05
+   37   0.2679   0.2865    1.33   4.98e-05
+   38   0.2660   0.2848    1.33   4.28e-05
+   39   0.2645   0.2837    1.33   3.63e-05
+   40   0.2623   0.2827    1.33   3.02e-05
+   41   0.2608   0.2822    1.33   2.46e-05
+   42   0.2589   0.2807    1.32   1.96e-05
+   43   0.2579   0.2802    1.32   1.51e-05
+   44   0.2568   0.2794    1.32   1.11e-05
+   45   0.2549   0.2793    1.32   7.75e-06
+   46   0.2556   0.2789    1.32   4.98e-06
+   47   0.2550   0.2787    1.32   2.81e-06
+   48   0.2543   0.2786    1.32   1.25e-06
+   49   0.2524   0.2785    1.32   3.13e-07 ←
+   50   0.2539   0.2785    1.32   0.00e+00