data: add fine-tuning run results (lr=3e-5, 30 epochs)

val loss 1.19 (epoch 1) → 0.77 (best), val perplexity 3.29 → 2.15.
Best epoch 20; early stop at epoch 30 (patience=10). Improves on the
previous lr=1e-5 run (50 epochs, best val loss 0.7987, best val ppl 2.22).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
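The early-stopping figures in the message can be checked against the lr=3e-5 run's `val` column. This is a minimal sketch that assumes the common patience rule: training stops once `patience` epochs have passed since the last strict improvement in val loss. It also assumes the `ppl` column is `exp(val loss)`. The helper name `early_stop_summary` is hypothetical and is not taken from the training code.

```python
import math

# val column of the lr=3e-5 run, epochs 1-30 (copied from the log table)
VAL_3E5 = [
    1.1905, 1.0171, 0.9146, 0.8777, 0.8517, 0.8368, 0.8194, 0.8066, 0.7973, 0.7923,
    0.7879, 0.7830, 0.7776, 0.7746, 0.7731, 0.7715, 0.7695, 0.7682, 0.7682, 0.7668,
    0.7675, 0.7696, 0.7710, 0.7712, 0.7723, 0.7710, 0.7691, 0.7685, 0.7702, 0.7711,
]


def early_stop_summary(val_losses, patience=10):
    """Hypothetical helper: return (best_epoch, best_loss, stop_epoch).

    Assumes training stops once `patience` epochs have passed since the
    last strict improvement in validation loss (epochs are 1-indexed).
    """
    best_loss, best_epoch = math.inf, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, best_loss, epoch
    return best_epoch, best_loss, len(val_losses)


best_epoch, best_loss, stop_epoch = early_stop_summary(VAL_3E5, patience=10)
# perplexity is assumed to be exp(mean cross-entropy loss)
print(best_epoch, stop_epoch, round(math.exp(best_loss), 2))  # 20 30 2.15
```

Epoch 19 ties epoch 18 at 0.7682 and does not count as an improvement under a strict `<` comparison. Epoch 20's 0.7668 resets the counter, and epoch 30 is the tenth epoch without improvement.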
@@ -2,12 +2,12 @@
 ====================================================
 FINE-TUNING REPORT
 ====================================================
-Total epochs run      : 50
-Best epoch (val loss) : 50
-Convergence epoch     : 30 (val ≤ best+1 %)
-Best val loss         : 0.7987
-Best val perplexity   : 2.22
-Final train loss      : 0.6842
+Total epochs run      : 30
+Best epoch (val loss) : 20
+Convergence epoch     : 15 (val ≤ best+1 %)
+Best val loss         : 0.7668
+Best val perplexity   : 2.15
+Final train loss      : 0.5379
 Unique parameters     : 1,396,416
 Checkpoint            : checkpoints/finetuned.pt
 Log CSV               : checkpoints/finetuned.log.csv
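The report defines `Convergence epoch` as the first epoch with `val ≤ best+1 %`. If `+1 %` is read as a relative margin (val ≤ 1.01 × best), the definition reproduces both reported values: 30 for the previous lr=1e-5 run and 15 for the lr=3e-5 run. An absolute margin of +0.01 would give 29 and 14 instead. The sketch below applies the definition to the rounded `val` columns from the log table. `convergence_epoch` is a hypothetical name, and the real report generator may compute this differently.

```python
# val column of each run, copied from the log table (rounded to 4 decimals)
VAL_1E5 = [  # previous run, lr=1e-5, epochs 1-50
    1.2447, 1.1684, 1.1021, 1.0452, 0.9951, 0.9533, 0.9271, 0.9112, 0.8971, 0.8859,
    0.8768, 0.8687, 0.8604, 0.8537, 0.8484, 0.8429, 0.8379, 0.8339, 0.8307, 0.8273,
    0.8245, 0.8222, 0.8200, 0.8171, 0.8146, 0.8126, 0.8107, 0.8088, 0.8074, 0.8063,
    0.8050, 0.8039, 0.8030, 0.8018, 0.8011, 0.8006, 0.8004, 0.8002, 0.7999, 0.7996,
    0.7993, 0.7991, 0.7990, 0.7989, 0.7988, 0.7987, 0.7987, 0.7987, 0.7987, 0.7987,
]
VAL_3E5 = [  # new run, lr=3e-5, epochs 1-30
    1.1905, 1.0171, 0.9146, 0.8777, 0.8517, 0.8368, 0.8194, 0.8066, 0.7973, 0.7923,
    0.7879, 0.7830, 0.7776, 0.7746, 0.7731, 0.7715, 0.7695, 0.7682, 0.7682, 0.7668,
    0.7675, 0.7696, 0.7710, 0.7712, 0.7723, 0.7710, 0.7691, 0.7685, 0.7702, 0.7711,
]


def convergence_epoch(val_losses, tol=0.01, relative=True):
    """Hypothetical helper: first 1-indexed epoch whose val loss is within tol of the best.

    relative=True reads "best+1 %" as val <= best * (1 + tol);
    relative=False reads it as an absolute margin, val <= best + tol.
    """
    best = min(val_losses)
    threshold = best * (1.0 + tol) if relative else best + tol
    # tol >= 0 guarantees the best epoch itself qualifies, so next() always finds one
    return next(
        epoch for epoch, loss in enumerate(val_losses, start=1) if loss <= threshold
    )


print(convergence_epoch(VAL_1E5), convergence_epoch(VAL_3E5))  # 30 15
print(convergence_epoch(VAL_1E5, relative=False), convergence_epoch(VAL_3E5, relative=False))  # 29 14
```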
@@ -15,53 +15,33 @@
 
 epoch    train      val     ppl         lr
 ----- -------- -------- ------- ----------
-    1   1.1851   1.2447    3.47   8.00e-06
-    2   1.1298   1.1684    3.22   9.99e-06
-    3   1.0748   1.1021    3.01   9.97e-06
-    4   1.0044   1.0452    2.84   9.92e-06
-    5   0.9631   0.9951    2.71   9.85e-06
-    6   0.9039   0.9533    2.59   9.77e-06
-    7   0.8676   0.9271    2.53   9.66e-06
-    8   0.8483   0.9112    2.49   9.53e-06
-    9   0.8178   0.8971    2.45   9.39e-06
-   10   0.8153   0.8859    2.43   9.23e-06
-   11   0.8041   0.8768    2.40   9.05e-06
-   12   0.7914   0.8687    2.38   8.85e-06
-   13   0.7781   0.8604    2.36   8.63e-06
-   14   0.7714   0.8537    2.35   8.41e-06
-   15   0.7687   0.8484    2.34   8.16e-06
-   16   0.7626   0.8429    2.32   7.91e-06
-   17   0.7494   0.8379    2.31   7.64e-06
-   18   0.7452   0.8339    2.30   7.36e-06
-   19   0.7460   0.8307    2.29   7.07e-06
-   20   0.7266   0.8273    2.29   6.77e-06
-   21   0.7221   0.8245    2.28   6.47e-06
-   22   0.7240   0.8222    2.28   6.16e-06
-   23   0.7214   0.8200    2.27   5.84e-06
-   24   0.7176   0.8171    2.26   5.52e-06
-   25   0.7116   0.8146    2.26   5.20e-06
-   26   0.7123   0.8126    2.25   4.88e-06
-   27   0.7051   0.8107    2.25   4.56e-06
-   28   0.6971   0.8088    2.25   4.24e-06
-   29   0.7002   0.8074    2.24   3.92e-06
-   30   0.6953   0.8063    2.24   3.61e-06
-   31   0.6879   0.8050    2.24   3.30e-06
-   32   0.6918   0.8039    2.23   3.00e-06
-   33   0.6907   0.8030    2.23   2.71e-06
-   34   0.6921   0.8018    2.23   2.43e-06
-   35   0.6917   0.8011    2.23   2.16e-06
-   36   0.6789   0.8006    2.23   1.90e-06
-   37   0.6821   0.8004    2.23   1.65e-06
-   38   0.6891   0.8002    2.23   1.42e-06
-   39   0.6865   0.7999    2.23   1.20e-06
-   40   0.6906   0.7996    2.22   1.00e-06
-   41   0.6841   0.7993    2.22   8.18e-07
-   42   0.6851   0.7991    2.22   6.50e-07
-   43   0.6892   0.7990    2.22   5.00e-07
-   44   0.6899   0.7989    2.22   3.69e-07
-   45   0.6792   0.7988    2.22   2.57e-07
-   46   0.6809   0.7987    2.22   1.65e-07
-   47   0.6885   0.7987    2.22   9.31e-08
-   48   0.6840   0.7987    2.22   4.15e-08
-   49   0.6854   0.7987    2.22   1.04e-08
-   50   0.6842   0.7987    2.22   0.00e+00 ←
+    1   1.1733   1.1905    3.29   2.40e-05
+    2   1.0317   1.0171    2.77   3.00e-05
+    3   0.9005   0.9146    2.50   2.99e-05
+    4   0.8265   0.8777    2.41   2.98e-05
+    5   0.7970   0.8517    2.34   2.96e-05
+    6   0.7591   0.8368    2.31   2.93e-05
+    7   0.7370   0.8194    2.27   2.90e-05
+    8   0.7228   0.8066    2.24   2.86e-05
+    9   0.6933   0.7973    2.22   2.82e-05
+   10   0.6833   0.7923    2.21   2.77e-05
+   11   0.6731   0.7879    2.20   2.71e-05
+   12   0.6559   0.7830    2.19   2.65e-05
+   13   0.6432   0.7776    2.18   2.59e-05
+   14   0.6360   0.7746    2.17   2.52e-05
+   15   0.6307   0.7731    2.17   2.45e-05
+   16   0.6225   0.7715    2.16   2.37e-05
+   17   0.6069   0.7695    2.16   2.29e-05
+   18   0.6011   0.7682    2.16   2.21e-05
+   19   0.6019   0.7682    2.16   2.12e-05
+   20   0.5804   0.7668    2.15   2.03e-05 ←
+   21   0.5749   0.7675    2.15   1.94e-05
+   22   0.5770   0.7696    2.16   1.85e-05
+   23   0.5672   0.7710    2.16   1.75e-05
+   24   0.5646   0.7712    2.16   1.66e-05
+   25   0.5569   0.7723    2.16   1.56e-05
+   26   0.5561   0.7710    2.16   1.46e-05
+   27   0.5515   0.7691    2.16   1.37e-05
+   28   0.5428   0.7685    2.16   1.27e-05
+   29   0.5428   0.7702    2.16   1.18e-05
+   30   0.5379   0.7711    2.16   1.08e-05
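The `lr` column in both runs is consistent with a per-step linear warmup followed by cosine decay to zero, with the value logged at the end of each epoch. A fit to the logged numbers suggests two things about both runs: warmup lasts 1.25 epochs, and the decay horizon is 50 epochs. If that fit is right, the lr=3e-5 run stopped early with its learning rate still at 1.08e-05, well before the schedule reached zero. The sketch below is a reconstruction from the table, not the training code. `lr_at_epoch_end`, `warmup_epochs`, and `total_epochs` are hypothetical names, and the values 1.25 and 50 are inferred from the data rather than read from any config.

```python
import math


def lr_at_epoch_end(epoch, peak_lr, total_epochs=50, warmup_epochs=1.25):
    """Hypothetical reconstruction of the logged lr at the end of `epoch`.

    Assumes linear warmup from 0 to peak_lr over warmup_epochs, then cosine
    decay from peak_lr to 0 at total_epochs. Time is measured in fractional
    epochs, which matches a per-step schedule sampled at epoch boundaries.
    """
    t = float(epoch)
    if t < warmup_epochs:
        return peak_lr * t / warmup_epochs
    progress = (t - warmup_epochs) / (total_epochs - warmup_epochs)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for peak, epochs in ((1e-5, (1, 2, 26, 40, 50)), (3e-5, (1, 2, 20, 30))):
    print(peak, [f"{lr_at_epoch_end(e, peak):.2e}" for e in epochs])
# 1e-05 ['8.00e-06', '9.99e-06', '4.88e-06', '1.00e-06', '0.00e+00']
# 3e-05 ['2.40e-05', '3.00e-05', '2.03e-05', '1.08e-05']
```

Evaluating at fractional epochs rather than integer ones is what produces the logged `0.8 × peak` at epoch 1: the warmup is still 0.25 epochs short of finishing when that value is logged.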