data: update pretrained checkpoint results (BAR-free tokenizer)

Re-run pre-training results with the corrected 84-token vocabulary and
max_seq_len=320.  Previous checkpoint was trained on stale data with BAR
tokens and a corrupted tokenizer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

This commit is contained in:

Masahiko AMANO

2026-05-20 14:28:00 +03:00

parent 4aead2ea20

commit 8a73394df9

3 changed files with 107 additions and 107 deletions

checkpoints/pretrained_curves.png

LFS

BIN

View File

Binary file not shown.