data: update pretrained checkpoint results (BAR-free tokenizer)
Re-run pre-training results with the corrected 84-token vocabulary and max_seq_len=320. Previous checkpoint was trained on stale data with BAR tokens and a corrupted tokenizer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Binary file not shown.
Reference in New Issue
Block a user