Implements perplexity computation, chord distribution extraction
(qualities, extensions, inversions, root-motion intervals), a 4-panel
comparison plot, and paired qualitative example generation for the
pretrained vs. finetuned models.
Results on the user validation set: perplexity (PPL) drops from 3.58
(pretrained) to 2.15 (finetuned), a 40% reduction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>