Towards distillation guarantees under algorithmic alignment

Published: 04 Oct 2025, Last Modified: 10 Oct 2025 · DiffCoAlg 2025 Poster · CC BY 4.0
Keywords: model distillation, algorithmic alignment, learning theory, linear representation hypothesis
TL;DR: This paper proves a PAC-distillation guarantee under an algorithmic alignment assumption.
Abstract: Distillation is the process of condensing the knowledge learnt by a large neural network, trained on large datasets, into a smaller model that is more efficient and suitable for deployment. Building on recent developments in the learning theory of distillation (Boix-Adsera, 2024), we rigorously analyze the phenomenon whereby distillation can be efficient when the target class of the distillation process is algorithmically aligned with the task at hand, in the sense of a linear representation hypothesis (Elhage et al., 2022). This yields a novel and rigorous characterization of algorithmic alignment that may be of independent interest.
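For orientation, a minimal sketch of what a PAC-distillation guarantee asserts, in the spirit of the framework of Boix-Adsera (2024); the symbols below (source class $\mathcal{F}$, target class $\mathcal{G}$, distribution $\mathcal{D}$, sample $S$, student $g_S$) are illustrative notation, not taken from the paper itself:

% Sketch of a PAC-distillation guarantee (illustrative notation).
% Given a trained teacher $f \in \mathcal{F}$ and an i.i.d. sample
% $S \sim \mathcal{D}^n$ of size $n = \mathrm{poly}(1/\varepsilon, 1/\delta)$,
% the distillation algorithm returns a student $g_S \in \mathcal{G}$ with
\[
  \Pr_{S \sim \mathcal{D}^n}\!\Bigl[
    \Pr_{x \sim \mathcal{D}}\bigl[\, g_S(x) \neq f(x) \,\bigr] \le \varepsilon
  \Bigr] \ge 1 - \delta .
\]
% The abstract's claim is that when $\mathcal{G}$ is algorithmically
% aligned with the task, in the sense of a linear representation
% hypothesis, such a guarantee can be achieved efficiently.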
Submission Number: 47