Keywords: model distillation, algorithmic alignment, learning theory, linear representation hypothesis
TL;DR: This paper proves a PAC-distillation guarantee under an algorithmic alignment assumption.
Abstract: Distillation is the process of condensing the knowledge learned by a large
neural network, trained on a large dataset, into a smaller, more efficient model
suitable for deployment. Building on recent developments in the learning theory of
distillation (Boix-Adsera, 2024), we rigorously analyze a phenomenon in
which distillation can be efficient whenever the target class of the distillation
process is algorithmically aligned with the task at hand, in the sense of a linear
representation hypothesis (Elhage et al., 2022). This yields a novel and rigorous
characterization of algorithmic alignment that may be of independent interest.
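
For context, a minimal sketch of the PAC-distillation notion from Boix-Adsera (2024) on which such a guarantee rests; the exact loss and quantifiers shown here are assumptions for illustration, not the paper's verbatim definition. A distiller receives a teacher $f \in \mathcal{F}$ and a sample $S$ of $n = \mathrm{poly}(1/\varepsilon, 1/\delta)$ inputs drawn from a distribution $D$, and must output a student $g_S$ in the target class $\mathcal{G}$ satisfying
\[
\Pr_{S \sim D^{n}}\!\big[\operatorname{err}_D(g_S, f) \le \varepsilon\big] \ge 1 - \delta,
\qquad
\operatorname{err}_D(g, f) := \Pr_{x \sim D}\!\big[g(x) \neq f(x)\big].
\]
The abstract's claim is then that when $\mathcal{G}$ is algorithmically aligned with the task, in the linear-representation sense, a distiller of this form exists with efficient sample and time complexity.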
Submission Number: 47