Keywords: Distillation, Epistemic, Reasoning
Abstract: Knowledge distillation (KD) is widely used to compress large language models,
yet its impact on models’ reasoning capacity remains poorly understood.
We present a theoretical framing of data-free and recursive KD as a
self-referential learning process in which students approximate their teachers’
approximations. Using Kolmogorov complexity and computable
information-theoretic proxies, we show that such recursive compression enforces
a monotonic reduction of information and bounds representational
richness. This perspective has direct implications for
mathematical and symbolic reasoning, where epistemic depth and
compositional structure are essential. We further relate this information
reduction to Shannon entropy and Minimum Description Length (MDL), and outline
new evaluation paradigms grounded in epistemic fidelity to assess whether
distilled models retain the structural knowledge required for robust reasoning.
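Illustrative sketch (our notation, not part of the submission): assuming each distilled student $S_k$ is obtained by a deterministic, computable distillation procedure applied only to outputs of its predecessor $S_{k-1}$ (the data-free, recursive setting the abstract describes), the data processing inequality gives a non-increasing chain of mutual information with the original data distribution $X$:
\[
I(X; S_n) \;\le\; I(X; S_{n-1}) \;\le\; \cdots \;\le\; I(X; S_0).
\]
Under the same assumptions, an analogous Kolmogorov-complexity bound, $K(S_n) \le K(S_0) + K(\mathcal{D}) + O(1)$ where $\mathcal{D}$ denotes the distillation procedure, captures the claimed monotonic reduction of information under repeated self-referential compression.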
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 16