Unlocking SLM Potential for Data Analysis Code Generation via Non-Parametric Knowledge Distillation

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: LLM, Knowledge Distillation, Data Analysis Code Generation
Abstract: Knowledge distillation from Large Language Models (LLMs) to locally hosted Small Language Models (SLMs) offers advantages for Data Analysis Code Generation (DACG), such as privacy protection. However, achieving effective distillation without resource-intensive training is challenging. This paper investigates whether LLMs can distill knowledge to SLMs through In-Context Learning (ICL), a training-free method for rapid task adaptation. We present DarGO (Distillation and Adaptive Reasoning-Guided Orchestration), a framework that enables automatic knowledge distillation from LLMs to SLMs. DarGO consists of three phases: exploration through a Model Orchestration Interface (MOI), Memory Collection of successful trajectories, and Knowledge-driven Inference. We evaluate DarGO on three challenging DACG benchmarks (WikiTQ, TabMWP, and Bird-SQL), each with in-domain training sets that enable detailed analysis of knowledge distillation effectiveness. DarGO yields a substantial average relative performance improvement of 27.5% for the student SLMs. To further assess generalization, we evaluate DarGO across different teacher-student model combinations, knowledge transfer scenarios, and unified memory approaches for more advanced, test-only data analysis tasks. Our findings contribute a novel perspective on distillation methods that achieve high SLM performance while avoiding intensive fine-tuning.
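To make the three-phase pipeline concrete, below is a minimal Python sketch of non-parametric knowledge distillation via ICL as the abstract describes it: a teacher explores training tasks, only verified trajectories are kept in a memory, and the student answers new questions with retrieved trajectories as in-context demonstrations. All names here (Trajectory, Memory, explore_and_collect, knowledge_driven_inference, the word-overlap retrieval) are illustrative assumptions, not the paper's actual implementation or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical type: a language model call mapping a prompt to generated text.
LLMFn = Callable[[str], str]


@dataclass
class Trajectory:
    question: str
    code: str
    answer: str


@dataclass
class Memory:
    """Non-parametric store of successful teacher trajectories."""
    items: List[Trajectory] = field(default_factory=list)

    def add(self, t: Trajectory) -> None:
        self.items.append(t)

    def retrieve(self, question: str, k: int = 3) -> List[Trajectory]:
        # Toy similarity: shared-word overlap; a real system would likely
        # use embedding-based retrieval.
        def score(t: Trajectory) -> int:
            return len(set(question.lower().split()) & set(t.question.lower().split()))
        return sorted(self.items, key=score, reverse=True)[:k]


def explore_and_collect(
    teacher: LLMFn,
    train_set: List[Tuple[str, str]],
    execute: Callable[[str], str],
    memory: Memory,
) -> None:
    """Phases 1-2 (sketch): the teacher explores training tasks; trajectories
    whose executed code reproduces the gold answer are kept as distilled knowledge."""
    for question, gold in train_set:
        code = teacher(f"Write code to answer: {question}")
        result = execute(code)
        if result == gold:  # keep only verified, successful trajectories
            memory.add(Trajectory(question, code, result))


def knowledge_driven_inference(student: LLMFn, question: str, memory: Memory) -> str:
    """Phase 3 (sketch): the student SLM answers a new question with retrieved
    trajectories prepended as in-context demonstrations -- no fine-tuning."""
    demos = "\n\n".join(
        f"Q: {t.question}\nCode:\n{t.code}\nAnswer: {t.answer}"
        for t in memory.retrieve(question)
    )
    prompt = f"{demos}\n\nQ: {question}\nCode:"
    return student(prompt)
```

In this reading, the "distillation" is non-parametric: knowledge lives in the memory of verified teacher trajectories rather than in updated student weights, and the student consumes it purely through its prompt at inference time.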
Supplementary Material: zip
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 21305