Agents Help Agents: Exploring Training-Free Knowledge Distillation for Small Language Models in Data Science Code Generation

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Knowledge Distillation, Large Language Models, Small Language Models, Data Science Code Generation, In-Context Learning, Agent Orchestration
TL;DR: This paper introduces the Agents Help Agents (AHA) framework for training-free knowledge distillation from large to small language models, improving code generation for data science.
Abstract: Knowledge distillation from Large Language Models (LLMs) to locally hosted Small Language Models (SLMs) offers advantages for Data Science Code Generation (DSCG), such as enhanced data privacy and reduced response times. However, achieving effective distillation without resource-intensive training is challenging. This paper investigates whether LLMs can distill knowledge to SLMs through In-Context Learning (ICL), a training-free method for rapid task adaptation. We present the **Agents Help Agents (AHA)** framework, which facilitates automatic knowledge distillation from LLMs to SLMs via agent orchestration. AHA consists of three phases: exploration through an **Agent Orchestration Interface (AOI)**, memory collection of successful examples, and inference augmented with the distilled knowledge. The AOI orchestrates interactions between an LLM acting as the teacher agent and an SLM acting as the student agent. We further propose two distillation strategies: a static approach that aggregates an offline instruction set and a dynamic RAG-based approach that distills knowledge on the fly during inference. We evaluate AHA on three challenging code generation tasks for tabular data analysis: TabMWP, BIRD-SQL, and WikiTQ. Experimental results demonstrate the effectiveness of AHA, yielding an average 27.5\% relative improvement in the performance of the student agent Phi-3-mini. Additionally, relative gains of 14.3\% and 30.9\% are observed for **Llama-3.1-8B** and **GPT-35-Turbo**, respectively, even though these models were not involved in the orchestration, highlighting the model-agnostic nature of the knowledge distilled by AHA. Further analysis compares distillation and demonstration techniques across different data input settings, providing insights into optimal configurations for DSCG.
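
To make the three-phase pipeline concrete, below is a minimal Python sketch of the orchestration loop as described in the abstract. The callables `teacher_solve`, `student_solve`, and `run_tests`, and the token-overlap retriever, are hypothetical placeholders standing in for the actual LLM calls, task verifier, and RAG component; they are not taken from the submission's code.

```python
"""Minimal sketch of the AHA loop (assumptions: placeholder callables, toy retriever)."""
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class MemoryEntry:
    question: str
    code: str          # code the student produced under teacher guidance
    instruction: str   # teacher-authored hint that led to a verified success

@dataclass
class AHAMemory:
    entries: List[MemoryEntry] = field(default_factory=list)

    def add(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)

    def retrieve(self, question: str, k: int = 2) -> List[MemoryEntry]:
        # Dynamic (RAG-style) retrieval; token overlap stands in for a real
        # embedding-based retriever, which is an assumption of this sketch.
        def overlap(e: MemoryEntry) -> int:
            return len(set(question.lower().split()) & set(e.question.lower().split()))
        return sorted(self.entries, key=overlap, reverse=True)[:k]

def exploration_phase(tasks: List[str],
                      teacher_solve: Callable[[str, str], str],
                      student_solve: Callable[[str, str], str],
                      run_tests: Callable[[str, str], bool]) -> AHAMemory:
    """Phases 1-2: the teacher guides the student; verified successes are stored."""
    memory = AHAMemory()
    for question in tasks:
        hint = teacher_solve(question, "")     # teacher drafts an instruction
        code = student_solve(question, hint)   # student attempts with the hint
        if run_tests(question, code):          # keep only examples that pass verification
            memory.add(MemoryEntry(question, code, hint))
    return memory

def inference_phase(question: str, memory: AHAMemory,
                    student_solve: Callable[[str, str], str],
                    static_instructions: str = "", dynamic: bool = True) -> str:
    """Phase 3: augment the student's prompt with distilled knowledge."""
    if dynamic:
        shots = memory.retrieve(question)      # dynamic RAG-based distillation
        context = "\n\n".join(
            f"Q: {e.question}\nHint: {e.instruction}\nCode:\n{e.code}" for e in shots)
    else:
        context = static_instructions          # static offline-aggregated instruction set
    return student_solve(question, context)
```

The intent of the sketch is only to show where the two distillation strategies diverge: the static variant prepends one aggregated instruction set to every query, while the dynamic variant retrieves task-similar successes from memory at inference time.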
Supplementary Material: zip
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6905