SuperFC: Selective Data Utilization for a Sustainable and Effective Function-Calling Agent

Published: 2025, Last Modified: 09 Jan 2026IJCNN 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: The function-calling agent is obtained by performing agent tuning to the large language model (LLM) on function-calling dataset. However, even state-of-the-art datasets (e.g., xlam-function-calling-60k datasets) still contain numerous misleading examples of low-quality data, wasting significant computational resources and result in an unnecessary carbon footprint. Furthermore, such inductive bad data negatively impacts the performance of the agent. In this paper, we propose a set of scoring criteria specifically tailored to evaluate function-calling data and use these criteria to develop a data filtering framework. By applying this framework to filter out low-quality data, we fine-tuned SuperFC, which demonstrates substantial improvements in both sustainability and performance. The SuperFC-7B training process reduced training time from 455 minutes to 85 minutes, resulting in a 80.02% reduction in carbon footprint. Simultaneously, fine-tuning on high-quality data subsets led to performance improvements of up to 3.68%. Additionally, we provide an in-depth analysis of the causes behind the low quality of synthetic function-calling data, offering valuable insights for future data synthesis in this domain. We have also released a high-quality function-calling dataset, available at: https://github.com/Zire-Young/SuperFC
Loading