Success is in the Details: Evaluate and Enhance Details Sensitivity of Code LLMs through Counterfactuals

ACL ARR 2025 May Submission891 Authors

16 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Code sensitivity refers to the ability of Code LLMs to recognize and respond to changes in the details of problem descriptions, and it serves as an important dimension for evaluating Code LLMs. While current code benchmarks and instruction datasets focus primarily on difficulty and diversity, sensitivity has been overlooked. We first introduce the CTF-Code benchmark, constructed using counterfactual perturbations that minimize changes to the input while maximizing changes to the output. The evaluation shows that many models suffer a performance drop of more than 10\% on CTF-Code. To fully exploit sensitivity, we propose CTF-Instruct, an incremental instruction fine-tuning framework that extends existing data and uses a selection mechanism to cover the three dimensions of difficulty, diversity, and sensitivity. Experiments show that models fine-tuned on CTF-Instruct data achieve over a 2\% improvement on CTF-Code and a more than 10\% performance boost on LiveCodeBench, validating the feasibility of enhancing LLMs' sensitivity to improve performance.
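To illustrate the kind of counterfactual perturbation the abstract describes (a minimal edit to the problem text that flips the expected output), here is a small sketch. The example problem, function names, and test values are hypothetical and not taken from the paper; they only show why a detail-insensitive model would fail on the perturbed variant.

```python
# Hypothetical counterfactual pair: the problem description changes by one word,
# but the correct program behavior changes substantially.
original_problem = "Return the sum of all even numbers in the list."
counterfactual_problem = "Return the sum of all odd numbers in the list."

def solve_original(nums):
    # Correct for the original description.
    return sum(x for x in nums if x % 2 == 0)

def solve_counterfactual(nums):
    # Correct for the perturbed description; a model that ignores the
    # detail change would keep producing the original answer instead.
    return sum(x for x in nums if x % 2 == 1)

nums = [1, 2, 3, 4, 5]
assert solve_original(nums) == 6       # answer to the original problem
assert solve_counterfactual(nums) == 9 # answer to the counterfactual problem
```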
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: code models; fine-tuning; benchmarking
Contribution Types: Publicly available software and/or pre-trained models, Data resources
Languages Studied: English, Programming Languages
Submission Number: 891