On the Fly Input Refinement for Code Language Models

Published: 2025 · Last Modified: 06 Nov 2025 · ICSE Companion 2025 · CC BY-SA 4.0
Abstract: Deep learning-based code language models (CLMs) have become integral to a variety of software engineering (SE) tasks. However, these models often suffer from performance degradation, such as mispredictions, which hinders their practical applicability. Traditional remedies such as retraining are resource-intensive, requiring data labeling, model updates, and redeployment. This study introduces a novel approach that refines inputs directly at deployment time, avoiding the overhead of retraining. The framework comprises two steps: (1) input validation, which detects out-of-scope inputs likely to cause mispredictions, and (2) input adaptation, which transforms these inputs into in-scope ones using semantic-preserving code transformations and optimized sampling. Experiments across three CLMs demonstrate the framework's effectiveness, with accuracy improvements of up to 8.78% and an AUC of up to 0.924 for detecting out-of-scope inputs. These results highlight input refinement as a cost-effective, scalable alternative to retraining, enabling robust CLM performance in dynamic SE environments.
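To make the two-step pipeline concrete, the following is a minimal sketch of how validation and adaptation could compose at deployment time. The names and parameters here (`oos_score`, `transforms`, the threshold, the sampling budget) are illustrative assumptions; the abstract does not specify the paper's actual detector or transformation set.

```python
# Hypothetical sketch of validate-then-adapt input refinement.
# None of these components are the paper's actual implementation.

import random
from typing import Callable, List


def refine_and_predict(
    code: str,
    clm_predict: Callable[[str], str],       # the deployed code language model
    oos_score: Callable[[str], float],       # assumed validator: higher = more out-of-scope
    transforms: List[Callable[[str], str]],  # semantic-preserving code rewrites
    threshold: float = 0.5,                  # illustrative out-of-scope cutoff
    max_samples: int = 20,                   # illustrative sampling budget
) -> str:
    """Predict on `code`, first adapting it if it looks out-of-scope."""
    # Step 1: input validation -- pass in-scope inputs straight through.
    best_score = oos_score(code)
    if best_score < threshold:
        return clm_predict(code)

    # Step 2: input adaptation -- sample semantic-preserving variants and
    # keep whichever one the validator scores as most in-scope.
    best = code
    for _ in range(max_samples):
        variant = random.choice(transforms)(code)
        score = oos_score(variant)
        if score < best_score:
            best, best_score = variant, score
    return clm_predict(best)
```

Under this sketch, the sampling loop is the point where an "optimized" strategy (e.g., prioritizing transformations that historically lowered the validator's score) would replace the uniform `random.choice`.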