Synthesizing Performance Constraints for Evaluating and Improving Code Efficiency

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Large Language Models, Test Generation, Code Analysis, Performance Analysis, Code Optimization, Fuzzing
TL;DR: We combine LLM-synthesized performance-characterizing constraints with fuzzing to uncover difficult-to-find code inefficiencies and generate performance-stressing tests.
Abstract: Large Language Models (LLMs) have been increasingly used to optimize code efficiency. Evaluating their effectiveness, and suggesting further optimization opportunities, often relies on high-quality tests that expose the performance bottlenecks present in a program. However, existing approaches rely on a limited set of hand-curated inputs or on uninteresting LLM-generated tests that merely stress input length, failing to reveal more nuanced optimization opportunities. We present WEDGE, a framework for generating performance-stressing inputs for a given program under test. WEDGE synthesizes explicit performance-characterizing constraints in the form of branch conditions to partition the program's execution space into performance-specific regions. When integrated with a coverage-guided fuzzer, reaching different regions introduces explicit rewards for test generation to explore inefficient implementations. Our evaluation shows that WEDGE induces significantly greater slowdowns than the tests in CodeContests and than those claimed to be optimized by existing approaches. From a utility perspective, integrating our tests substantially improves existing code optimization approaches that rely on test-driven execution feedback. We release PERFFORGE, the performance tests generated by WEDGE, to benchmark future approaches for efficient code generation at https://github.com/elmerjfudd/wedge.
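To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of how a synthesized performance-characterizing constraint, expressed as a branch predicate, could act as an explicit reward inside a mutation-based fuzzing loop. The program under test, the predicate `slow_path_predicate`, and the mutation strategy are all hypothetical placeholders chosen for illustration.

```python
import random
import time

def program_under_test(arr):
    # Hypothetical target: quadratic behavior on duplicate-heavy inputs,
    # because membership testing in a list is O(n).
    seen = []
    for x in arr:
        if x not in seen:
            seen.append(x)
    return len(seen)

def slow_path_predicate(arr):
    # Hypothetical performance-characterizing constraint (branch condition):
    # the inefficient region is reached when most elements are duplicates.
    return len(arr) > 0 and len(set(arr)) < len(arr) // 4

def mutate(arr, max_len=2000):
    # Simple random mutation: append a fresh value, duplicate an existing one,
    # or drop an element.
    arr = list(arr)
    op = random.choice(["append", "dup", "drop"])
    if op == "append" and len(arr) < max_len:
        arr.append(random.randint(0, 10))
    elif op == "dup" and arr and len(arr) < max_len:
        arr.append(random.choice(arr))
    elif op == "drop" and arr:
        arr.pop(random.randrange(len(arr)))
    return arr

def fuzz(seed, iterations=500):
    best, best_score = seed, 0.0
    for _ in range(iterations):
        candidate = mutate(best)
        start = time.perf_counter()
        program_under_test(candidate)
        runtime = time.perf_counter() - start
        # Reward = measured runtime, boosted when the candidate reaches the
        # constraint-identified performance-specific region.
        score = runtime * (10.0 if slow_path_predicate(candidate) else 1.0)
        if score > best_score:
            best, best_score = candidate, score
    return best

if __name__ == "__main__":
    stress_input = fuzz(seed=[1, 2, 3])
    print(f"Generated a {len(stress_input)}-element performance-stressing input")
```

In the full framework the constraints are LLM-synthesized from the program under test and combined with coverage feedback; this sketch only illustrates how satisfying such a constraint can be turned into an explicit reward for generating performance-stressing inputs.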
Supplementary Material: zip
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 13453