Keywords: competitive programming, self-consistency
Abstract: We introduce a stress-testing approach to improve the performance of large language reasoning models on challenging competitive programming problems. By combining stress testing, a technique commonly used by expert programmers, with self-consistency and self-debugging methods, we demonstrate significant improvements in solution accuracy. Our method generates multiple brute-force solutions to validate and filter candidate solutions, outperforming traditional majority voting. Experimental results show that our approach narrows the gap between pass@k and majority-voting scores on the USACO benchmark for both o1-mini and o3-mini, solving up to 246 of 307 problems, 17 more than vanilla self-consistency.
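A minimal sketch of the stress-test filtering idea the abstract describes, assuming candidate and brute-force solutions are represented as Python callables and that `gen_input` is a hypothetical small-input generator; the paper's actual pipeline (LLM code generation, sandboxed execution, self-debugging) is not shown here.

```python
import random
from collections import Counter

def stress_filter(candidates, brute_forces, gen_input, trials=100):
    """Keep only candidates that agree with the brute-force consensus
    on randomly generated small inputs (illustrative sketch)."""
    survivors = list(candidates)
    for _ in range(trials):
        x = gen_input()                        # small random test case
        votes = Counter(bf(x) for bf in brute_forces)
        expected = votes.most_common(1)[0][0]  # brute-force consensus answer
        # Discard candidates that disagree with the trusted slow solutions.
        survivors = [c for c in survivors if c(x) == expected]
        if not survivors:                      # everything was eliminated
            break
    return survivors

# Toy usage: candidates for "sum of 1..n"; the buggy one is filtered out,
# and self-consistency voting would then run over the survivors.
if __name__ == "__main__":
    candidates = [lambda n: n * (n + 1) // 2,    # correct closed form
                  lambda n: n * n // 2]          # off-by-one bug
    brute = [lambda n: sum(range(1, n + 1))]     # slow but trusted
    kept = stress_filter(candidates, brute,
                         gen_input=lambda: random.randint(1, 50))
    print(len(kept))  # -> 1: only the correct candidate survives
```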
Submission Number: 40