Keywords: LLM, Code Generation
Abstract: Large language models (LLMs) have made remarkable progress in code generation. However, the evaluation of their actual programming capabilities remains largely limited to solving standard coding problems, and a comprehensive understanding of these abilities is still lacking. We propose Coding Triangle, a systematic approach for evaluating LLMs across three core dimensions: code editing, code implementation, and test case generation. Through comprehensive experiments on competitive programming benchmarks, we assess model performance along these dimensions and uncover both self-consistency and self-inconsistency within a model's own cognition. Self-consistency often yields solutions that lack the diversity and robustness of those written by human programmers, leading to a significant distribution shift between model cognition and human submissions. Our analysis of the interactions between dimensions further reveals self-inconsistency, which may enable self-reflection and self-improvement and points to a promising direction for developing more powerful coding models.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 18716