L-CPC: Language-represented Competitive Programming Code annotation in multi-label settings

ACL ARR 2025 May Submission 2758 Authors

19 May 2025 (modified: 03 Jul 2025) | ACL ARR 2025 May Submission | Readers: Everyone | License: CC BY 4.0
Abstract: Multi-label code annotation in competitive programming is challenging because a single program often combines several algorithmic paradigms. We propose L-CPC, a framework that uses NLP techniques and large language models (LLMs) to annotate competitive programming code from the Codeforces dataset. L-CPC runs several modules in parallel, including CodeBERT/UniXcoder, retrieval-based methods, and annotation by state-of-the-art LLMs, and achieves higher Jaccard and F1 scores than traditional methods such as SVM and Random Forest. These natural-language-based methods are a better fit for code, and some components adapt easily to settings beyond programming contests. While L-CPC effectively captures semantic relationships in code, it still struggles with complex cases, which we leave to future work.
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: Multi-label Code annotation, Competitive programming contest, NLP application, Large language models
Languages Studied: English, Chinese
Submission Number: 2758
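
The abstract reports Jaccard and F1 scores for multi-label annotation without stating how they are averaged. As a minimal sketch, assuming a sample-averaged Jaccard index and a micro-averaged F1 (both are assumptions, not confirmed by the submission), the two metrics could be computed over sets of tags as follows. The function names and the example tags are hypothetical and chosen only for illustration.

```python
from typing import List, Set


def jaccard_samples(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Mean per-program Jaccard index |T & P| / |T | P| (assumed averaging)."""
    scores = []
    for true_tags, pred_tags in zip(y_true, y_pred):
        union = true_tags | pred_tags
        # Assumed convention: two empty tag sets count as a perfect match.
        scores.append(len(true_tags & pred_tags) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def micro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all programs."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Hypothetical gold and predicted tag sets for three programs.
gold = [{"dp", "graphs"}, {"greedy"}, {"math", "number theory"}]
pred = [{"dp"}, {"greedy", "sortings"}, {"math", "number theory"}]

print(jaccard_samples(gold, pred))
print(micro_f1(gold, pred))
```

A missed tag and a spurious tag both lower the per-program Jaccard index, while micro-F1 pools the counts across programs, so the two metrics can rank systems differently when tag counts per program vary.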