L-CPC: Language-represented Competitive Programming Code annotation in multi-label settings

ACL ARR 2025 May Submission2758 Authors

19 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Multi-label code annotation in competitive programming is challenging because a single program often combines several algorithmic paradigms. We propose L-CPC, a framework that uses NLP techniques and large language models (LLMs) to annotate competitive programming code from the Codeforces dataset. L-CPC runs several modules in parallel, including CodeBERT/UniXcoder, retrieval-based methods, and annotation by state-of-the-art LLMs, and achieves a higher Jaccard score and F1-score than traditional methods such as SVM and Random Forest. These natural-language-based methods are better suited to code, and some components can be readily adapted to settings beyond programming contests. While L-CPC effectively captures semantic relationships in code, handling complex cases remains challenging and is left for future work.
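
As a point of reference for the two metrics named in the abstract, the following is a minimal sketch of how Jaccard score and F1-score can be computed for multi-label tag predictions. The abstract does not state which averaging scheme is used, so the sample-averaged Jaccard and micro-averaged F1 shown here are assumptions, and the function names and example tags are purely illustrative.

```python
from typing import List, Set


def jaccard_score(true_tags: List[Set[str]], pred_tags: List[Set[str]]) -> float:
    """Sample-averaged Jaccard: mean of |T & P| / |T | P| over all programs."""
    scores = []
    for t, p in zip(true_tags, pred_tags):
        union = t | p
        # Assumed convention: two empty tag sets count as a perfect match.
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def micro_f1(true_tags: List[Set[str]], pred_tags: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all tags."""
    tp = sum(len(t & p) for t, p in zip(true_tags, pred_tags))
    fp = sum(len(p - t) for t, p in zip(true_tags, pred_tags))
    fn = sum(len(t - p) for t, p in zip(true_tags, pred_tags))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical tag sets for two programs (not taken from the paper's data).
true_tags = [{"dp", "graphs"}, {"greedy"}]
pred_tags = [{"dp"}, {"greedy", "math"}]

print(jaccard_score(true_tags, pred_tags))
print(micro_f1(true_tags, pred_tags))
```

Jaccard penalizes both missing and spurious tags per program, which makes it a natural fit when each program carries a small, variable-size set of labels.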

Paper Type: Short

Research Area: NLP Applications

Research Area Keywords: Multi-label Code annotation, Competitive programming contest, NLP application, Large language models

Languages Studied: English, Chinese

Submission Number: 2758
