Abstract: Multi-label code annotation in competitive programming is challenging because a single program often integrates diverse algorithmic paradigms. We propose L-CPC, a framework that leverages NLP techniques and large language models (LLMs) to annotate competitive programming code from the Codeforces dataset. Through a parallel architecture combining modules based on CodeBERT/UniXcoder, retrieval-based methods, and annotation by state-of-the-art LLMs, L-CPC achieves higher Jaccard and F1 scores than traditional methods such as SVM and Random Forest. These natural-language-based methods are better suited to code, and several components can be adapted to settings beyond programming contests. While L-CPC effectively captures semantic relationships in code, challenges remain in handling complex cases, which we leave to future work.
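For reference, the Jaccard and F1 scores mentioned above are standard multi-label metrics. The sketch below (not from the paper) shows how such scores are typically computed with scikit-learn; the tag names and predictions are illustrative placeholders.

```python
# Minimal sketch: evaluating multi-label tag predictions with
# sample-averaged Jaccard and micro-averaged F1. Example tags are hypothetical.
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import jaccard_score, f1_score

true_tags = [{"dp", "graphs"}, {"greedy"}, {"math", "number theory"}]
pred_tags = [{"dp"}, {"greedy", "sortings"}, {"math", "number theory"}]

# Binarize over the union of all observed tags so predictions outside the
# ground-truth vocabulary are still represented.
mlb = MultiLabelBinarizer()
mlb.fit(true_tags + pred_tags)
y_true = mlb.transform(true_tags)
y_pred = mlb.transform(pred_tags)

# Per-problem overlap between predicted and true tag sets, averaged over problems.
print("Jaccard:", jaccard_score(y_true, y_pred, average="samples"))
# F1 aggregated over all (problem, tag) decisions.
print("F1:", f1_score(y_true, y_pred, average="micro"))
```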
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: Multi-label Code annotation, Competitive programming contest, NLP application, Large language models
Languages Studied: English, Chinese
Submission Number: 2758