CodeComplex: A Time-complexity Dataset for Multi-language Source Codes

24 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Code complexity, Dataset, Neural network
TL;DR: Dataset for code complexity
Abstract: Deciding the computational complexity of an algorithm is a challenging problem, even for human algorithm experts. In theory, deciding the computational complexity of an arbitrary program is undecidable, as a consequence of the famous halting problem. We therefore focus on programs with well-defined inputs and outputs, for which correctness can be verified. We propose CodeComplex, a dataset consisting of 4,900 Java codes and 4,900 Python codes submitted to programming competitions by human programmers, together with complexity labels annotated by a group of algorithm experts. To the best of our knowledge, CodeComplex is by far the largest code dataset for the complexity prediction problem. We then present experimental results from several baseline models built on state-of-the-art code understanding models such as CodeBERT, GraphCodeBERT, PLBART, CodeT5, CodeT5+, and UniXcoder. We also analyze what makes complexity prediction difficult and why the models succeed or fail at predicting time complexity. The CodeComplex dataset is available at https://anonymous.4open.science/r/CodeComplex-Data and material for reproduction is available at https://anonymous.4open.science/r/CodeComplex-Models.
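To give a concrete sense of the baseline setup described in the abstract, the following is a minimal sketch of how a pre-trained code encoder such as CodeBERT can be wrapped as a multi-class complexity classifier. The label set shown here is an illustrative assumption rather than the official CodeComplex label scheme, and the model would need fine-tuning on the dataset before its predictions are meaningful.

```python
# Minimal sketch of a complexity-classification baseline using CodeBERT.
# The label set below is an assumption for illustration; consult the
# CodeComplex repository for the actual classes and data format.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["constant", "logarithmic", "linear", "nlogn",
          "quadratic", "cubic", "exponential"]  # assumed label set

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=len(LABELS)
)

def predict_complexity(source_code: str) -> str:
    """Classify one source snippet into a complexity class (after fine-tuning)."""
    inputs = tokenizer(source_code, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

# Example call; the untrained classification head gives arbitrary output
# until the model is fine-tuned on CodeComplex.
print(predict_complexity("for (int i = 0; i < n; i++) sum += a[i];"))
```

Under this setup, each of the baseline encoders listed in the abstract (GraphCodeBERT, PLBART, CodeT5, CodeT5+, UniXcoder) could be swapped in by changing the checkpoint name, with the same classification head on top.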
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9201