Keywords: Programming Languages, Program Embeddings, Code, Big Code
Abstract: Program embeddings are used to solve tasks such as \textit{code clone detection} and \textit{semantic labeling}. Solutions to these \textit{semantic tasks} should be invariant to semantics-preserving program transformations. When a program embedding function satisfies this invariance, we call it an \textit{equivalence-preserving program embedding function}. We say a programming language can be \textit{tractably embedded} when we can construct an equivalence-preserving program embedding function that runs in time polynomial in the program/input length and produces embeddings whose size is proportional to the input length. Determining whether a programming language can be tractably embedded is the \textit{equivalence-preserving program embedding problem}. We formalize this problem and theoretically characterize when programming languages can be tractably embedded. To validate our theoretical results, we use the BERT-Tiny model to learn an equivalence-preserving program embedding function for a programming language that can be tractably embedded, and we show that the model fails to construct an equivalence-preserving program embedding function for a similar language that is intractable to embed.
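To make the invariance requirement concrete, here is an illustrative sketch (not the paper's construction): for a toy language of arithmetic expressions in one variable, evaluating a program on a fixed set of probe inputs yields an embedding that is invariant under semantics-preserving rewrites. The function name `embed` and the probe set are hypothetical choices for this example.

```python
def embed(program: str, probes=(0, 1, 2, 3)) -> tuple:
    """Embed a program (here, a Python arithmetic expression in `x`)
    by its outputs on fixed probe inputs. Semantically equivalent
    programs map to identical embeddings, so on this toy language the
    embedding is equivalence-preserving (and polynomials of degree
    < len(probes) are uniquely determined by these outputs)."""
    return tuple(eval(program, {"x": x}) for x in probes)

# A semantics-preserving rewrite leaves the embedding unchanged:
assert embed("x * (x + 1)") == embed("x*x + x")
# Inequivalent programs receive distinct embeddings:
assert embed("2 * x") != embed("x + 1")
```

The embedding runs in time polynomial in the program length and has constant size, matching the tractability criteria described in the abstract for a language this simple; richer languages need not admit such a function, which is the paper's central question.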
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
TL;DR: We develop a theory of program embeddings that preserve semantic equivalence and show when they are tractable to compute