Abstract: Identifying relationships among program elements is useful for program understanding, debugging, and analysis. One such kind of relationship is synonymy. Function synonyms are functions that play a similar role in code; examples include functions that perform initialization for different device drivers, and functions that implement different symmetric-key encryption schemes. Function synonyms are not necessarily semantically equivalent and can be syntactically dissimilar; consequently, approaches for identifying code clones or functional equivalence cannot be used to identify them. This paper presents Func2vec, a technique that learns an embedding mapping each function to a vector in a continuous vector space such that vectors for function synonyms are in close proximity. We compute the function embedding by training a neural network on sentences generated using random walks over the interprocedural control-flow graph. We show the effectiveness of Func2vec at identifying function synonyms in the Linux kernel. Finally, we apply Func2vec to the problem of mining error-handling specifications in Linux file systems and drivers. We show that the function synonyms identified by Func2vec result in error-handling specifications with high support.
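To make the sentence-generation step concrete, here is a minimal Python sketch of random walks over a small directed graph. It is an illustration under stated assumptions, not the paper's implementation: the graph `TOY_GRAPH`, its function names, the helpers `random_walk` and `generate_sentences`, and the parameters `walks_per_node` and `max_len` are all hypothetical, and the abstract does not specify the walk strategy, the graph labeling, or the network architecture.

```python
import random

# Hypothetical toy graph standing in for an interprocedural control-flow graph.
# Each key is a function label; each value lists its successors.
# The names and edges are invented for illustration only.
TOY_GRAPH = {
    "driver_a_probe": ["driver_a_init", "alloc_buffer"],
    "driver_b_probe": ["driver_b_init", "alloc_buffer"],
    "driver_a_init": ["register_device"],
    "driver_b_init": ["register_device"],
    "alloc_buffer": [],
    "register_device": [],
}


def random_walk(graph, start, max_len, rng):
    """Follow uniformly chosen successors from `start`, up to `max_len` nodes."""
    walk = [start]
    while len(walk) < max_len:
        successors = graph.get(walk[-1], [])
        if not successors:
            break  # dead end: no outgoing edges
        walk.append(rng.choice(successors))
    return walk


def generate_sentences(graph, walks_per_node, max_len, seed=0):
    """Start `walks_per_node` walks at every node; each walk is one sentence."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sentences = []
    for node in graph:
        for _ in range(walks_per_node):
            sentences.append(random_walk(graph, node, max_len, rng))
    return sentences


sentences = generate_sentences(TOY_GRAPH, walks_per_node=3, max_len=4)
for sentence in sentences[:3]:
    print(" ".join(sentence))
```

In this toy graph, `driver_a_init` and `driver_b_init` appear in matching contexts (each is preceded by a probe function and followed by `register_device`), which is the kind of shared context an embedding model trained on such sentences could use to place them near each other.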