CodeTrek: Flexible Modeling of Code using an Extensible Relational Representation

Pardis Pashakhanloo; Aaditya Naik; Yuepeng Wang; Hanjun Dai; Petros Maniatis; Mayur Naik

Published: 28 Jan 2022, Last Modified: 13 Feb 2023

Venue: ICLR 2022 Poster

Readers: Everyone

Keywords: relational database, code representation, knowledge graph reasoning, program understanding

Abstract: Designing a suitable representation for code-reasoning tasks is challenging in aspects such as the kinds of program information to model, how to combine them, and how much context to consider. We propose CodeTrek, a deep learning approach that addresses these challenges by representing codebases as databases that conform to rich relational schemas. The relational representation allows CodeTrek not only to uniformly represent diverse kinds of program information, but also to leverage program-analysis queries to derive new semantic relations, which can be readily incorporated without further architectural engineering. CodeTrek embeds this relational representation using a set of walks that can traverse different relations in an unconstrained fashion, and incorporates all relevant attributes along the way. We evaluate CodeTrek on four diverse and challenging Python tasks: variable misuse, exception prediction, unused definition, and variable shadowing. CodeTrek achieves an accuracy of 91%, 63%, 98%, and 94% on these tasks respectively, and outperforms state-of-the-art neural models by 2-19 percentage points.
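To make the idea of "code as a relational database, embedded via walks" more concrete, here is a minimal sketch in Python. It is not CodeTrek's schema, query engine, or walk-sampling procedure: the two relations (`assign`, `name`), the choice of `stmt` and `id` as join columns, and the helper names `build_relations`, `build_graph`, and `random_walk` are all assumptions made for illustration. The sketch stores facts about a tiny program as rows, links rows that share a key value, and samples a walk over the resulting graph.

```python
import ast
import random
from collections import defaultdict

# Toy program to analyze (hypothetical example input).
SOURCE = "x = 1\ny = x + 2\n"


def build_relations(source):
    """Extract a toy two-relation 'database' from Python source.

    Returns {relation_name: [row_dict, ...]}. The schema is an
    illustrative assumption, not the one used by CodeTrek.
    """
    tree = ast.parse(source)
    relations = {"assign": [], "name": []}
    for stmt_id, stmt in enumerate(tree.body):
        if not isinstance(stmt, ast.Assign):
            continue
        relations["assign"].append({"stmt": stmt_id, "line": stmt.lineno})
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                ctx = "store" if isinstance(node.ctx, ast.Store) else "load"
                relations["name"].append(
                    {"stmt": stmt_id, "id": node.id, "ctx": ctx}
                )
    return relations


def build_graph(relations, key_columns=("stmt", "id")):
    """Link rows (from any relation) that agree on a key column value."""
    by_key = defaultdict(list)
    for rel, rows in relations.items():
        for i, row in enumerate(rows):
            for col in key_columns:
                if col in row:
                    by_key[(col, row[col])].append((rel, i))
    graph = defaultdict(set)
    for nodes in by_key.values():
        for a in nodes:
            for b in nodes:
                if a != b:
                    graph[a].add(b)
    return graph


def random_walk(graph, start, length, rng):
    """Sample a walk of at most `length` rows, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = sorted(graph.get(walk[-1], ()))
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


relations = build_relations(SOURCE)
graph = build_graph(relations)
walk = random_walk(graph, ("assign", 0), 4, random.Random(0))

# Print each visited row together with its attributes.
for rel, i in walk:
    print(rel, relations[rel][i])
```

In this sketch a walk can hop from an assignment to the names it contains (shared `stmt`) and from one use of an identifier to another (shared `id`), carrying each row's attributes along. A derived relation, such as one computed by a program-analysis query, would simply be one more entry in `relations`, with no change to the graph-building or walk code.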

One-sentence Summary: We present a relational database representation and corresponding neural module for source code and show its potential on program understanding tasks.

Supplementary Material: zip
