Abstract: Traditional bug detection mechanisms have focused on a limited set of important issues, with a specialized detector for each of them. As code corpora continue to grow in size and complexity, new opportunities for developers to make mistakes emerge, leading to a \textit{long tail of local bugs}. Hence, we must investigate generalizable approaches that can detect such bugs. In this paper, we formulate and use the inconsistency principle, which can be applied to discover bugs at arbitrary code granularity, for example at the package level. We experiment with two types of formulations: Pointwise Mutual Information (PMI)-based and sequence-based approaches, which model smaller and larger contexts, respectively. The techniques learn code usage patterns from the code under analysis and apply what they learn to the same code -- thereby enabling on-the-fly bug detection. Experiments are conducted with two different program representations: token-based and graph-based. We show how the different variations capture diverse and complementary types of issues. The system is deployed in an industrial setting and has detected 12 types of bugs with 70\% acceptance by developers in real-world code reviews.
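
The abstract does not spell out the PMI-based formulation, so the following is only a minimal sketch of how an inconsistency check of this kind could look. It assumes each context (for example, the set of calls in one function) is a list of tokens, and it flags a context that contains one token of a strongly associated pair but not the other. The function names `pmi_scores` and `find_inconsistencies`, the thresholds `min_pmi` and `min_support`, and the sample tokens are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(contexts):
    """Return {(a, b): (pmi, co_occurrence_count)} for token pairs.

    Assumption: a context is a bag of tokens, and probabilities are
    estimated as the fraction of contexts containing the token(s).
    """
    n = len(contexts)
    token_counts = Counter()
    pair_counts = Counter()
    for ctx in contexts:
        unique = sorted(set(ctx))  # sort so each pair has one canonical order
        token_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    scores = {}
    for (a, b), c_ab in pair_counts.items():
        p_ab = c_ab / n
        p_a = token_counts[a] / n
        p_b = token_counts[b] / n
        # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
        scores[(a, b)] = (math.log2(p_ab / (p_a * p_b)), c_ab)
    return scores


def find_inconsistencies(contexts, min_pmi=0.5, min_support=3):
    """Flag contexts holding one token of a strongly associated pair
    but missing its partner. Returns (context_index, present, missing)."""
    flagged = []
    for (a, b), (pmi, support) in pmi_scores(contexts).items():
        if pmi < min_pmi or support < min_support:
            continue  # the pair is not a reliable usage pattern
        for i, ctx in enumerate(contexts):
            tokens = set(ctx)
            if a in tokens and b not in tokens:
                flagged.append((i, a, b))
            elif b in tokens and a not in tokens:
                flagged.append((i, b, a))
    return flagged


# Hypothetical corpus: `lock` is usually followed by `unlock`, except once.
contexts = [["lock", "unlock"]] * 4 + [["lock"]] + [["read", "parse"]] * 5
print(find_inconsistencies(contexts))
```

Because the patterns are learned from the same contexts that are then checked, this mirrors the on-the-fly setting described above; a practical detector would need additional filtering to keep false positives manageable.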