Keywords: Invariants, Software Engineering, Programming Languages
TL;DR: Learning from if-statements to generate program invariants directly from code
Abstract: Source code is meant to be executed, as well as read. Developers reason about its run-time properties by inferring invariants, which constrain program behavior; but they rarely encode these explicitly, so machine-learning methods don't have much aligned data to learn from. We propose an approach that adapts cues within existing if-statements regarding explicit run-time expectations to generate aligned datasets of code and implicit invariants. We also propose a contrastive loss to inhibit generation of illogical invariants. Our model learns to infer a wide vocabulary of invariants for arbitrary code, which can be used to detect and repair real bugs. This is complementary to trace-based methods, such as Daikon. Our results confirm that neural models can learn run-time expectations directly from code.