TL;DR: This paper presents Nester, a neuro-symbolic approach to enhance LMs for type inference by integrating symbolic learning without increasing model size.
Abstract: Language Models (LMs) are increasingly used for type inference, aiding in error detection and software development.
Some real-world deployments of LMs require the model to run on local machines to safeguard the intellectual property of the source code. This setting often limits the size of the LMs that can be used. We present Nester, the first neuro-symbolic approach that enhances LMs for type inference by integrating symbolic learning without increasing model size. Nester breaks type inference into sub-tasks based on the data and control flow of the input code, encoding them as a modular high-level program. This program executes multi-step actions, such as evaluating expressions and analyzing conditional branches of the target code, combining static typing with LMs to infer potential types.
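The abstract describes the approach only at a high level; the minimal sketch below illustrates the general neuro-symbolic idea (static rules applied per assignment, an LM fallback for unresolved expressions, and a merge step over control-flow branches). All names here, such as infer_static, lm_predict_type, and infer_variable_type, are hypothetical stand-ins for illustration and are not Nester's actual components or API.

import ast
from typing import Optional

def infer_static(node: ast.expr) -> Optional[str]:
    # Symbolic sub-task: resolve an expression's type with static rules.
    if isinstance(node, ast.Constant):
        return type(node.value).__name__      # e.g. 42 -> "int", 'a' -> "str"
    if isinstance(node, (ast.List, ast.ListComp)):
        return "list"
    if isinstance(node, ast.Dict):
        return "dict"
    return None                               # unresolved: defer to the LM

def lm_predict_type(source: str, var: str) -> str:
    # Neural sub-task: stub standing in for a query to a locally deployed LM.
    return "typing.Any"

def infer_variable_type(source: str, var: str) -> str:
    # "High-level program": run the symbolic rule on every assignment to `var`
    # (one per control-flow branch), fall back to the LM when the rule fails,
    # and merge the per-branch candidates into a single annotation.
    branch_types = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == var for t in node.targets
        ):
            branch_types.add(infer_static(node.value) or lm_predict_type(source, var))
    if not branch_types:
        return lm_predict_type(source, var)
    if len(branch_types) == 1:
        return branch_types.pop()
    return "typing.Union[" + ", ".join(sorted(branch_types)) + "]"

# Example: a variable assigned an int in one branch and a str in the other.
print(infer_variable_type("if flag:\n    x = 1\nelse:\n    x = 'a'\n", "x"))
# -> typing.Union[int, str]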
Evaluated on the ManyTypes4Py dataset in Python, Nester outperforms two state-of-the-art type inference methods (HiTyper and TypeGen), achieving 70.7% Top-1 Exact Match, which is 18.3% and 3.6% higher than HiTyper and TypeGen, respectively. For complex type annotations like typing.Optional and typing.Union, Nester achieves 51.0% and 16.7%, respectively, surpassing TypeGen by 28.3% and 5.8%.
Lay Summary: When programmers write code, they often need to predict the kind of data a variable will hold—such as numbers, text, or more complex types—a process known as type inference. While large AI language models can assist with this task, many real-world scenarios (especially in proprietary software) demand lightweight models that run locally, which often limits their accuracy.
To address this, we developed Nester, a hybrid AI system that combines neural networks with structured symbolic reasoning. Rather than relying solely on a large language model, Nester decomposes type inference into logical sub-tasks—such as analyzing code conditions and tracing data flow—and solves them step by step. This structured approach keeps the model compact while significantly boosting its reasoning capabilities.
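To make that step-by-step decomposition concrete, here is a toy example (our own illustration, not taken from the paper) of how tracing the two paths through a conditional yields an Optional annotation:

def parse_port(cfg: dict):
    port = None                  # sub-task: path with no further assignment -> NoneType
    if "port" in cfg:
        port = int(cfg["port"])  # sub-task: evaluate the branch expression -> int
    return port                  # sub-task: merge both branches -> typing.Optional[int]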
Our method helps developers detect type errors more efficiently, all while maintaining a lightweight footprint suitable for on-device deployment, ensuring that sensitive code remains secure. This approach can be extended to other areas of code intelligence, including code summarization, generation, and automated debugging.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/CGCL-codes/naturalcc/tree/main/examples/nester
Primary Area: Applications->Everything Else
Keywords: type inference, neuro-symbolic, language models, dataflow analysis
Submission Number: 11311