LLMs versus the Halting Problem: Revisiting Program Termination Prediction

ACL ARR 2026 January Submission 6265 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: LLMs, Coding models, Coding reasoning, Halting problem, Verification
Abstract: Determining whether a program terminates is a central problem in computer science. Turing’s foundational result established the Halting Problem as undecidable, showing that no algorithm can determine termination for all programs and inputs. Consequently, automatic verification tools approximate termination analysis and sometimes fail to prove or disprove termination; these tools rely on problem-specific architectures and abstractions, and are usually tied to particular programming languages. Recent progress in large language models (LLMs) raises the following question: can LLMs reliably predict program termination? In this work, we evaluate LLMs on a diverse set of C programs from the Termination category of the International Competition on Software Verification (SV-Comp) 2025. Our results suggest that LLMs perform remarkably well at predicting program termination: GPT-5 and Claude Sonnet-4.5 would rank just behind the top-ranked tool (using test-time scaling), and Code World Model (CWM) would place just behind the second-ranked tool. However, while LLMs are effective at predicting program termination, they often fail to provide a valid witness as a proof. Moreover, LLM performance drops as program length increases. We hope these insights motivate further research into program termination and the broader potential of LLMs for reasoning about undecidable problems.
Paper Type: Long
Research Area: Code Models
Research Area Keywords: Coding models, code reasoning, verification
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: C programming language
Submission Number: 6265