Steering Language Models for Theorem Proving

ICLR 2026 Conference Submission 25424 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Theorem proving, activation steering
Abstract: Recent progress in automated theorem proving leverages Large Language Models (LLMs) for their capacity to comprehend informal mathematical statements and generate corresponding formal proofs. Although these techniques perform well, little work has explored how language models interpret and use informal mathematical cues when generating formal proofs. To address this, we explore activation steering, a lightweight, inference-time mechanism that identifies linear directions in a model's residual activations corresponding to informal "thought" traces and nudges those activations to improve proof construction, entirely without finetuning. Unlike previous approaches, activation steering also offers insight into the internal reasoning dynamics encoded in a model's activation space. We evaluate these activation vectors on two distinct tasks: formal proof generation from formal theorems and formal proof generation from informal problem descriptions. Our contributions are twofold: (1) we propose an activation-based intervention technique to guide proof synthesis in LLMs; and (2) we show that it improves performance across two different decoding strategies without additional training.
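To make the mechanism concrete, below is a minimal sketch of generic inference-time activation steering: a steering vector is estimated as the difference in mean residual-stream activations between a prompt containing an informal "thought" trace and one without, and then added to one layer's residual stream via a forward hook during generation. This is not the authors' exact procedure; the model ("gpt2" as a stand-in), the layer index, the steering strength, and the contrastive prompts are all illustrative assumptions.

```python
# Sketch of activation steering via a forward hook (illustrative, not the paper's method).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper targets proof-generation LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

LAYER = 6    # hypothetical layer whose residual stream is steered
ALPHA = 4.0  # hypothetical steering strength

@torch.no_grad()
def mean_residual(prompt: str) -> torch.Tensor:
    """Mean residual-stream activation at LAYER for a prompt."""
    ids = tok(prompt, return_tensors="pt")
    out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0].mean(dim=0)

# Contrastive pair: a prompt with an informal "thought" trace vs. one without.
steer_vec = mean_residual(
    "Informal idea: induct on n, then close the goal by simp. Formal proof:"
) - mean_residual("Formal proof:")

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; the first element holds the hidden states.
    hidden = output[0] + ALPHA * steer_vec.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
try:
    ids = tok("theorem add_comm (a b : Nat) : a + b = b + a :=", return_tensors="pt")
    gen = model.generate(**ids, max_new_tokens=40, do_sample=False)
    print(tok.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()  # restore the unsteered model
```

Because the intervention is a single vector addition at inference time, it composes with any decoding strategy (greedy, sampling, best-of-n) without touching the model weights.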
Supplementary Material: zip
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 25424