Datatype tagging and prompt alignment: a recipe for boosting LLMs on algorithmic tasks

ICLR 2026 Conference Submission 24969 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: tokenizers, datatype tagging, algorithmic alignment, LLMs and coding, LLMs and arithmetic, algebra
TL;DR: We describe a compact recipe that aligns prompts with a typed program space and reliably emits a single legal Python expression. This helps LLMs align with algorithmic intent more easily and provides a quick-fix boost to their algorithmic abilities.
Abstract: This paper contributes toward strengthening the bridge between LLMs as programmers and classical ideas from programming languages (PL). Specifically, we show that aligning prompts with *typed programs* enables even small models to reliably emit one-line Python code. We present a simple yet effective recipe with three key ingredients: (i) inline datatype tagging for both prompt and code; (ii) a fine-tuned dual-head GPT-2-small with an auxiliary span probe over the prompt; and (iii) a fixed decoder that enforces a finite-state grammar, validates AST shape, and repairs outputs deterministically. On a stratified GPT-4o-based dataset covering primitives such as $\texttt{add}$, $\texttt{subtract}$, $\texttt{max}$, $\texttt{min}$, and $\texttt{sort}$, the decoder alone raises execution accuracy by over 40\% relative (from $0.58$ to $0.82$). For counting and repeated addition, prompts map deterministically to single expressions (for example, $\texttt{s.count('r')}$ and $\texttt{sum([1]*100)}$), yielding near-zero errors within coverage. Our approach runs on a single GPU and offers a proof of concept that ``datatype-aware tokenization'' and ``grammar-first decoding,'' among other PL-inspired ideas, improve reliability, coverage, and quality at low cost.
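The abstract's third ingredient, a fixed decoder that validates AST shape and repairs outputs deterministically, can be illustrated with a minimal sketch using Python's standard `ast` module. The whitelist of primitives and the character-stripping repair pass below are illustrative assumptions, not the paper's actual grammar or repair procedure:

```python
import ast

# Hypothetical whitelist of callables the finite-state grammar admits;
# the paper's actual grammar may differ.
ALLOWED_CALLS = {"add", "subtract", "max", "min", "sort", "sorted", "sum", "count"}

def is_legal_expression(src: str) -> bool:
    """Check that src parses as a single Python expression whose calls
    are all whitelisted (a toy stand-in for 'AST shape' validation)."""
    try:
        tree = ast.parse(src, mode="eval")  # mode="eval" rejects statements
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name not in ALLOWED_CALLS:
                return False
    return True

def repair(src: str) -> str:
    """Deterministic repair pass: strip trailing characters until the
    candidate becomes a legal expression, or return the empty string."""
    while src and not is_legal_expression(src):
        src = src[:-1].rstrip()
    return src

print(is_legal_expression("sum([1]*100)"))    # True: single legal expression
print(is_legal_expression("import os"))        # False: a statement, not an expression
print(repair("'strawberry'.count('r') ;;"))    # trailing junk is stripped
```

Note the design choice this sketch mirrors: because validation happens on the parsed AST rather than on raw text, the decoder can reject whole classes of outputs (statements, non-whitelisted calls) with a single check.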
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 24969