FROST: Factual Reasoning via Optimized Stochastic Trajectories in Large Language Models during Inference

Published: 18 Apr 2026, Last Modified: 24 Apr 2026, ACL 2026 Industry Track Poster, CC BY 4.0
Keywords: hallucination, exploration, dual-process reasoning, chain-of-thought, high-entropy generation, small model ensembles, information theory
Abstract: Large language models face a trade-off between factual consistency and reasoning diversity. Deterministic decoding prioritizes reliability but may miss alternative solution paths, while high-temperature sampling increases exploration at the cost of accuracy. We present FROST (Factual Reasoning via Optimized Stochastic Trajectories), an inference-time framework that balances exploration and exploitation without additional training or context augmentation. FROST combines deterministic inference from a large model with targeted stochastic sampling from a smaller model, then selects among the resulting outputs via multi-criteria validation over coherence, factual grounding, and semantic novelty. Across HotpotQA, CommonsenseQA, and MMLU, FROST achieves improvements of 2 to 5 percentage points over standard chain-of-thought (CoT) prompting and reduces unsupported outputs by 40% relative to standard CoT. Compared to Self-Consistency ensembles, FROST delivers comparable accuracy at 28% lower inference cost through strategic delegation to smaller models. On an adversarial subset with unanswerable queries, FROST abstains on 34% of cases versus 8% for standard CoT, reducing false positives by 45%. Task-stratified evaluation shows that the benefits of exploration grow with problem ambiguity. Generalization to mathematical reasoning, code generation, and multimodal domains remains future work.
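The abstract does not say how coherence, factual grounding, and semantic novelty are computed or combined. The following is only a minimal sketch of one way such a selection step could work, under loudly stated assumptions: a weighted sum of the three criteria, a hard grounding threshold that doubles as the abstention rule, and token-set Jaccard similarity as a crude stand-in for semantic novelty. `Candidate`, `select`, `jaccard`, the weights, and the threshold are all hypothetical. The model calls that would produce the greedy anchor (large model) and the sampled candidates (small model) are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Candidate:
    text: str
    source: str  # hypothetical labels: "large-greedy" or "small-sampled"


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity, a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def select(
    anchor: Candidate,
    samples: Sequence[Candidate],
    coherence: Callable[[str], float],
    grounding: Callable[[str], float],
    weights: Tuple[float, float, float] = (0.4, 0.4, 0.2),  # assumed, not from the paper
    min_grounding: float = 0.5,  # assumed abstention threshold
) -> Optional[Candidate]:
    """Pick the highest-scoring grounded candidate, or return None to abstain.

    Hypothetical rule: candidates whose grounding score is below min_grounding
    are discarded; the rest are ranked by a weighted sum of coherence,
    grounding, and novelty relative to the deterministic anchor.
    """
    w_coh, w_gnd, w_nov = weights
    best: Optional[Candidate] = None
    best_score = float("-inf")
    for cand in [anchor, *samples]:
        g = grounding(cand.text)
        if g < min_grounding:
            continue  # unsupported candidate: never selected
        # The anchor gets zero novelty; samples earn novelty by differing from it.
        novelty = 0.0 if cand is anchor else 1.0 - jaccard(cand.text, anchor.text)
        score = w_coh * coherence(cand.text) + w_gnd * g + w_nov * novelty
        if score > best_score:  # strict '>' keeps the anchor on ties
            best, best_score = cand, score
    return best  # None when no candidate passes the grounding gate
```

For example, with a greedy anchor "Paris is the capital of France" and a sampled paraphrase built from the same words, the two score identically and the anchor is kept; a sample whose grounding score falls below the threshold is never chosen, and if every candidate falls below it, `select` returns `None`, which plays the role of abstention in this sketch. A grounded sample phrased differently from the anchor can win on the novelty term, which is how the sketch lets exploration pay off.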
Submission Type: Emerging
Copyright Form: pdf
Submission Number: 251