Keywords: reasoning, question answering
TL;DR: Exploring alternatives to left-to-right factorization, we show that the order of factorization can affect LLM performance on tasks such as logical reasoning, commonsense understanding, and truthfulness, with insights into when each factorization is most beneficial.
Abstract: Language models usually use left-to-right (L2R) autoregressive factorization.
However, L2R factorization may not always be the best inductive bias.
Therefore, we investigate whether alternative factorizations of the text distribution are beneficial for some tasks.
We study right-to-left (R2L) training as a compelling alternative, focusing on multiple-choice questions (MCQs) as a test bed for knowledge extraction and reasoning. Through extensive experiments across model sizes (2B-8B parameters) and training datasets, we find that L2R models are not always preferred over R2L models on MCQ benchmarks, especially on logical reasoning, commonsense understanding, and truthfulness assessment tasks. Our analysis reveals that this performance difference may be fundamentally linked to multiple factors, including calibration, computability, and directional conditional entropy.
We ablate the impact of these factors through controlled simulation studies using arithmetic tasks, where the impacting factors can be better disentangled.
Our work demonstrates that exploring alternative factorizations of the text distribution can improve LLM capabilities, and it provides theoretical insights into which factorization best approximates the human language distribution and when each reasoning order is more advantageous.
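For concreteness, the two factorizations compared above can be written as follows (our notation, not taken from the submission): for a token sequence x_1, ..., x_T, the chain rule admits both a left-to-right and a right-to-left decomposition.

```latex
% L2R: each token is conditioned on its left context
p_{\mathrm{L2R}}(x_1, \dots, x_T) = \prod_{t=1}^{T} p\big(x_t \mid x_1, \dots, x_{t-1}\big)

% R2L: each token is conditioned on its right context
p_{\mathrm{R2L}}(x_1, \dots, x_T) = \prod_{t=1}^{T} p\big(x_t \mid x_{t+1}, \dots, x_T\big)
```

Both decompositions are exact by the chain rule; they assign the same joint probability in principle but require the model to learn different families of conditionals, which is why the choice of direction can act as an inductive bias.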
Supplementary Material: pdf
Primary Area: foundation or frontier models, including LLMs
Submission Number: 4279