Keywords: In-Context Learning (ICL), Permutation Invariance, Jensen-Shannon Divergence (JSD), Distributional Alignment, Self-Inconsistency Optimization
Abstract: Large Language Models (LLMs) exhibit powerful reasoning capabilities, particularly when guided by in-context learning (ICL). However, their performance is brittle to demonstration order: accuracy can swing from near-perfect to near-random based solely on the ordering of the in-context demonstrations. This sensitivity reveals a fundamental vulnerability: models rely on spurious positional correlations (noise) rather than semantic content (signal). To address this reliability gap, we introduce \textbf{Self-Inconsistency Optimization (\algname{})}, a simple, model-agnostic post-training framework that teaches models to focus on \textit{what} is said, not \textit{how} it is arranged. \algname{} generates semantically equivalent inputs by permuting the demonstrations and explicitly trains the model to align its output distributions across these permutations using our proposed self-inconsistency loss, which is based on the Jensen--Shannon divergence (JSD). We provide a theoretical justification for our framework, proving that minimizing this self-inconsistency loss is sufficient to achieve the desired order invariance. Furthermore, the Bayesian-update design of \algname{} yields a stable optimization process by decoupling the model's prior knowledge from the alignment objective, allowing it to integrate seamlessly with existing post-training pipelines such as reinforcement learning. Empirical evaluations on mathematical reasoning benchmarks show that \algname{} substantially mitigates order sensitivity while maintaining or even improving task accuracy. Our source code is
available at \url{https://anonymous.4open.science/r/From-Self-Inconsistency-to-Stability-E0BC}.
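To make the alignment objective concrete, below is a minimal sketch of a Jensen--Shannon divergence loss between a model's output distributions under two permuted demonstration orderings, written in PyTorch. The function name `self_inconsistency_loss` and its interface are hypothetical, inferred from the abstract; this is not the authors' released implementation (see the linked repository for that), and the Bayesian-update decoupling of the model's prior is not shown here.

```python
# Minimal sketch (not the authors' code): JSD between next-token distributions
# produced from two semantically equivalent, permuted prompts.
import math
import torch
import torch.nn.functional as F

def self_inconsistency_loss(logits_a: torch.Tensor,
                            logits_b: torch.Tensor) -> torch.Tensor:
    """Jensen--Shannon divergence between two output distributions.

    logits_a, logits_b: (batch, vocab_size) logits for the same target
    positions, obtained from two permutations of the in-context demonstrations.
    """
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    # Mixture M = (P + Q) / 2, computed in log space for numerical stability.
    log_m = torch.logsumexp(torch.stack([log_p, log_q]), dim=0) - math.log(2.0)
    # JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M)
    kl_pm = F.kl_div(log_m, log_p, log_target=True, reduction="batchmean")
    kl_qm = F.kl_div(log_m, log_q, log_target=True, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm)
```

In a post-training loop, a term of this form would presumably be added to the standard task loss so that the model is rewarded both for answering correctly and for producing order-invariant output distributions.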
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 22415