Keywords: In-Context Learning (ICL), Permutation Invariance, Jensen-Shannon Divergence (JSD), Distributional Alignment, Self-Inconsistency Optimization
Abstract: Large Language Models (LLMs) exhibit powerful reasoning capabilities, particularly when guided by in-context learning (ICL). However, their performance is brittle to demonstration order: accuracy can swing from near-perfect to near-random based solely on the ordering of the in-context demonstrations. This sensitivity reveals a fundamental vulnerability: models rely on spurious positional correlations (noise) rather than semantic content (signal). To address this reliability gap, we introduce \textbf{Self-Inconsistency Optimization (\algname{})}, a simple, model-agnostic post-training framework that teaches models to focus on \textit{what} is said, not \textit{how} it is arranged. \algname{} generates semantically equivalent inputs by permuting the demonstrations and explicitly trains the model to align its output distributions across these permutations using our proposed self-inconsistency loss, which is based on the Jensen--Shannon divergence (JSD). We provide a theoretical justification for our framework, proving that minimizing this self-inconsistency loss is sufficient to achieve the desired order invariance. Furthermore, the Bayesian-update design of \algname{} yields a stable optimization process by decoupling the model's prior knowledge from the alignment objective, allowing it to integrate seamlessly with existing post-training pipelines such as reinforcement learning. Empirical evaluations on mathematical reasoning benchmarks show that \algname{} substantially mitigates order sensitivity while maintaining or even improving task accuracy. Our source code is
available at \url{https://anonymous.4open.science/r/From-Self-Inconsistency-to-Stability-E0BC}.
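To make the alignment objective concrete, below is a minimal sketch of a Jensen--Shannon divergence loss between a model's output distributions under two permuted demonstration orderings, written in PyTorch. The function name `self_inconsistency_loss` and its interface are hypothetical, inferred from the abstract; this is not the authors' released implementation (see the linked repository for that), and the Bayesian-update decoupling of the model's prior is not shown here.

```python
# Minimal sketch (not the authors' code): JSD between next-token distributions
# produced from two semantically equivalent, permuted prompts.
import math
import torch
import torch.nn.functional as F

def self_inconsistency_loss(logits_a: torch.Tensor,
                            logits_b: torch.Tensor) -> torch.Tensor:
    """Jensen--Shannon divergence between two output distributions.

    logits_a, logits_b: (batch, vocab_size) logits for the same target
    positions, obtained from two permutations of the in-context demonstrations.
    """
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    # Mixture M = (P + Q) / 2, computed in log space for numerical stability.
    log_m = torch.logsumexp(torch.stack([log_p, log_q]), dim=0) - math.log(2.0)
    # JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M)
    kl_pm = F.kl_div(log_m, log_p, log_target=True, reduction="batchmean")
    kl_qm = F.kl_div(log_m, log_q, log_target=True, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm)
```

In a post-training loop, a term of this form would presumably be added to the standard task loss so that the model is rewarded both for answering correctly and for producing order-invariant output distributions.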
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 22415