Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits

ICLR 2026 Conference Submission 19420 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: bandits, online learning, opinion dynamics, social media platforms
TL;DR: We present the first online framework for minimizing polarization and disagreement in the Friedkin–Johnsen model, using a two-stage low-rank matrix bandit algorithm with guarantees and strong empirical gains over linear bandits.
Abstract: We study the problem of minimizing polarization and disagreement in the Friedkin–Johnsen opinion dynamics model under incomplete information. Unlike prior work, which assumes a static setting with full knowledge of users' innate opinions, we address the more realistic online setting in which innate opinions are unknown and must be learned through sequential observations. This novel setting, which naturally mirrors periodic interventions on social media platforms, is formulated as a regret minimization problem, establishing a key connection between algorithmic interventions on social media platforms and the theory of multi-armed bandits. In our formulation, after each intervention the learner observes only scalar feedback: the overall polarization and disagreement. For this novel bandit problem, we propose a two-stage algorithm based on low-rank matrix bandits. The algorithm first performs subspace estimation to identify an underlying low-dimensional structure, and then runs a linear bandit algorithm in the compact representation derived from the estimated subspace. We prove that our algorithm achieves $\widetilde{O}(\sqrt{T})$ cumulative regret over any time horizon $T$. Empirical results confirm that our algorithm significantly outperforms a linear bandit baseline in both cumulative regret and running time.
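To make the two-stage idea concrete, here is a minimal, hypothetical sketch of a generic low-rank matrix bandit in this spirit (not the authors' exact algorithm): the learner sees only a scalar reward $x^\top \Theta x + \text{noise}$ for a hidden low-rank symmetric $\Theta$, first estimates the dominant subspace from random exploration, and then runs a LinUCB-style linear bandit on the compressed features. All dimensions, arm sets, and constants below are illustrative assumptions.

```python
# Illustrative two-stage low-rank matrix bandit sketch (assumed model,
# not the paper's exact method). Reward: y = x^T Theta x + noise, where
# Theta is symmetric with rank r << d and is unknown to the learner.
import numpy as np

rng = np.random.default_rng(0)
d, r, T1, T2 = 8, 2, 400, 600  # ambient dim, rank, stage lengths (assumed)

# Hidden low-rank parameter Theta = U diag(s) U^T.
U = np.linalg.qr(rng.normal(size=(d, r)))[0]
Theta = U @ np.diag([1.0, 0.6]) @ U.T

# A fixed finite set of candidate interventions (unit-norm arms).
arms = [v / np.linalg.norm(v) for v in rng.normal(size=(20, d))]

def pull(x):
    """Scalar bandit feedback with Gaussian noise."""
    return x @ Theta @ x + 0.05 * rng.normal()

# --- Stage 1: subspace estimation from random exploration ------------
# Regress scalar rewards on vec(x x^T) to estimate Theta, then take the
# top-r singular subspace of the (symmetrized) estimate.
feats, ys = [], []
for _ in range(T1):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    feats.append(np.outer(x, x).ravel())
    ys.append(pull(x))
theta_vec, *_ = np.linalg.lstsq(np.array(feats), np.array(ys), rcond=None)
Theta_hat = theta_vec.reshape(d, d)
U_hat = np.linalg.svd((Theta_hat + Theta_hat.T) / 2)[0][:, :r]

# --- Stage 2: LinUCB in the compressed r^2-dimensional space ---------
# Project each arm to z = U_hat^T x; the reward is (approximately)
# linear in vec(z z^T), so a standard linear bandit applies.
lam, beta = 1.0, 0.5  # ridge and exploration parameters (assumed)
A = lam * np.eye(r * r)
b = np.zeros(r * r)
best = max(x @ Theta @ x for x in arms)
regret = 0.0
for _ in range(T2):
    A_inv = np.linalg.inv(A)
    w = A_inv @ b
    def ucb(a):
        z = U_hat.T @ a
        phi = np.outer(z, z).ravel()
        return phi @ w + beta * np.sqrt(phi @ A_inv @ phi)
    x = max(arms, key=ucb)
    z = U_hat.T @ x
    phi = np.outer(z, z).ravel()
    y = pull(x)
    A += np.outer(phi, phi)
    b += phi * y
    regret += best - x @ Theta @ x

print("avg stage-2 regret:", round(regret / T2, 3))
```

The key point this sketch illustrates is the dimension reduction: stage 2 operates on $r^2$-dimensional features rather than $d^2$-dimensional ones, which is the source of both the regret and running-time improvements over a naive linear bandit on the full matrix space.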
Primary Area: optimization
Submission Number: 19420