Co-Alignment: Rethinking Alignment as Bidirectional Human-AI Cognitive Adaptation

Published: 08 Oct 2025 · Last Modified: 21 Oct 2025 · Agents4Science · CC BY 4.0
Keywords: Cognitive alignment, Human-AI collaboration, Multi-agent reinforcement learning, Emergent communication, Information bottleneck
TL;DR: BiCA lets humans and AI mutually adapt during collaboration, rather than having AI conform one-directionally to humans as in RLHF. Result: 85.5% vs. 70.3% task success and improved safety, showing that co-alignment outperforms single-directional alignment.
Abstract: Current AI alignment through RLHF follows a single-directional paradigm: AI conforms to human preferences while treating human cognition as fixed. We propose a shift to co-alignment through Bidirectional Cognitive Alignment (BiCA), where humans and AI mutually adapt. BiCA uses learnable protocols, representation mapping, and KL-budget constraints for controlled co-evolution. In collaborative navigation, BiCA achieved 85.5% success versus 70.3% for the baseline, with 230% better mutual adaptation and 332% better protocol convergence (p < 0.001). Emergent protocols outperformed handcrafted ones by 84%, while bidirectional adaptation unexpectedly improved safety (+23% out-of-distribution robustness). The 46% synergy improvement demonstrates that optimal collaboration lies at the intersection, not the union, of human and AI capabilities, validating the shift from single-directional to co-alignment paradigms.
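The abstract mentions KL-budget constraints for controlled co-evolution but does not specify their form. A minimal sketch of one plausible reading is below, assuming a hinge-style penalty that charges the agent only for KL divergence from its pre-adaptation reference policy beyond a fixed budget; the function name, penalty form, and parameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def kl_budget_loss(new_logits, ref_logits, task_loss,
                   kl_budget=0.05, penalty_weight=10.0):
    """Illustrative KL-budget constraint: the adapting policy is penalized
    only for the portion of its KL divergence from the reference (pre-adaptation)
    policy that exceeds a fixed budget, keeping co-evolution controlled."""
    # Mean KL(new || ref) over the batch of action distributions
    new_logp = F.log_softmax(new_logits, dim=-1)
    ref_logp = F.log_softmax(ref_logits, dim=-1)
    kl = torch.sum(new_logp.exp() * (new_logp - ref_logp), dim=-1).mean()
    # Hinge: only the excess over the budget contributes to the loss
    excess = torch.clamp(kl - kl_budget, min=0.0)
    return task_loss + penalty_weight * excess, kl
```

Under this reading, each partner could carry its own budget, so neither the human model nor the AI policy drifts arbitrarily far during mutual adaptation.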
Supplementary Material: zip
Submission Number: 220