ORION: Decoupling and Alignment for Unified Autoregressive Understanding and Generation

Published: 26 Jan 2026 · Last Modified: 02 Mar 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: unified generation and understanding
Abstract: Unified multimodal Large Language Models (MLLMs) hold great promise for seamlessly integrating understanding and generation. However, monolithic autoregressive architectures, despite their elegance and conversational fluency, suffer from a fundamental semantic–structural conflict: optimizing for low-level reconstructability in generation leads to catastrophic forgetting of high-level semantic understanding. We present ORION, a unified framework that resolves this conflict through Decoupling and Alignment. A non-linear vision head decouples structural pressures from the shared representations, while a novel Representation Consistency Loss explicitly aligns semantics during generation. Together with a curated progressive training recipe and high-quality multimodal data, our method enables balanced optimization of both capabilities. Built purely on a monolithic autoregressive backbone without separate task-specific parameters, ORION achieves performance on par with or exceeding recent state-of-the-art unified models that rely on more complex designs. These results validate monolithic autoregression as a simple, effective, and competitive path toward truly integrated multimodal intelligence.
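The abstract does not specify the exact form of the two components. Below is a minimal sketch of one plausible instantiation, assuming the vision head is an MLP over shared backbone states and the Representation Consistency Loss is a cosine alignment against frozen semantic features; all class, function, and parameter names here are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLinearVisionHead(nn.Module):
    """Hypothetical non-linear head mapping shared backbone states to
    visual-token logits, so low-level structural pressure from image
    generation is absorbed here rather than in the shared representation."""

    def __init__(self, hidden_dim: int, codebook_size: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim * 4),
            nn.GELU(),
            nn.Linear(hidden_dim * 4, codebook_size),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, hidden_dim) -> logits over the visual codebook
        return self.mlp(h)


def representation_consistency_loss(gen_states: torch.Tensor,
                                    ref_features: torch.Tensor) -> torch.Tensor:
    """One way to align semantics during generation: cosine similarity
    between generation-time hidden states and reference semantic features
    (e.g., from a frozen understanding-side encoder)."""
    gen = F.normalize(gen_states, dim=-1)
    ref = F.normalize(ref_features, dim=-1)
    return (1.0 - (gen * ref).sum(dim=-1)).mean()
```

Under this reading, the total generation objective would combine the standard next-token cross-entropy over visual codes with a weighted consistency term, keeping the shared backbone semantically anchored while the head handles reconstruction.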
Primary Area: foundation or frontier models, including LLMs
Submission Number: 5375