Manifold-Guided Geometric Enhancement for Reliable Multimodal Interaction in Consumer Electronics

XiaoYu Xu, YuLan Pan, Xiaofeng Zhang, Xuhang Chen, Kim-Fung Tsang

Published: 2026, Last Modified: 20 Apr 2026. IEEE Trans. Consumer Electron. 2026. License: CC BY-SA 4.0

Abstract: Large vision-language models have been moving into consumer devices in recent years, where reliability matters as much as speed. A central obstacle is hallucination. We examine the geometry of hidden states and observe a depth-wise shift from near-linear structure in early layers to curved manifolds in deeper fusion layers. This points to a simple design rule for deployment: respect the local geometry when steering the model. Motivated by this finding, we introduce MAGE (Manifold-guided Geometric Enhancement), a training-free plug-in intervention that follows this rule. MAGE aligns the guidance with geodesic directions on the unit sphere, filters it through a visual-evidence subspace, aggregates it across demonstrations, and applies layer-adaptive residual injections. On CHAIR, sentence-level hallucination drops by 10.2 points ( $48.0\rightarrow 37.8$ ), and instance-level hallucination drops by 2.7 points ( $13.9\rightarrow 11.2$ ). On MME, the total score increases by 73.1 ( $600.2\rightarrow 676.3$ ). In consumer-electronics settings, the answers are clearer and better grounded, with no additional retraining.
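The four steps named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so every concrete choice below is an assumption made for illustration and not the authors' implementation: the sphere's log and exp maps stand in for "geodesic directions", an orthonormal basis stands in for the visual-evidence subspace, a plain mean stands in for aggregation across demonstrations, and the hypothetical `layer_strength` uses a linear depth schedule for layer adaptation.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def normalize(a):
    n = norm(a)
    if n == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / n for x in a]

def log_map(x, y):
    """Tangent vector at unit vector x pointing along the geodesic to unit vector y."""
    c = max(-1.0, min(1.0, dot(x, y)))
    theta = math.acos(c)                       # geodesic distance (angle)
    u = [yi - c * xi for xi, yi in zip(x, y)]  # part of y orthogonal to x
    n = norm(u)
    if n < 1e-12:                              # y coincides with x (or is antipodal)
        return [0.0] * len(x)
    return [theta * ui / n for ui in u]

def exp_map(x, v):
    """Walk from unit vector x along tangent vector v, staying on the unit sphere."""
    t = norm(v)
    if t < 1e-12:
        return list(x)
    return [math.cos(t) * xi + math.sin(t) * vi / t for xi, vi in zip(x, v)]

def project(v, basis):
    """Project v onto the span of an orthonormal basis (assumed evidence subspace)."""
    out = [0.0] * len(v)
    for b in basis:
        c = dot(v, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

def layer_strength(layer, num_layers, max_alpha):
    """Hypothetical schedule: injection strength grows linearly with depth."""
    return max_alpha * (layer + 1) / num_layers

def steer(h, demo_targets, basis, alpha):
    """Move hidden state h toward demonstration targets along the sphere."""
    r = norm(h)
    x = normalize(h)
    # Geodesic direction toward each demonstration, then a plain mean (assumed).
    tangents = [log_map(x, normalize(t)) for t in demo_targets]
    mean = [sum(col) / len(tangents) for col in zip(*tangents)]
    # Keep only the part of the direction inside the evidence subspace.
    filtered = project(mean, basis)
    # Remove any component along x so the direction is tangent again.
    c = dot(filtered, x)
    tangent = [f - c * xi for f, xi in zip(filtered, x)]
    y = exp_map(x, [alpha * ti for ti in tangent])
    return [r * yi for yi in y]                # restore the original norm

h = [2.0, 0.0, 0.0]
basis = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
steered = steer(h, [[0.0, 1.0, 0.0]], basis, alpha=0.5)
```

In this sketch the update is a rotation on the sphere, so the norm of the hidden state is preserved; the residual that would be injected at a given layer is the difference `steered - h`, scaled per layer by a schedule such as `layer_strength`.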