Abstract: Large vision-language models are increasingly deployed on consumer devices, where reliability matters as much as speed. A central obstacle is hallucination, where the model describes content that is not present in the image. We examine the geometry of hidden states and observe a depth-wise shift from near-linear structure in early layers to curved manifolds in deeper fusion layers. This suggests a simple design rule for deployment: respect the local geometry when steering the model. Motivated by this finding, we introduce MAGE (Manifold-guided Geometric Enhancement), a training-free, plug-in intervention that follows this rule. MAGE aligns the guidance with geodesic directions on the unit sphere, filters it through a visual-evidence subspace, aggregates it across demonstrations, and injects it through layer-adaptive residual updates. On CHAIR, sentence-level hallucination drops by 10.2 points ($48.0\rightarrow 37.8$) and instance-level hallucination drops by 2.7 points ($13.9\rightarrow 11.2$). On MME, the total score increases by 76.1 points ($600.2\rightarrow 676.3$). For consumer-electronics applications, this yields clearer, better-grounded answers without any retraining.
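The abstract names the four MAGE steps but not their formulas. Purely as an illustration, here is a minimal NumPy sketch of how such a pipeline could look under our own assumptions; it is not the authors' implementation. We assume that the guidance for one demonstration is the difference of unit-sphere log maps from the current hidden state `h` toward a "positive" and a "negative" demonstration state, that the visual-evidence subspace is given as a matrix `basis` with orthonormal columns, that aggregation is a plain mean, and that injection adds `alpha * ||h|| * direction` with a per-layer strength `alpha`. The function names (`sphere_log`, `guidance_direction`, `inject`) and the linear depth schedule are hypothetical.

```python
import numpy as np


def sphere_log(x, y, eps=1e-8):
    """Log map on the unit sphere: the tangent vector at x/||x|| pointing along
    the great circle toward y/||y||, with length equal to the geodesic angle."""
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    cos_theta = np.clip(x @ y, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    u = y - cos_theta * x  # component of y orthogonal to x (lies in the tangent plane)
    u_norm = np.linalg.norm(u)
    if u_norm < eps:  # y equals x or -x: direction undefined, return zero
        return np.zeros_like(x)
    return theta * u / u_norm


def guidance_direction(h, pos_states, neg_states, basis):
    """Hypothetical guidance: mean over demonstrations of the geodesic difference
    log_h(pos) - log_h(neg), filtered through the column space of `basis`."""
    dirs = []
    for p, n in zip(pos_states, neg_states):
        v = sphere_log(h, p) - sphere_log(h, n)  # tangent vector at h/||h||
        dirs.append(basis @ (basis.T @ v))  # orthogonal projection onto the subspace
    return np.mean(dirs, axis=0)


def inject(h, direction, alpha):
    """Residual injection scaled by ||h|| and an assumed per-layer strength alpha."""
    return h + alpha * np.linalg.norm(h) * direction


# Example usage with random data (hypothetical sizes: hidden size d, subspace rank k).
rng = np.random.default_rng(0)
d, k, m, num_layers = 16, 4, 3, 8
basis, _ = np.linalg.qr(rng.normal(size=(d, k)))  # orthonormal columns, shape (d, k)
h = rng.normal(size=d)
pos = [rng.normal(size=d) for _ in range(m)]
neg = [rng.normal(size=d) for _ in range(m)]
alphas = np.linspace(0.0, 0.2, num_layers)  # hypothetical schedule: stronger in deeper layers
g = guidance_direction(h, pos, neg, basis)
h_new = inject(h, g, alphas[-1])
```

The log map is used here because a plain Euclidean difference of hidden states ignores curvature, whereas `sphere_log` returns a tangent vector whose length equals the arc angle. Projecting onto `basis` can move the vector slightly off the tangent plane; that is harmless for a Euclidean residual add but would matter for an exponential-map update.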