Concept Realization Manifolds for Multi-Concept Activation and its (Dis)Entanglement in Large Language Models

TMLR Paper7143 Authors

24 Jan 2026 (modified: 14 Feb 2026), Under review for TMLR, CC BY 4.0
Abstract: This work extends the Bias-CAV framework by introducing Concept Realization Manifolds (CRMs) as a geometric foundation for analyzing multi-concept activations and their entanglement in large language models. A theoretical framework is presented that reframes concepts as operational geometric regularities rather than latent variables. Multi-Concept Activation Subspaces (MCAS) are introduced to jointly model multiple bias-related concepts, addressing limitations of single-concept approaches identified in prior work. The operational limits of disentanglement are formally characterized through the Irreducible Measure Entanglement Theorem, which establishes that while directional entanglement can be reduced or removed, measure entanglement (activation distribution overlap) may persist due to data correlations and model optimization objectives. Conditional disentanglement methods are developed to operationalize partial concept separation. A comprehensive terminology hierarchy is established, including Concept Entanglement Fields, Conditional Concept Manifolds, and Intersectional Concept Regions. The framework is applied to bias analysis through multi-concept intervention mechanisms with formal fidelity guarantees. Examination of layer-wise entanglement patterns reveals structured relationships between concepts across transformer layers. Multi-axis evaluation demonstrates that MCAS reduces cross-dimension spillover effects by 2.4--3.6× compared to baseline methods in the evaluated settings, addressing concerns about unintended consequences in targeted bias mitigation. For practitioners, the framework provides operational methods for analyzing intersectional bias patterns (e.g., gender $\times$ profession interactions) and improving model interpretability through conditional disentanglement in the tested scenarios, even when perfect concept separation is theoretically impossible.
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Chao_Chen1
Submission Number: 7143