BACH-V: Bridging Abstract and Concrete Human-Values in Large Language Models

ACL ARR 2026 January Submission8079 Authors

06 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | License: CC BY 4.0

Keywords: abstract-concrete dichotomy, human values, large language models, interpretability, LLM probing, LLM steering

Abstract: Do large language models (LLMs) genuinely understand abstract concepts, or do they merely manipulate them as statistical patterns? We introduce an abstraction-grounding framework that decomposes conceptual understanding into three capacities: interpretation of abstract concepts (Abstract-Abstract, A-A), grounding of abstractions in concrete events (Abstract-Concrete, A-C), and application of abstract principles to regulate concrete decisions (Concrete-Concrete, C-C). We use human values as a testbed, given their semantic richness and centrality to alignment, and employ two techniques: probing (detecting value traces in internal activations) and steering (modifying representations to shift behavior). Across six open-source LLMs and ten value dimensions, diagnostic probes trained solely on abstract value descriptions reliably detect the same values in concrete event narratives and decision reasoning, demonstrating cross-level transfer. Steering reveals an asymmetry: intervening on value representations causally shifts concrete judgments and decisions (A-C, C-C) yet leaves abstract interpretations (A-A) unchanged, suggesting that encoded abstract values function as stable anchors rather than malleable activations. These findings indicate that LLMs maintain structured value representations that bridge abstraction and action, providing a mechanistic and operational foundation for building value-driven autonomous AI systems with more transparent, generalizable alignment and control.
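
The two techniques named in the abstract, probing and steering, can be illustrated with a toy sketch. This is not the authors' implementation: the activations are synthetic Gaussian vectors rather than LLM hidden states, the probe is assumed to be a plain logistic regression, and every name here (`make_activations`, `train_probe`, `steer`, `value_dir`) is hypothetical. The sketch trains a probe on an "abstract" set, evaluates it on a shifted "concrete" set, and then adds the probe direction to the concrete activations to shift the probe's readout.

```python
import numpy as np

def make_activations(rng, direction, n, offset):
    """Synthetic stand-in for hidden states: class 1 is shifted along
    +direction, class 0 along -direction, plus unit Gaussian noise."""
    labels = rng.integers(0, 2, size=n)
    noise = rng.normal(0.0, 1.0, size=(n, direction.shape[0]))
    acts = noise + offset + np.outer(2 * labels - 1, direction)
    return acts, labels

def train_probe(acts, labels, lr=0.1, steps=500):
    """Logistic-regression probe fit by full-batch gradient descent."""
    w = np.zeros(acts.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        grad = p - labels
        w -= lr * acts.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

def predict(w, b, acts):
    """Binary probe readout: 1 where the logit is positive."""
    return (acts @ w + b > 0).astype(int)

def steer(acts, w, alpha):
    """Add alpha times the unit-norm probe direction to every activation."""
    return acts + alpha * w / np.linalg.norm(w)

rng = np.random.default_rng(0)
dim = 32

# Hypothetical "value direction" of norm 2 shared by both data sets.
value_dir = rng.normal(size=dim)
value_dir *= 2.0 / np.linalg.norm(value_dir)

# Distribution shift for the "concrete" set, kept orthogonal to value_dir.
shift = rng.normal(size=dim)
shift -= (shift @ value_dir) / (value_dir @ value_dir) * value_dir
shift *= 0.5 / np.linalg.norm(shift)

abstract_acts, abstract_labels = make_activations(rng, value_dir, 400, 0.0)
concrete_acts, concrete_labels = make_activations(rng, value_dir, 400, shift)

# Probing: train on the abstract set only, test on the concrete set.
w, b = train_probe(abstract_acts, abstract_labels)
transfer_acc = float((predict(w, b, concrete_acts) == concrete_labels).mean())

# Steering: push concrete activations along the probe direction.
base_rate = float(predict(w, b, concrete_acts).mean())
steered_rate = float(predict(w, b, steer(concrete_acts, w, 6.0)).mean())

print(f"cross-level probe accuracy: {transfer_acc:.2f}")
print(f"positive readout rate before/after steering: {base_rate:.2f} / {steered_rate:.2f}")
```

In a real experiment the synthetic arrays would be replaced by hidden states extracted from a model layer, and the steering effect would be measured on the model's generated judgments rather than on the probe's own readout.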

Paper Type: Long

Research Area: Interpretability and Analysis of Models for NLP

Research Area Keywords: probing, explanation faithfulness

Contribution Types: Model analysis & interpretability

Languages Studied: English

Submission Number: 8079
