Bridging Internal Consistency and External Alignment: A Causal and Dynamic Interpretability Framework for LLM Generation
Keywords: Large language model, Interpretability, Structural causal model, Dynamic
Abstract: Large Language Models (LLMs) are widely used in high-stakes applications, making their interpretability increasingly important. Existing interpretability methods are typically categorized into internal and external perspectives, which are often studied in isolation and tend to overlook two key aspects: causality and temporal dynamics. Explanations are often limited to surface correlations or static dependencies, failing to capture how influences evolve during autoregressive generation. To address these limitations, we propose a causal and dynamic interpretability framework for LLM generation. We first characterize the backdoor-adjusted causal effects of both the generated prefix and the prompt on the current token using a structural causal model. Next, we introduce two metrics to quantify contextual causal influence and question–answer causal influence. Overall, our work provides a unified causal view of internal consistency and external alignment in LLM generation dynamics.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: metrics;evaluation;joint LLM and time series modeling
Contribution Types: Model analysis & interpretability
Languages Studied: English;Chinese
Submission Number: 2724