Auditing Black-Box Trends: Structural Inductive Bias Facilitates Causal Interpretability in Clinical Time Series
Presentation Attendance: Yes, we will present in person
Keywords: Time Series, Foundation Models, Causality, Interpretability, Safety Audit, Clinical AI
TL;DR: We introduce the Causal Hallucination Score (CHS) to audit foundation models for inverse causal semantics, showing that propensity-regularized models recover valid therapeutic signals in confounded clinical data.
Abstract: The deployment of predictive Transformer architectures in high-stakes healthcare presents a critical safety challenge: the divergence between forecasting accuracy and interventional validity. We term this the "Alignment Gap." In observational data, standard training objectives incentivize models to exploit "confounding by indication," often leading to inverted causal semantics. In this work, we present a simple audit protocol for quantifying this gap. We introduce the Causal Hallucination Score (CHS), a metric measuring the divergence between a foundation model's zero-shot counterfactuals and a structural reference instrument. Applying this protocol to Lag-Llama and Chronos-T5, we reveal a severe safety failure: despite high predictive likelihood, naive counterfactual prompting of these models reproduces the dataset's observational bias (associating life-saving vasopressors with increased mortality). We demonstrate that a Propensity-Regularized GRU-D serves as an effective audit instrument, recovering a directionally consistent therapeutic signal (CATE: +0.005) validated by doubly robust estimation and placebo falsification. We release the code, dataset split, and evaluation protocol as a public benchmark to facilitate future safety audits of clinical foundation models.
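To make the audit concrete, the sketch below shows one plausible operationalization of the CHS as described in the abstract. The exact definition is given in the paper, not here; the function name, the sign-flip/magnitude decomposition, and the synthetic numbers are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def causal_hallucination_score(fm_cf_treated, fm_cf_control, ref_cate):
    """Illustrative Causal Hallucination Score (CHS) sketch.

    Compares a foundation model's zero-shot counterfactual contrast
    against per-patient CATEs from a structural reference instrument
    (e.g., a propensity-regularized GRU-D). This is an assumed
    operationalization, not the paper's exact formula.
    """
    fm_effect = np.asarray(fm_cf_treated) - np.asarray(fm_cf_control)
    ref = np.asarray(ref_cate)
    # Directional divergence: fraction of patients for whom the
    # foundation model's implied effect sign contradicts the
    # reference signal (i.e., inverted causal semantics).
    sign_flip = np.mean(np.sign(fm_effect) != np.sign(ref))
    # Magnitude divergence between the two effect estimates.
    mag_gap = np.mean(np.abs(fm_effect - ref))
    return sign_flip, mag_gap

# Hypothetical usage: a model that inverts the therapeutic signal
# (vasopressor prompt -> higher predicted mortality) scores high
# on the directional component.
rng = np.random.default_rng(0)
ref = np.full(100, 0.005)                # reference CATE near +0.005
fm_t = rng.normal(0.30, 0.02, 100)       # mortality forecast, treated prompt
fm_c = rng.normal(0.25, 0.02, 100)       # mortality forecast, control prompt
print(causal_hallucination_score(fm_t, fm_c, ref))
```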
Track: Research Track (max 4 pages)
Submission Number: 84