Reproducing and Extending Counterfactual Data Augmentation: A Study on Causal Identifiability and Stability in Reinforcement Learning

Shilpa Noushad; Sajan Kumar; Pratyush Uppuluri

Published: 01 Apr 2026, Last Modified: 30 Apr 2026, Venue: CLaRAMAS Full, Readers: Everyone, License: CC BY 4.0

Keywords: Causal Reinforcement Learning, Counterfactual Data Augmentation, Offline Reinforcement Learning, Structural Causal Models, Robustness

TL;DR: This work presents a ground-up reproduction of the CTRL framework, evaluating counterfactual data augmentation via a multi-factor validation matrix in CartPole-SD and extending external-validity testing to LunarLander, MuJoCo, and D4RL environments.

Abstract: We present a reproducibility-focused reimplementation and extension of CTRL, a causal reinforcement learning method based on counterfactual data augmentation. Beyond reproducing CartPole-SD, we run a controlled validation matrix over counterfactual fraction, noise level, dataset size, and generator quality, then test transfer to LunarLander, MuJoCo, and D4RL-style offline settings. The empirical pattern is consistent across runs: counterfactual augmentation is conditionally useful, not uniformly superior. In CartPole, it can improve clean returns, especially with larger datasets and stronger generators, but noisy-evaluation gains are modest. In cross-domain settings, outcomes are mixed and currently budget-limited. Our claim is therefore scoped: counterfactual augmentation can help offline RL in specific regimes, but reliability depends on data regime, generator fidelity, and evaluation protocol. By introducing a comparative analysis against a non-causal Base-S world model, we identify a critical 'coverage-versus-bias' tradeoff where excessive augmentation can amplify transition inaccuracies, a failure mode particularly evident in balance-heavy tasks. We further demonstrate that Bellman-score selection is insufficient to overcome these biases in high-variance regimes. Finally, we fill a significant gap in the community by providing a verified, ground-up open-source implementation of the CTRL architecture to facilitate further research in causal RL.

Paper Type: Full (minimum of 10 pages and a maximum of 16 excluding references)

Poster Opt In: Yes, I'm open to having my submission accepted as a poster

Supplementary Material: zip

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 7
