Causal Influence-Aware Counterfactual Data Augmentation

Núria Armengol Urpí; Georg Martius

23 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: deep reinforcement learning, data augmentation, learning from demonstrations, out-of-distribution generalization

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: A data augmentation method that creates synthetic samples from a fixed dataset to improve the generalization and sample efficiency of learning-from-demonstration algorithms.

Abstract: Pre-recorded data and human-collected demonstrations are both valuable and practical resources for teaching robots complex behaviors. Ideally, learning agents should not be constrained by the scarcity of available demonstrations, but rather generalize to as many new situations as possible. However, the combinatorial nature of real-world scenarios typically requires a huge amount of data to prevent neural network policies from picking up on spurious and non-causal factors. We propose CAIAC, a data augmentation method that can create feasible synthetic samples from a fixed dataset without the need to perform new environment interactions. Motivated by the fact that an agent may only modify the environment through its actions, we swap causally $\textit{action}$-unaffected parts of the state-space from different observed trajectories in the dataset. In high-dimensional benchmark environments, we observe an increase in generalization capabilities and sample efficiency.
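The swapping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the state is a flat vector that factorizes into per-entity slices, and that a boolean mask `affected` marking which entities the action influences in each transition is already available (how CAIAC estimates this is not described on this page). The function name `swap_unaffected` and all parameter names are hypothetical. As a further assumption, donors are drawn only from transitions in which the same entity is also unaffected.

```python
import numpy as np

def swap_unaffected(states, next_states, entity_slices, affected, rng):
    """Return augmented copies of (states, next_states).

    states, next_states: float arrays of shape (N, D)
    entity_slices: list of slice objects, one per entity, indexing into D
    affected: bool array of shape (N, K); affected[i, k] is True when the
              action in transition i is ASSUMED to influence entity k
    rng: a numpy.random.Generator
    """
    aug_s = states.copy()
    aug_ns = next_states.copy()
    for k, sl in enumerate(entity_slices):
        # Transitions in which entity k is not influenced by the action.
        free = np.flatnonzero(~affected[:, k])
        if free.size < 2:
            continue  # nothing to swap with
        # Sample one donor transition (with replacement) per free transition.
        donors = rng.choice(free, size=free.size)
        # Copy the donor's entity-k features into both state and next state,
        # so the swapped entity keeps its own observed dynamics.
        aug_s[free, sl] = states[donors, sl]
        aug_ns[free, sl] = next_states[donors, sl]
    return aug_s, aug_ns
```

Entities marked as affected are left untouched, so the part of each transition that the action actually controls is preserved. Only the action-independent remainder is recombined across observed transitions.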

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7407
