# Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access

This folder contains code for the paper *Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond
Full-State Access*, submitted to ICML 2026.

## Abstract

Asymmetric actor-critic methods are widely used in partially observable reinforcement learning, but typically assume
full state observability to condition the critic during training, which is often unrealistic in practice. We introduce
the informed asymmetric actor-critic framework, allowing the critic to be conditioned on arbitrary state-dependent
privileged signals without requiring access to the full state. We show that any such privileged signal yields unbiased
policy gradient estimates, substantially expanding the set of admissible privileged information. This raises the problem
of selecting the most adequate privileged information in order to improve learning. For this purpose, we propose two
novel informativeness criteria: a dependence-based test that can be applied prior to training, and a criterion based on
improvements in value prediction accuracy that can be applied post-hoc. Empirical results on partially observable
benchmark tasks and synthetic environments demonstrate that carefully selected privileged signals can match or
outperform full-state asymmetric baselines while relying on strictly less state information.

## Installation of dependencies

We recommend creating a new conda environment with Python 3.10 to run our code. If neither Anaconda nor Miniconda is
installed yet, install one of them first.

To create a new conda environment, run

- `conda create -n "aac-310" python=3.10`

Then activate the environment and install the dependencies from `requirements.txt` using

- `conda activate aac-310`
- `pip install -r requirements.txt`

If you encounter any problems during installation, please check the packages listed in `requirements.txt` and install
them manually.
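
If installation problems persist, it can help to confirm that the active interpreter matches the recommended version.
The following is a minimal sketch and not part of the repository; the helper name `is_recommended_python` is
hypothetical.

```python
import sys

# Python version recommended in this README.
RECOMMENDED = (3, 10)


def is_recommended_python(version_info=sys.version_info):
    """Return True if the major and minor version match the recommended one."""
    return tuple(version_info[:2]) == RECOMMENDED


if __name__ == "__main__":
    if is_recommended_python():
        status = "ok"
    else:
        status = "differs from the recommended 3.10"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

Run it inside the activated `aac-310` environment; a mismatch usually means the environment was not activated.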

## Project structure

The folder `benchmark_tasks` contains the modified code for running the six benchmark tasks "Hell-Heaven-3",
"Shopping-5", "Car-Flag", "Cleaner", "7x7-Memory-Four-Room", and "9x9-Memory-Four-Room", and for visualizing the
corresponding learning curves (cf. Figure 1). It also contains the underlying data for the results presented in the
article's main body.

The folder `informativeness` contains the code for the synthetic informed POMDP environments and the corresponding
actor-critic models. It also contains the code for computing the informativeness criteria.
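
To illustrate the idea behind the post-hoc criterion (improvement in value prediction accuracy), here is a toy sketch
in plain Python. It is not the code in `informativeness` and does not reproduce the paper's exact criterion: it assumes
discrete observations and signals, uses a tabular predictor (the mean return per group), and measures error in-sample.
All function names and data are hypothetical.

```python
from collections import defaultdict


def conditional_mean_mse(keys, returns):
    """MSE when each return is predicted by the mean return of its key group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, ret in zip(keys, returns):
        sums[key] += ret
        counts[key] += 1
    means = {key: sums[key] / counts[key] for key in sums}
    squared_errors = [(ret - means[key]) ** 2 for key, ret in zip(keys, returns)]
    return sum(squared_errors) / len(returns)


def value_prediction_gain(observations, signals, returns):
    """Reduction in MSE when the tabular predictor also sees the privileged signal."""
    mse_obs_only = conditional_mean_mse(observations, returns)
    informed_keys = list(zip(observations, signals))
    mse_informed = conditional_mean_mse(informed_keys, returns)
    return mse_obs_only - mse_informed


# Toy data (assumption): the return is fully determined by the privileged signal.
observations = [0, 0, 0, 0, 1, 1, 1, 1]
returns = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
informative_signal = [0, 1, 0, 1, 0, 1, 0, 1]
uninformative_signal = [0, 0, 1, 1, 0, 0, 1, 1]

gain_informative = value_prediction_gain(observations, informative_signal, returns)
gain_uninformative = value_prediction_gain(observations, uninformative_signal, returns)
print(gain_informative, gain_uninformative)
```

In this toy setting the informative signal removes all prediction error, while the uninformative one yields no gain.
Because the in-sample gain of a tabular predictor can never be negative, a realistic comparison would evaluate both
predictors on held-out data.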