# Sherman-Morrison Actor Critic
Codebase for the paper "Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning".
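For readers unfamiliar with the identity in the project name, the Sherman-Morrison formula gives the inverse of a rank-1 update of a matrix in closed form. The sketch below illustrates only that identity, assuming for illustration a damped rank-1 matrix `damping * I + g g^T`. The function name `rank1_natural_gradient`, the `damping` parameter, and this choice of matrix are assumptions and are not taken from the repository or the paper.

```python
def dot(a, b):
    """Inner product of two equal-length vectors given as lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def rank1_natural_gradient(g, v, damping=1.0):
    """Solve (damping * I + g g^T) x = v without forming the matrix.

    Sherman-Morrison gives
        (damping * I + g g^T)^{-1} = (1 / damping) * (I - g g^T / (damping + g^T g)),
    so the solve costs O(n) instead of O(n^3).
    """
    scale = dot(g, v) / (damping + dot(g, g))
    return [(vi - scale * gi) / damping for gi, vi in zip(g, v)]


if __name__ == "__main__":
    g = [1.0, 2.0]   # hypothetical gradient used to build the rank-1 term
    v = [3.0, 4.0]   # hypothetical vector to precondition
    print(rank1_natural_gradient(g, v, damping=0.5))
```

The result can be checked by multiplying it back by `damping * I + g g^T` and comparing with `v`.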

## Instructions

There are two options for running the code: locally or through Docker.


### Option 1: Local Python

#### Installation

```
# PyTorch (choose the version that suits your system)
pip install torch==2.2.2+cu121 torchvision==0.17.2+cu121 torchaudio==2.2.2 --extra-index-url https://download.pytorch.org/whl/cu121

# Other requirements
pip install -r requirements.txt
```

#### Run experiment 

`python src/runner/mujoco_runner.py --env_name "mujoco" --task_name "hopper" --algorithm_name "smac" --num_mini_batch 1 --episode_length 1000 --num_env_steps 50000`
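To launch the same experiment from a script, the command can be assembled programmatically. This is a minimal sketch that uses only the runner path and flags shown above. `build_command` is a hypothetical helper and is not part of the repository.

```python
import shlex
import sys

RUNNER = "src/runner/mujoco_runner.py"  # runner path from the command above


def build_command(overrides=None):
    """Return the runner command as an argument list, with optional flag overrides."""
    args = {
        "env_name": "mujoco",
        "task_name": "hopper",
        "algorithm_name": "smac",
        "num_mini_batch": 1,
        "episode_length": 1000,
        "num_env_steps": 50000,
    }
    args.update(overrides or {})
    cmd = [sys.executable, RUNNER]
    for name, value in args.items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_command({"num_env_steps": 100000})
    print(shlex.join(cmd))
    # To actually start the run from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Keeping the defaults in one dictionary makes it easy to change a single flag per run while leaving the rest as in the command above.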


### Option 2: Docker

#### Prerequisites
1. Docker
2. NVIDIA CUDA

#### Wandb Logging (Optional)

If you wish to log your runs to wandb:
1. Create a file named `docker/.env`
2. Add your API key inside: `WANDB_API_KEY=<YOUR_WANDB_API_KEY>`

#### Create Docker Image

`docker compose -f docker/docker-compose.yml up --build`

#### Run Docker Container

`docker compose -f docker/docker-compose.yml run --rm run_smac --env_name mujoco --task_name hopper`

_Note: You can override the environment, task, algorithm, or any hyperparameter defined in `config.py` by passing it as a CLI argument in the command above._

## Citing the Project
To cite this repository in publications:

```
@misc{huo2026rank1approximationinversefisher,
      title={Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning}, 
      author={Yingxiao Huo and Satya Prakash Dash and Radu Stoican and Samuel Kaski and Mingfei Sun},
      year={2026},
      eprint={2601.18626},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.18626}, 
}
```

## License
This project is licensed under the MIT License.


