# Domain-Invariant Per-Frame Feature Extraction for Cross-Domain Imitation Learning with Visual Observations

This repository contains the under-review implementation of "Domain-Invariant Per-Frame Feature Extraction for Cross-Domain Imitation Learning with Visual Observations (DIFFIL)", submitted to ICLR 2026.

## Requirements

We use Anaconda to configure the environment.

The experimental tasks are built on OpenAI Gym and the DeepMind Control Suite.

Installing mujoco-py and running the experiments requires a MuJoCo activation key, which can be obtained from https://www.roboti.us/license.html (the key expires on October 18, 2031).

We ran our experiments on a GeForce RTX 3090 GPU with CUDA 11.4 and cuDNN 8.2.4 installed.

To create the conda environment with the required packages:

```
conda env create -f DIFFIL.yaml
```

If the installation succeeds, activate the conda environment:
```
conda activate DIFFIL
```

To load the CUDA and MuJoCo libraries, add the following environment variables to your `~/.bashrc`:
```
export LD_LIBRARY_PATH="/home/user_name/cuda/lib64:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/user_name/.mujoco/mjpro150/bin"
```
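
As a quick sanity check after editing the shell configuration, the following minimal sketch reports which of the library directories are still missing from `LD_LIBRARY_PATH`. The helper name `missing_library_dirs` is hypothetical and not part of this repository, and the paths are the `user_name` placeholders from the snippet above, so adjust them to your setup.

```python
import os


def missing_library_dirs(required, ld_library_path):
    """Return the entries of `required` absent from a colon-separated path string."""
    # Normalize paths so that a trailing slash does not cause a false mismatch.
    present = {os.path.normpath(p) for p in ld_library_path.split(":") if p}
    return [d for d in required if os.path.normpath(d) not in present]


if __name__ == "__main__":
    # Placeholder paths taken from the export lines above (assumed layout).
    required = [
        "/home/user_name/cuda/lib64",
        "/home/user_name/.mujoco/mjpro150/bin",
    ]
    current = os.environ.get("LD_LIBRARY_PATH", "")
    missing = missing_library_dirs(required, current)
    print("missing:", missing if missing else "none")
```

Run it in a new shell (or after `source ~/.bashrc`) so that the updated variables are visible to the Python process.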

The following environments (`--env_name`) are available for training expert policies:
* InvertedPendulum-v2
* InvertedDoublePendulum-v2
* Reacher2-v2
* Reacher3-v2
* DMPendulum
* DMCartPoleSwingUp
* DMCheetah
* DMWalker
* DMHopper


## Collect expert datasets 

Before training the model, train the expert policy to be imitated and build its dataset with the following command:
```
python collect_expert_data.py --env_name InvertedDoublePendulum-v2
```
The following environments can be passed to `--env_name` for expert policy training and data collection:
* InvertedPendulum-v2
* InvertedDoublePendulum-v2
* Reacher2-v2
* Reacher3-v2
* DMPendulum
* DMCartPoleSwingUp
* DMAcrobot
* DMCheetah
* DMWalker
* DMHopper

## Collect random and learner datasets 
Similarly, build the random dataset and the initial learner dataset used for training with the following command:
```
python collect_random_data.py --env_name InvertedPendulum
```

The following environments can be passed to `--env_name` for random data collection:
* InvertedPendulum
* Reacher
* DMPendulum
* DMCartPoleSwingUp
* DMAcrobot
* DMCheetah
* DMWalker
* DMHopper
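
To collect every dataset in one pass, the two commands above can be looped over the listed environments. The sketch below is a convenience wrapper under stated assumptions: the function names `collection_commands` and `run_all` are hypothetical, the environment names are transcribed from the two lists above, and it is assumed that both scripts are run from the repository root. By default it only prints the commands.

```python
import subprocess
import sys

# Environment names transcribed from the expert and random lists above.
EXPERT_ENVS = [
    "InvertedPendulum-v2", "InvertedDoublePendulum-v2", "Reacher2-v2",
    "Reacher3-v2", "DMPendulum", "DMCartPoleSwingUp", "DMAcrobot",
    "DMCheetah", "DMWalker", "DMHopper",
]
RANDOM_ENVS = [
    "InvertedPendulum", "Reacher", "DMPendulum", "DMCartPoleSwingUp",
    "DMAcrobot", "DMCheetah", "DMWalker", "DMHopper",
]


def collection_commands():
    """Build one command per environment for both collection scripts."""
    cmds = [[sys.executable, "collect_expert_data.py", "--env_name", env]
            for env in EXPERT_ENVS]
    cmds += [[sys.executable, "collect_random_data.py", "--env_name", env]
             for env in RANDOM_ENVS]
    return cmds


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in collection_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises on the first failing collection run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)  # set dry_run=False to actually launch the scripts
```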



## Model training

Once the datasets are collected, train and evaluate the model with the following command:
```
python run_experiment.py --exp_name test_running --env_name InvertedPendulum-v2 --env_type to_two --gpu_id 0 --epochs 500 --recon 0.5 --fcon 0.1 --label_source 10 --label_target 0.001 --label_frame 10 --fwgan_disc 1 --fwgan_gen 0.05 --fwgan_alpha 0.5 --model_num_per_epoch 200 --model_RL_per_step 2000
```
The supported combinations of `--env_name` and `--env_type` are:
* `--env_name=InvertedPendulum-v2` &rarr; `--env_type=to_two`
* `--env_name=InvertedDoublePendulum-v2` &rarr; `--env_type=to_one`
* `--env_name=Reacher2-v2` &rarr; `--env_type=to_three`
* `--env_name=Reacher3-v2` &rarr; `--env_type=to_two`
* `--env_name=DMPendulum` &rarr; `--env_type=to_cartpoleswingup` or `to_acrobot`
* `--env_name=DMCartPoleSwingUp` &rarr; `--env_type=to_pendulum` or `to_acrobot`
* `--env_name=DMCheetah` &rarr; `--env_type=to_walker` or `to_hopper`
* `--env_name=DMWalker` &rarr; `--env_type=to_cheetah` or `to_hopper`
* `--env_name=DMHopper` &rarr; `--env_type=to_walker` or `to_cheetah`
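
Because only certain pairs are valid, it can help to check a combination before launching a long run. The following sketch encodes the list above as a lookup table; `SUPPORTED_ENV_TYPES` and `is_supported` are hypothetical names introduced here for illustration, not part of `run_experiment.py`.

```python
# Supported env_name -> env_type pairs, transcribed from the list above.
SUPPORTED_ENV_TYPES = {
    "InvertedPendulum-v2": ("to_two",),
    "InvertedDoublePendulum-v2": ("to_one",),
    "Reacher2-v2": ("to_three",),
    "Reacher3-v2": ("to_two",),
    "DMPendulum": ("to_cartpoleswingup", "to_acrobot"),
    "DMCartPoleSwingUp": ("to_pendulum", "to_acrobot"),
    "DMCheetah": ("to_walker", "to_hopper"),
    "DMWalker": ("to_cheetah", "to_hopper"),
    "DMHopper": ("to_walker", "to_cheetah"),
}


def is_supported(env_name, env_type):
    """Return True if the (env_name, env_type) pair appears in the table."""
    return env_type in SUPPORTED_ENV_TYPES.get(env_name, ())


if __name__ == "__main__":
    for pair in [("DMPendulum", "to_acrobot"), ("Reacher2-v2", "to_two")]:
        print(pair, "supported" if is_supported(*pair) else "not supported")
```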

The training hyperparameters can be adjusted with the following arguments:
* epochs (The total number of training steps)
* recon (Reconstruction loss scale)
* fcon (Feature consistency loss scale)
* label_source (Sequence label loss scale - source)
* label_target (Sequence label loss scale - target)
* label_frame (Frame label loss scale)
* fwgan_disc (WGAN discriminator loss scale)
* fwgan_gen (WGAN generator loss scale)
* fwgan_alpha (WGAN control factor)
* model_num_per_epoch (The number of model training steps per epoch)
* model_RL_per_step (The number of RL training steps per epoch)
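
When varying these values across runs, it may be convenient to keep them in a dictionary and generate the command line from it. The sketch below assumes each key maps directly to a `--key value` flag, as in the example command above; `EXAMPLE_ARGS` and `build_command` are hypothetical names, and the values are copied from that example rather than being recommended settings.

```python
# Values copied from the example run_experiment.py command above.
EXAMPLE_ARGS = {
    "exp_name": "test_running",
    "env_name": "InvertedPendulum-v2",
    "env_type": "to_two",
    "gpu_id": 0,
    "epochs": 500,
    "recon": 0.5,
    "fcon": 0.1,
    "label_source": 10,
    "label_target": 0.001,
    "label_frame": 10,
    "fwgan_disc": 1,
    "fwgan_gen": 0.05,
    "fwgan_alpha": 0.5,
    "model_num_per_epoch": 200,
    "model_RL_per_step": 2000,
}


def build_command(args, script="run_experiment.py"):
    """Turn a dict of arguments into an argv-style list of strings."""
    cmd = ["python", script]
    for name, value in args.items():
        cmd += ["--" + name, str(value)]
    return cmd


if __name__ == "__main__":
    # Override a single loss scale while keeping the other example values.
    args = dict(EXAMPLE_ARGS, recon=1.0)
    print(" ".join(build_command(args)))
```

The resulting list can be passed to `subprocess.run` or printed and pasted into a shell.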

## License

All content in this repository is licensed under the MIT license.