We propose learning preference representations and a regularized conditional diffusion model for trajectory generation and decision-making in single- and multi-task scenarios.

## Run

Training covers three components: the preference representations, the optimal representation, and the diffusion model. To train all of them on a D4RL benchmark task, run:
```
python train_all.py --env_name hopper-medium-expert --repre_type dist --max_iters 1000 --normalize --K 100 --batch_size 32 --pw 'average' --z_dim 16 --info_loss_weight 0.01 --condition_guidance_w 1.2 --seed 200
```
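
To repeat the run over several seeds, the command above can be wrapped in a small launcher. The following is a minimal sketch, not part of this repository: the helper names `build_command` and `run_sweep` are hypothetical, the seeds other than 200 are illustrative, and it assumes `train_all.py` accepts its flags in any order and treats `--normalize` as a value-less switch, as the command above suggests.

```python
import shlex
import subprocess
import sys

# Flag values copied from the README command; only the seed is varied.
BASE_ARGS = {
    "env_name": "hopper-medium-expert",
    "repre_type": "dist",
    "max_iters": 1000,
    "K": 100,
    "batch_size": 32,
    "pw": "average",
    "z_dim": 16,
    "info_loss_weight": 0.01,
    "condition_guidance_w": 1.2,
}


def build_command(seed, python=sys.executable):
    """Return the argv list for one train_all.py run with the given seed."""
    # Assumption: --normalize is a boolean switch that takes no value.
    cmd = [python, "train_all.py", "--normalize"]
    for name, value in BASE_ARGS.items():
        cmd += [f"--{name}", str(value)]
    cmd += ["--seed", str(seed)]
    return cmd


def run_sweep(seeds, dry_run=True):
    """Print each command (dry_run=True) or execute the runs one by one."""
    for seed in seeds:
        cmd = build_command(seed)
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the sweep if a run exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Seeds other than 200 are illustrative, not taken from the README.
    run_sweep([200, 201, 202])
```

With `dry_run=True` the script only prints the commands, so the flags can be checked before launching the runs with `dry_run=False`.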

The supplementary DPM-Solver implementation is at lines 367-393 of `diffuser/models/diffusion.py`. To use it, replace the original sampling function with the `dpm_sample` function.

## Acknowledgement

The construction of the diffusion models in our method is partially based on [Decision Diffuser](https://github.com/anuragajay/decision-diffuser). The processing of the preference data and the construction of the preference representations are partially based on [OPPO](https://github.com/bkkgbkjb/OPPO).
