We propose to learn preference representations and a regularized conditional diffusion model for trajectory generation and decision-making across single- and multi-task scenarios.

## Run

Training covers three components: the preference representations, the optimal representation, and the diffusion model. To train all of them on the D4RL benchmarks, run:
```
python train_all.py --repre_type dist --condition_guidance_w 1.2 --K 20 --batch_size 32 --seed 100 --max_iters 2000 --info_loss_weight 0.5 --pw 'average' --z_dim 16 --optimal
```
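
To repeat the run above over several seeds, a small launcher script can help. The following is a minimal sketch, not part of this repository: the helper names `build_command` and `run_sweep` and the seed values are hypothetical, and it assumes `train_all.py` is in the current directory and accepts exactly the flags shown above. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
import sys


def build_command(seed):
    """Return the argv list for one training run; only the seed varies.

    All flags and values are copied from the README command.
    """
    return [
        sys.executable, "train_all.py",
        "--repre_type", "dist",
        "--condition_guidance_w", "1.2",
        "--K", "20",
        "--batch_size", "32",
        "--seed", str(seed),
        "--max_iters", "2000",
        "--info_loss_weight", "0.5",
        "--pw", "average",
        "--z_dim", "16",
        "--optimal",
    ]


def run_sweep(seeds, dry_run=True):
    """Print one command per seed and, unless dry_run is set, execute it."""
    commands = [build_command(seed) for seed in seeds]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sweep if a training run fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Illustrative seeds only; these are not values recommended by the authors.
    run_sweep([100, 101, 102], dry_run=True)
```

Pass `dry_run=False` to `run_sweep` to launch the runs one after another.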

The dataset used for the MetaWorld benchmarks is the same as the one used by [MTDiff](https://github.com/tinnerhrhe/MTDiff).

The supplementary DPM-Solver code is at lines 367-393 of `diffuser/models/diffusion.py`. To use it, replace the original sampling function with the `dpm_sample` function.

## Acknowledgement

The construction of the diffusion models in our method is partially based on [Decision Diffuser](https://github.com/anuragajay/decision-diffuser). The processing of the preference data and the construction of the preference representations are partially based on [OPPO](https://github.com/bkkgbkjb/OPPO).
