# HINT: Hierarchical Interaction Modeling for Autoregressive Multi-Human Motion Generation

This repository provides training and evaluation scripts for HINT, our multi-human motion generation framework. Training proceeds in two stages:
1. Training a MotionVAE to learn canonicalized latent representations.
2. Training an Interaction-Aware Diffusion model to generate multi-human motions conditioned on text and interaction cues.

---



## Dataset

We use the **InterHuman** dataset. Download it from *InterGen* and place it under `./data`.

Then preprocess the InterHuman dataset:

```bash
python ./data_scripts/extract_dataset_interhuman_single_d262.py
```

Preprocessed sequences should be placed under:

```
./data/InterHuman/seq_data_single_interaction_d262_fps30_mirror_exchangeyz
```
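
Before training, it can help to confirm that preprocessing actually produced output. The snippet below is a minimal sketch, not part of the repository: `check_preprocessed` is a hypothetical helper, and it only checks that the directory exists and is non-empty, making no assumption about the file format inside.

```bash
# Hypothetical helper (not shipped with this repository): succeed only if the
# given directory exists and contains at least one entry.
check_preprocessed() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # find stops after the first entry; empty output means an empty directory
    if [ -z "$(find "$dir" -mindepth 1 -print -quit)" ]; then
        echo "directory is empty: $dir" >&2
        return 1
    fi
    echo "found preprocessed data in: $dir"
}
```

For example: `check_preprocessed ./data/InterHuman/seq_data_single_interaction_d262_fps30_mirror_exchangeyz`.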

Config files are provided under:

```
./config_files/config_hydra/motion_primitive/
```

---



## Training

### Step 1: Train MotionVAE

```bash
python mld/train_mvae.py \
    --exp_name "mvae_interhuman" \
    --data_args.data_dir './data/InterHuman/seq_data_single_interaction_d262_fps30_mirror_exchangeyz' \
    --data_args.dataset 'interhuman_d262' \
    --data_args.interaction 0 \
    --data_args.enforce_gender None \
    --data_args.enforce_zero_beta 0 \
    --train_args.use_predicted_joints 1 \
    --padding \
    --data_args.cfg_path './config_files/config_hydra/motion_primitive/interhuman_h4_f16_r4.yaml'
```
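
Step 2 loads a MotionVAE checkpoint via `--denoiser_args.mvae_path`. The sketch below picks the checkpoint with the highest step number from a run directory. It assumes checkpoints are named `checkpoint_<step>.pt`, as in the Step 2 example path; `latest_checkpoint` is a hypothetical helper, and the run directory name is whatever your Step 1 run actually produced.

```bash
# Hypothetical helper (not shipped with this repository): print the
# checkpoint_<step>.pt file with the largest numeric step in a run directory.
# Compares steps numerically, so checkpoint_300000.pt beats checkpoint_9000.pt.
latest_checkpoint() {
    local run_dir="$1"
    local best="" best_step=-1 f step
    for f in "$run_dir"/checkpoint_*.pt; do
        [ -e "$f" ] || continue            # glob matched nothing
        step="${f##*/checkpoint_}"         # strip directory and file prefix
        step="${step%.pt}"                 # strip extension
        case "$step" in
            ''|*[!0-9]*) continue ;;       # skip names without a numeric step
        esac
        if [ "$step" -gt "$best_step" ]; then
            best_step="$step"
            best="$f"
        fi
    done
    [ -n "$best" ] || { echo "no checkpoints in $run_dir" >&2; return 1; }
    echo "$best"
}
```

The result can then be passed to Step 2, e.g. `--denoiser_args.mvae_path "$(latest_checkpoint ./mvae/mvae_interhuman_h4_f16_r4)"`. Comparing steps numerically rather than sorting file names avoids picking `checkpoint_9000.pt` over `checkpoint_300000.pt`.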

### Step 2: Train Interaction-Aware Diffusion

```bash
python mld/train_mld.py \
    --exp_name 'mld_interhuman' \
    --denoiser_args.mvae_path './mvae/mvae_interhuman_h4_f16_r4/checkpoint_300000.pt' \
    --data_args.data_dir './data/InterHuman/seq_data_single_interaction_d262_fps30_mirror_exchangeyz' \
    --data_args.cfg_path './config_files/config_hydra/motion_primitive/interhuman_h4_f16_r4.yaml' \
    --data_args.dataset 'interhuman_d262_wpe' \
    --data_args.interaction 1 \
    --data_args.enforce_gender None \
    --data_args.enforce_zero_beta 0 \
    --padding \
    --denoiser_args.model_type transformer \
    --denoiser_args.no_shared_mask \
    --denoiser_args.use_inter \
    --denoiser_args.no_inter_first \
    --denoiser_args.no_text_first \
    --train_args.use_predicted_joints 1 \
    --train_args.weight_feature_rec 1.0 \
    --denoiser_args.merge_his_relpose \
    --denoiser_args.attention_sep \
    --use_interaction_loss \
    --scale_interloss_timestep 0.6 \
    --train_args.weight_rel_orient 1e-4 \
    --train_args.weight_joint_affinity 1e-1 \
    --train_args.weight_distance_map 1e-1 \
    --denoiser_args.no_load_text_embedding \
    --denoiser_args.no_use_indi_text \
    --denoiser_args.clip_version 'ViT-L/14@336px' \
    --denoiser_args.text_ca \
    --denoiser_args.text_sep \
    --denoiser_args.text_encoder_version 'v3' \
    --react_prob 0.0 \
    --warmup_steps 0 \
    --denoiser_args.train_rollout_history 'rollout' \
    --denoiser_args.train_rollout_type 'full' \
    --denoiser_args.use_extra_pe
```



---

## Evaluation

Download the `eval_model` from *InterGen* and place it under `./evaluation/eval_model/InterHuman`.

To evaluate a trained model, run:

```bash
python evaluation/eval_inter_react_mvae.py \
    --device 0 \
    --dataset 'interhuman_d262' \
    --eval_mode final \
    --denoiser_checkpoint <path_to_model> \
    --batch_size=96 \
    --eval_model_args.process_mode 1 \
    --use_predicted_joints 1 \
    --load_from_file \
    --guidance_param 4.0
```
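
The sketch below wraps the evaluation command with a check for the two prerequisites described above: the InterGen `eval_model` directory and the denoiser checkpoint. `run_eval` and `EVAL_MODEL_DIR` are hypothetical names introduced here; the flags are copied unchanged from the command above.

```bash
# Hypothetical wrapper (not shipped with this repository).
# EVAL_MODEL_DIR mirrors the location where eval_model is expected.
EVAL_MODEL_DIR="./evaluation/eval_model/InterHuman"

run_eval() {
    local ckpt="$1"
    if [ ! -d "$EVAL_MODEL_DIR" ]; then
        echo "eval_model not found under $EVAL_MODEL_DIR" >&2
        return 1
    fi
    if [ ! -f "$ckpt" ]; then
        echo "denoiser checkpoint not found: $ckpt" >&2
        return 1
    fi
    # Same flags as the evaluation command above
    python evaluation/eval_inter_react_mvae.py \
        --device 0 \
        --dataset 'interhuman_d262' \
        --eval_mode final \
        --denoiser_checkpoint "$ckpt" \
        --batch_size=96 \
        --eval_model_args.process_mode 1 \
        --use_predicted_joints 1 \
        --load_from_file \
        --guidance_param 4.0
}
```

Failing early on a missing path gives a clearer message than an error raised partway through model loading.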

