# VeMo: Video Language Models are Human-Aligned Evaluators for Text to Motion Generation
This repository contains the code and processed data for our paper.



## Setup
```bash
conda create -n vemo python=3.11
conda activate vemo
pip install -r requirements.txt
```

## Resources
Download [OpenGVLab/InternVL3-14B](https://huggingface.co/OpenGVLab/InternVL3-14B) to `./storage/vlm/InternVL3_14B`.
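
One possible way to fetch the checkpoint is sketched below. It assumes the `huggingface_hub` package is available in the `vemo` environment (this README does not confirm that), and the helper name `download_model` is hypothetical, not part of this repository. Only the model ID and target directory come from the instruction above.

```python
from pathlib import Path

# Model ID and target directory as given in this README.
REPO_ID = "OpenGVLab/InternVL3-14B"
LOCAL_DIR = Path("./storage/vlm/InternVL3_14B")


def download_model(repo_id: str = REPO_ID, local_dir: Path = LOCAL_DIR) -> Path:
    """Download every file of the model repo into local_dir and return that path.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Imported lazily so this snippet can be imported without huggingface_hub.
    # Assumption: huggingface_hub is installed in the environment.
    from huggingface_hub import snapshot_download

    local_dir.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `download_model()` once from the repository root should place the weights where the instruction above says to put them. The download is large, so make sure there is enough free disk space first.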

- We have released the code and rendering software used to generate the videos; see `./src/visualize` and `./src/blender`.
- We provide the coarse-grained labels in `./storage/eval_scores` so that our experimental results can be fully reproduced.
- We will release additional resources to support the research community, including unprocessed motion data in 22-joint SMPL format and rendered videos with fine-grained oracle/user annotations.


## Run the demo to compute the VeMo score and entropy for a sample
```bash
python demo/demo.py
```

## Reproduce the main experimental results
```bash
python src/evaluate_system.py
```