# Code for the paper "Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains"

## Installation

- Requirements: Python 3.12
- Then install the package and its dependencies from the repository root:
> pip3 install -e .
>
> pip3 install -r requirements.txt

## Run

- To use a custom dataset, edit `./verl/utils/dataset_xxx.py`.
- To change training parameters, edit `./config.yaml`.
- To change the reward function, edit `./model/reward.py`.
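
As a rough illustration of the kind of logic a custom reward function might contain, here is a minimal sketch that combines a format check with an exact-match accuracy check. Everything in it is an assumption made for illustration: the function names (`format_reward`, `accuracy_reward`, `compute_score`), the `<think>`/`<answer>` response layout, and the `format_weight` parameter are hypothetical and are not taken from `./model/reward.py`. Check that file for the interface the training code actually expects.

```python
import re

# Hypothetical response layout (an assumption, not the repository's format):
# reasoning inside <think>...</think>, final answer inside <answer>...</answer>.
_RESPONSE_PATTERN = re.compile(
    r"\s*<think>.*?</think>\s*<answer>(.*?)</answer>\s*", re.DOTALL
)


def format_reward(response: str) -> float:
    """Return 1.0 if the whole response follows the assumed layout, else 0.0."""
    return 1.0 if _RESPONSE_PATTERN.fullmatch(response) else 0.0


def accuracy_reward(response: str, ground_truth: str) -> float:
    """Return 1.0 if the extracted answer equals the ground truth (case-insensitive)."""
    match = _RESPONSE_PATTERN.fullmatch(response)
    if match is None:
        return 0.0
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == ground_truth.strip().lower() else 0.0


def compute_score(response: str, ground_truth: str, format_weight: float = 0.1) -> float:
    """Weighted sum of the format and accuracy rewards (hypothetical weighting)."""
    return (
        format_weight * format_reward(response)
        + (1.0 - format_weight) * accuracy_reward(response, ground_truth)
    )


if __name__ == "__main__":
    sample = "<think>2 + 2 equals 4</think><answer>4</answer>"
    print(compute_score(sample, "4"))
```

Keeping the format and accuracy terms as separate functions makes it easier to reweight or replace one of them without touching the other.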

After making these changes, start training with:
> bash ./scripts/train.sh

To merge checkpoints into Hugging Face format, run:
> python3 scripts/model_merger.py --local_dir <path_to_actor>