# VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models

This repository contains the training and model merging code for **VisCodex**, a unified multimodal code generation model.

## Installation

From the repository root, install the package in editable mode with the `torch` and `metrics` extras:

```bash
pip install -e ".[torch,metrics]" --no-build-isolation
```

## Model Merging

Merge the vision and code models by running:

```bash
python model_merge.py
```
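
The merging logic itself lives in `model_merge.py` and is not described in this README. As a rough illustration of what weight-level merging can look like, the sketch below linearly interpolates the parameters that two models share and copies vision-only parameters unchanged. It is not the VisCodex procedure: the function `merge_state_dicts`, the parameter names, and the `alpha` coefficient are all hypothetical, and plain Python lists stand in for tensors so the example runs without extra dependencies.

```python
from typing import Dict, List

# Stand-in for a real state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def merge_state_dicts(vision: StateDict, code: StateDict, alpha: float = 0.5) -> StateDict:
    """Linearly interpolate parameters shared by both models (illustrative only).

    alpha = 0 keeps the vision model's weights, alpha = 1 takes the code model's.
    Parameters found only in the vision model are copied unchanged.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    merged: StateDict = {}
    for name, v_weights in vision.items():
        c_weights = code.get(name)
        if c_weights is None:
            # No counterpart in the code model (e.g. a vision-only parameter).
            merged[name] = list(v_weights)
            continue
        if len(c_weights) != len(v_weights):
            raise ValueError(f"shape mismatch for {name!r}")
        merged[name] = [
            (1.0 - alpha) * v + alpha * c for v, c in zip(v_weights, c_weights)
        ]
    return merged


if __name__ == "__main__":
    # Hypothetical parameter names, chosen only for this demo.
    vision = {"visual.proj": [1.0, 2.0], "lm.layer0": [0.0, 4.0]}
    code = {"lm.layer0": [2.0, 0.0]}
    print(merge_state_dicts(vision, code, alpha=0.5))
```

With real checkpoints the same loop would run over tensors rather than lists. Consult `model_merge.py` for the models, coefficients, and merging rule actually used.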

## Training

Training is performed with LLaMA-Factory.

## Note

For dataset and implementation details, please refer to the main paper.
