# TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models

## 💻 Installation

We recommend using Anaconda to manage the environment.

```bash
# Create Conda env
conda create -n twinvla python=3.10 -y
conda activate twinvla

# Install Requirements
pip install -r requirements.txt

# Install TwinVLA
pip install -e .
```
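
To confirm that the editable install worked, a quick check like the following can help. This is a minimal sketch: it assumes the package is importable as `twinvla`, and `is_installed` is a hypothetical helper that is not part of this repository.

```python
import importlib.util
import sys


def is_installed(package: str) -> bool:
    """Return True if `package` can be found by the current interpreter."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # The conda env above pins Python 3.10; warn if another interpreter is active.
    if sys.version_info[:2] != (3, 10):
        print(f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}, expected 3.10")
    # Assumption: the package name is `twinvla`, as in the import used in this README.
    print("twinvla installed:", is_installed("twinvla"))
```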

## 🤖 Quick usage

```python
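from twinvla.model.twinvla import TwinVLA

# Load a trained TwinVLA checkpoint
model = TwinVLA(checkpoint_dir)

# Predict a sequence of actions from the three camera views and proprioception
actions = model.predict_action(
    unnorm_key=unnorm_key,
    instruction=instruction,
    image=front_img,
    image_wrist_r=right_wrist_img,
    image_wrist_l=left_wrist_img,
    proprio=proprio,
)

# Execute the predicted actions on your robot (`robot` is your own interface)
for action in actions:
    robot.execute(action)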
```
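
In practice, the predict-then-execute pattern above is usually repeated until the task finishes. The sketch below shows one way to structure that loop. `DummyPolicy`, `DummyRobot`, `get_observation`, and `run_episode` are all hypothetical stand-ins so the example runs without a checkpoint or hardware; only the `predict_action` keyword arguments mirror the usage shown above, and the chunk length and action size are arbitrary.

```python
from typing import Any, Dict, List


class DummyPolicy:
    """Stand-in for TwinVLA (hypothetical); returns a fixed chunk of zero actions."""

    def predict_action(self, **obs: Any) -> List[List[float]]:
        # Chunk length 3 and action size 14 are arbitrary placeholders.
        return [[0.0] * 14 for _ in range(3)]


class DummyRobot:
    """Stand-in for your robot interface (hypothetical API)."""

    def __init__(self) -> None:
        self.executed: List[List[float]] = []

    def get_observation(self) -> Dict[str, Any]:
        # A real robot would return camera images and proprioception here.
        return {"image": None, "image_wrist_r": None, "image_wrist_l": None, "proprio": None}

    def execute(self, action: List[float]) -> None:
        self.executed.append(action)


def run_episode(policy, robot, instruction: str, unnorm_key: str, max_steps: int) -> int:
    """Alternate between predicting an action chunk and executing it."""
    steps = 0
    while steps < max_steps:
        obs = robot.get_observation()
        actions = policy.predict_action(unnorm_key=unnorm_key, instruction=instruction, **obs)
        if not actions:
            break  # avoid looping forever if the policy returns nothing
        for action in actions:
            if steps >= max_steps:
                break
            robot.execute(action)
            steps += 1
    return steps


if __name__ == "__main__":
    robot = DummyRobot()
    n = run_episode(DummyPolicy(), robot, "put the dish in the drainer", "my_dataset", max_steps=7)
    print(f"executed {n} actions")
```

Re-querying the policy after each chunk keeps the robot reacting to fresh observations; whether to execute the full chunk or only part of it before re-planning is a design choice left to your setup.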

## Pretraining SingleVLA

1. Download the OXE dataset.

2. Run `pretrain_singlevla.sh` (make sure `data_root_dir` is set to the correct path).
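
A wrong `data_root_dir` is an easy mistake to make before launching a long run. The sketch below lists the dataset folders found under a root directory, assuming the usual TFDS on-disk layout in which each built dataset keeps a `dataset_info.json` under its version folder. `list_rlds_datasets` is a hypothetical helper and not part of this repository.

```python
from pathlib import Path
from typing import List


def list_rlds_datasets(data_root_dir: str) -> List[str]:
    """Return names of sub-folders that contain a TFDS `dataset_info.json`."""
    root = Path(data_root_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"data_root_dir does not exist: {root}")
    found = []
    for child in sorted(root.iterdir()):
        # Search recursively so that version (and optional config) folders are covered.
        if child.is_dir() and any(child.rglob("dataset_info.json")):
            found.append(child.name)
    return found


if __name__ == "__main__":
    import sys

    # Usage: python check_data_root.py /path/to/data_root_dir
    for name in list_rlds_datasets(sys.argv[1]):
        print(name)
```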


## 🤖 Finetuning TwinVLA on RoboTwin 2.0

1. Download RoboTwin 2.0

    Follow the official guide for installing RoboTwin 2.0.
    [RoboTwin Installation](https://robotwin-platform.github.io/doc/usage/robotwin-install.html#1-dependencies)

2. Download dataset

    Download the RoboTwin 2.0 dataset from Hugging Face:
    ```
    huggingface-cli download TianxingChen/RoboTwin2.0 --include "*/aloha-agilex_clean_50.zip" --local-dir /path/to/download
    ```

    Then unzip `aloha-agilex_clean_50.zip`.

3. Preprocess dataset (HDF5 to HDF5)

    Edit `preprocess_robotwin.sh` to set the paths correctly.

    ```
    sh preprocess_robotwin.sh task_name
    ```

4. Generate RLDS dataset (HDF5 to RLDS)

    ```
    sh generate_rlds_robotwin.sh task_name preprocessed/hdf5/path rlds/output/dir
    ```


5. Finetune TwinVLA

    The RoboTwin RLDS dataset is already registered, so you can start finetuning TwinVLA directly.

    Make sure *data_root_dir* and *data_mix* are set correctly.

    ```
    sh train_twinvla.sh
    ```

6. Evaluate TwinVLA

    Copy the folder *TwinVLA-RoboTwin* into *policy/* in the RoboTwin folder.

    ```
    # EASY evaluation
    sh evaluation.sh put_object_cabinet demo_clean demo_clean 0 0 checkpoint_dir

    # HARD evaluation
    sh evaluation.sh blocks_ranking_size demo_randomized demo_clean 0 0 checkpoint_dir 
    ```
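
When converting several tasks, steps 3 and 4 above can be wrapped in a small driver. The sketch below only assembles the two shell commands exactly as written in those steps and prints them; it runs them only when `dry_run=False`. `build_commands` and `run_task` are hypothetical helpers, and the paths in the example are placeholders.

```python
import shlex
import subprocess
from typing import List


def build_commands(task_name: str, hdf5_path: str, rlds_dir: str) -> List[List[str]]:
    """Return the preprocessing and RLDS-generation commands for one task."""
    return [
        ["sh", "preprocess_robotwin.sh", task_name],
        ["sh", "generate_rlds_robotwin.sh", task_name, hdf5_path, rlds_dir],
    ]


def run_task(task_name: str, hdf5_path: str, rlds_dir: str, dry_run: bool = True) -> List[str]:
    """Print (and optionally run) steps 3 and 4 for one task; returns the command strings."""
    issued = []
    for cmd in build_commands(task_name, hdf5_path, rlds_dir):
        line = shlex.join(cmd)
        print(line)
        issued.append(line)
        if not dry_run:
            # check=True stops at the first failing step
            subprocess.run(cmd, check=True)
    return issued


if __name__ == "__main__":
    # Task names taken from the evaluation examples above; paths are placeholders.
    for task in ["put_object_cabinet", "blocks_ranking_size"]:
        run_task(task, "preprocessed/hdf5/path", "rlds/output/dir", dry_run=True)
```

Keeping `dry_run=True` by default makes it easy to inspect the commands before committing to a long conversion.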


## 🔥 Finetuning TwinVLA on your own robot dataset

Because our data pipeline builds on OpenVLA's RLDS dataset loader, we currently support fine-tuning only with RLDS datasets. However, we provide instructions for converting HDF5 datasets into the RLDS format. If your dataset is already in RLDS format, you can start from Step 2.

1. Prepare your dataset

    In this section, we explain how to convert your custom dataset, stored in the HDF5 file format, into the RLDS format.

    - Copy the folder `rlds_generator/aloha_dish_drainer`.
    - Rename the copied folder to your dataset name, e.g. `rlds_generator/my_dataset`.
    - Rename the generator file to your dataset name, e.g. `rlds_generator/my_dataset/my_dataset.py`.
    - Open `my_dataset.py` and change the class name to your dataset name in CamelCase, e.g. `MyDataset`. Each underscore is dropped and the letter that follows it is capitalized.
    - Specify the directory of your HDF5 dataset folder:
        ```python
        def _split_generators(self, dl_manager: tfds.download.DownloadManager):
            """Define data splits."""
            return {
                # Glob pattern matching your HDF5 files
                'train': self._generate_examples(path='/path/to/dataset/*.hdf5'),
            }
        ```
    - Modify `_parse_example(hdf5_path)` according to your dataset format.
    - When you are done, go to `rlds_generator/my_dataset` and run `tfds build --data_dir path_for_rlds` to generate the RLDS dataset.

2. Register your dataset

    Register your RLDS dataset by adding entries for it to `twinvla/datasets/rlds/oxe/configs.py`, `twinvla/datasets/rlds/oxe/transforms.py`, and `twinvla/datasets/rlds/oxe/hzs.py`.

3. Start finetuning

    Run `train_twinvla.sh`. Make sure the arguments are set correctly, including `output_dir` for checkpoints.
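
The class-naming rule in Step 1 (folder `my_dataset`, class `MyDataset`) can be checked with a few lines of code. The helper below is a hypothetical illustration, not part of this repository; it assumes a lowercase, underscore-separated dataset name.

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case dataset name to the CamelCase builder class name."""
    # Capitalize only the first character of each part so the rest stays unchanged.
    return "".join(part[:1].upper() + part[1:] for part in name.split("_") if part)


if __name__ == "__main__":
    for dataset_name in ["my_dataset", "aloha_dish_drainer"]:
        print(dataset_name, "->", snake_to_camel(dataset_name))
```

If the class name does not correspond to the folder and file name in this way, `tfds build` may not find or name the dataset as expected, so it is worth verifying before generating the data.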


## 💡 Try new VLM backbones!

Our version of TwinVLA is based on Eagle2-1B, but the codebase is structured so that other VLM backbones can be tested, as we did in our preliminary experiments.

1. Make SingleVLA template

    Since TwinVLA is built on top of SingleVLA, introducing a new backbone model requires creating a template for SingleVLA.
    Please run the code below to generate a template for the new backbone.
    ```bash
    python3 singlevla_gen.py --model_type InternVL3_1B
    ```

    The template is generated at `twinvla/model/singlevlas/model_type.py`.

2. Complete the SingleVLA template

    Complete the generated SingleVLA template. You can refer to the templates that are already implemented.

3. Test SingleVLA / Extend to TwinVLA

    Use the generated SingleVLA for single-arm manipulation or extend it to TwinVLA.

    To extend, create a TwinVLA template in `twinvla/model/twinvlas` and fill in the template by referring to the implemented files. Since a custom attention mechanism is used, if you plan to use a VLM backbone other than Qwen2, you must also implement the corresponding functions. Please refer to `TwinVLAMetaModel` in `twinvla/model/base_model.py`.
