# RIPT-VLA

> **ICLR 2026 Submission #2218 - Supplementary Material**

**Interactive Post-Training for Vision-Language-Action Models**  
Anonymous implementation of RIPT-VLA. Parts of the repo are built on a fork of [QueST](https://quest-model.github.io).

> **RIPT-VLA** improves *any* pretrained VLA backbone (e.g., **QueST**, **OpenVLA-OFT**) using only **sparse binary success rewards**.  
> Through **K-rollout interaction**, **dynamic sampling**, and **leave-one-out advantage estimation**, we reach state-of-the-art success rates and remain effective even in extremely low-data regimes.

<p align="center">
  <img src="image/teaser.png" width="100%" alt="RIPT-VLA overview">
</p>


## 🔥 Highlights
* **Plug-and-Play Post-Training** – fine-tune *any* VLA model with only task-success signals (no dense rewards, no value nets).  
* **SOTA Performance** – **94.3%** success rate on LIBERO-90 with QueST + RIPT; **97.5%** success rate on the LIBERO suites (Goal, Spatial, Object, Long) with OpenVLA-OFT + RIPT.
* **Extreme Low-Data Success** – RIPT-VLA turns failure-prone models (e.g., 4% success with 1 demo) into performant agents (**97%+**) using only sparse binary rewards and just 15 iterations.

## 📚 Table of Contents

1. [Getting Started](#🚀-getting-started)
2. [Training Models from Scratch](#🏋️-training-models-from-scratch)
3. [Model Zoo (Available Upon Acceptance)](#🤗-model-zoo-available-upon-acceptance)
4. [QueST RIPT Training](#🏋️-quest-ript-training)
5. [OpenVLA-OFT RIPT Training](#🏋️-openvla-oft-ript-training)
6. [Core RIPT Code Overview](#🧪-core-ript-code-overview)

## 🚀 Getting Started

### 💠 Installation

Follow the instructions in [INSTALL.md](INSTALL.md) for QueST and OpenVLA-OFT.

### 📦 Paths
Replace the following paths in `config/paths.yaml`:

```yaml
paths:
  output_prefix: /path/to/experiment/output # Checkpoint and log output directory
  data_prefix: /path/to/libero/data # LIBERO data directory
  wandb_project: ript-vla # Your wandb project name
```

### Quick Start: RIPT an SFT Model on LIBERO-90

This example shows how to apply RIPT to an SFT QueST model on LIBERO-90:

1. Install QueST and LIBERO following [INSTALL.md](INSTALL.md).
2. Train the SFT model from scratch using the provided training scripts (see the Stage 1 and Stage 2 sections below).
3. (Optional) Evaluate the SFT model on LIBERO-90:
   - Set `checkpoint_path` in `scripts/quest/eval/libero_90.sh` to your SFT checkpoint path.
   - Run the evaluation script with the number of GPUs you want to use:
     ```bash
     bash scripts/quest/eval/libero_90.sh $NUM_GPU
     ```
4. Run RIPT:
   - Set `checkpoint_path` in `scripts/quest/stage_3_ript/libero_90.sh` to your SFT checkpoint path.
   - Run the RIPT script with the number of GPUs you want to use:
     ```bash
     bash scripts/quest/stage_3_ript/libero_90.sh $NUM_GPU
     ```

The script will:
- Load the pre-trained SFT model
- Run RIPT training on LIBERO-90
- Log results to WandB

`$NUM_GPU` specifies the number of GPUs to use (at least 3 recommended).

For complete details of the training process, see the [QueST RIPT Training](#🏋️-quest-ript-training) section.


## 🏋️ Training Models from Scratch

**Note for Reproducibility:** All models should be trained from scratch using the provided training scripts. Pre-trained checkpoints are not provided in this anonymous version.

### QueST Training

Train QueST models on LIBERO suites using the three-stage process:

1. **Stage 1**: Autoencoder training - `scripts/quest/stage_1_autoencoder/libero_*.sh`
2. **Stage 2**: Supervised Fine-Tuning - `scripts/quest/stage_2_sft/libero_*.sh`  
3. **Stage 3**: RIPT training - `scripts/quest/stage_3_ript/libero_*.sh`

Supported benchmarks: LIBERO-90, LIBERO-GOAL, LIBERO-LONG, LIBERO-OBJECT, LIBERO-SPATIAL

### OpenVLA-OFT Training

Train OpenVLA-OFT models using the official OpenVLA-OFT repository for stages 1-2, then apply RIPT:

1. **Stages 1-2**: Follow [OpenVLA-OFT LIBERO instructions](https://github.com/moojink/openvla-oft/blob/main/LIBERO.md)
2. **Stage 3**: RIPT training - `scripts/openvla_oft/stage_3_ript/libero_*.sh`

Supported benchmarks: LIBERO-GOAL, LIBERO-LONG, LIBERO-OBJECT, LIBERO-SPATIAL

## 🤗 Model Zoo (Available Upon Acceptance)

**All pre-trained models will be released upon paper acceptance.**

Upon acceptance, we will provide a comprehensive model zoo including:

### QueST Checkpoints
- **SFT and RIPT checkpoints** for all LIBERO suites (LIBERO-90, LIBERO-GOAL, LIBERO-LONG, LIBERO-OBJECT, LIBERO-SPATIAL)
- Model size: ~80MB each
- Ready-to-use checkpoints for both supervised fine-tuned (SFT) and RIPT-enhanced models

### OpenVLA-OFT Checkpoints  
- **Scale heads and LoRA adaptors** for all LIBERO suites (LIBERO-GOAL, LIBERO-LONG, LIBERO-OBJECT, LIBERO-SPATIAL)
- Scale heads: ~300MB each
- RIPT LoRA adaptors: ~1GB each
- Compatible with official OpenVLA-OFT base models

**Note:** Until acceptance, please use the training scripts provided above to reproduce all results from scratch.

## 🏋️ QueST RIPT Training

Activate the `ript-vla` conda environment and run the following commands:

### Stage 1: Pre-Training of the QueST Autoencoder

- *Optional once pre-trained checkpoints are released; currently required, since none are provided.*
- Run `scripts/quest/stage_1_autoencoder/libero_*.sh` for the different LIBERO suites.
- This stage trains only the QueST autoencoder, which is then used for SFT.
- Only 1 GPU is needed.
- Example for LIBERO-90:
```bash
bash scripts/quest/stage_1_autoencoder/libero_90.sh
```

### Stage 2: Supervised Fine-Tuning of QueST

- *Optional once pre-trained checkpoints are released; currently required, since none are provided.*
- Run `scripts/quest/stage_2_sft/libero_*.sh` for the different LIBERO suites.
- This stage performs supervised fine-tuning of QueST.
- Only 1 GPU is needed.
- Example for LIBERO-90:
```bash
bash scripts/quest/stage_2_sft/libero_90.sh
```

### Stage 3: RIPT (Reinforcement Interactive Post-Training)
- See `scripts/quest/stage_3_ript/libero_*.sh` for the different LIBERO suites.
- Set `checkpoint_path` to the SFT checkpoint path from Stage 2.
- Use `$NUM_GPU` to specify the number of GPUs to use (3 GPUs recommended for the other LIBERO suites, 6 GPUs for LIBERO-90).
- This stage performs RIPT training of QueST.
- Example for LIBERO-90:
```bash
bash scripts/quest/stage_3_ript/libero_90.sh $NUM_GPU
```

Key flags:

* `algo.rloo_batch_size`: number of rollouts (K) sampled per context for RLOO K-sampling (default: 8)
* `algo.num_ppo_epochs`: number of PPO epochs (default: 20)
* `algo.ppo_batch_size`: number of PPO batches (default: 6 = 1 per GPU × 6 GPUs)
* `train_dataloader.batch_size`: batch size for training (default: 180 = 30 initializations per GPU × 6 GPUs)
* `training.n_steps`: number of training steps (default: 15)
* `training.rollout_steps`: number of rollout interval steps (default: 3)
* `algo.enable_dynamic_sampling`: enable dynamic sampling (default: true)
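The `train_dataloader.batch_size` and `algo.ppo_batch_size` defaults are products of a per-GPU count and the 6-GPU LIBERO-90 setup. A minimal sketch of that arithmetic follows, assuming the per-GPU counts are kept fixed when you run with a different `$NUM_GPU`. The class and field names are hypothetical and not part of the repo.

```python
from dataclasses import dataclass


@dataclass
class RiptBatchSizes:
    """Hypothetical helper (not in the repo): derives global flag values
    from per-GPU quantities, assuming the per-GPU quantities stay fixed."""

    num_gpus: int = 6
    inits_per_gpu: int = 30      # initializations per GPU
    ppo_batch_per_gpu: int = 1   # PPO batch contribution per GPU

    @property
    def train_dataloader_batch_size(self) -> int:
        # train_dataloader.batch_size = initializations per GPU x number of GPUs
        return self.inits_per_gpu * self.num_gpus

    @property
    def ppo_batch_size(self) -> int:
        # algo.ppo_batch_size = per-GPU value x number of GPUs
        return self.ppo_batch_per_gpu * self.num_gpus


defaults = RiptBatchSizes()
print(defaults.train_dataloader_batch_size, defaults.ppo_batch_size)  # 180 6

three_gpu = RiptBatchSizes(num_gpus=3)
print(three_gpu.train_dataloader_batch_size, three_gpu.ppo_batch_size)  # 90 3
```

Whether the provided scripts rescale these flags automatically for other GPU counts is not specified here, so check the script before overriding them.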

### Evaluation

- Run `scripts/quest/eval/libero_*.sh` for different LIBERO suites.
- Fill in the path for `checkpoint_path` with the RIPT/SFT checkpoint path.
- Example for LIBERO-90:
```bash
bash scripts/quest/eval/libero_90.sh $NUM_GPU
```


## 🏋️ OpenVLA-OFT RIPT Training

Activate the `ript_vla_openvla_oft` conda environment and run the following commands:

### Stage 1 + Stage 2: Pre-Training + SFT of OpenVLA-OFT on LIBERO-Suites

- Download the pre-trained and SFTed OpenVLA-OFT full model for each task suite from the official [OpenVLA-OFT repo](https://github.com/moojink/openvla-oft/blob/main/LIBERO.md).
- Train the SFTed scale head using the provided training scripts.

### Stage 3: RIPT of OpenVLA-OFT on LIBERO-Suites

- See `scripts/openvla_oft/stage_3_ript/libero_*.sh` for the different LIBERO suites.
- Set `checkpoint_path` to the SFT checkpoint path from the official OpenVLA-OFT repo.
- Set `header_checkpoint` to the SFTed scale head from your training.
- Set `lora_adaptor_ckpt` to the RIPT LoRA adaptor checkpoint:
  - a previously trained RIPT adaptor, if continuing from an existing checkpoint, or
  - `null`, if starting from the SFT model.
- Use `$NUM_GPU` to specify the number of GPUs to use (4 GPUs recommended for the LIBERO suites).
- This stage performs RIPT training of OpenVLA-OFT.
- Example for LIBERO-LONG:
```bash
bash scripts/openvla_oft/stage_3_ript/libero_long.sh $NUM_GPU
```

### Evaluation

- Run `scripts/openvla_oft/eval/libero_*.sh` for the different LIBERO suites.
- Set `checkpoint_path` to the SFT checkpoint path from the official OpenVLA-OFT repo.
- Set `header_checkpoint` to the SFTed scale head from your training.
- Set `lora_adaptor_ckpt` to:
  - the RIPT LoRA adaptor saved in Stage 3 (or another LoRA adaptor you trained on top of the SFT model), or
  - `null` to evaluate the SFT model alone.

- Example for LIBERO-LONG:
```bash
bash scripts/openvla_oft/eval/libero_long.sh $NUM_GPU
```


## 🧪 Core RIPT Code Overview

### 1. Model Interface

`ModelAdapter` provides an adaptation layer that connects VLA models to the RIPT optimizer:
* Path: `ript/algos/rl_optimizers/model_interface.py`
* Computes action log probabilities
* Retrieves the policy model for optimization
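A minimal sketch of the kind of interface such an adapter exposes, assuming a policy that emits discrete action tokens (as QueST does). The names `ModelAdapterSketch`, `compute_act_log_prob`, `get_policy_model`, and `ToyTokenAdapter` are hypothetical and only mirror the two responsibilities listed above; consult `model_interface.py` for the real signatures.

```python
import math
from abc import ABC, abstractmethod
from typing import Any, List, Sequence


class ModelAdapterSketch(ABC):
    """Hypothetical adapter interface; not the repo's actual ModelAdapter API."""

    @abstractmethod
    def get_policy_model(self) -> Any:
        """Return the object whose parameters the RL optimizer updates."""

    @abstractmethod
    def compute_act_log_prob(self, context: Any, action_indices: Sequence[int]) -> float:
        """Return log pi(actions | context) for a recorded rollout."""


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


class ToyTokenAdapter(ModelAdapterSketch):
    """Toy policy with one fixed logit vector per action step (ignores context)."""

    def __init__(self, logits_per_step: List[List[float]]):
        self.logits_per_step = logits_per_step

    def get_policy_model(self) -> List[List[float]]:
        return self.logits_per_step

    def compute_act_log_prob(self, context: Any, action_indices: Sequence[int]) -> float:
        # log pi(a_1..a_T) = sum over steps of the log-prob of each sampled token.
        return sum(
            log_softmax(step_logits)[a]
            for step_logits, a in zip(self.logits_per_step, action_indices)
        )


adapter = ToyTokenAdapter([[0.0, 0.0], [0.0, 0.0]])
print(adapter.compute_act_log_prob(None, [0, 1]))  # 2 * log(0.5) ≈ -1.386
```

Keeping log-probability computation behind the adapter is what lets the optimizer stay model-agnostic: it only needs a scalar log-probability per rollout, regardless of how the backbone tokenizes actions.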

### 2. Libero Runner with Context Caching

`LiberoRunner_rl` handles interaction with the LIBERO environment and caches model context:
* Path: `ript/env_runner/libero_runner.py: LiberoRunner_rl`
* Supports batch rollout generation
* Manages task-specific context caching (e.g., context tokens, output action indices) for action log probability computation
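Caching matters because PPO must recompute the log-probabilities of the sampled actions under the updated policy without re-running the simulator. A minimal sketch of such a cache, assuming entries are keyed by a task id and a rollout id and store the context tokens and output action indices mentioned above; `RolloutContextCache` and its methods are hypothetical, not the repo's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CachedStep:
    context_tokens: List[int]   # model inputs needed to recompute log-probs later
    action_indices: List[int]   # action tokens the policy actually sampled


class RolloutContextCache:
    """Hypothetical per-rollout context cache keyed by (task_id, rollout_id)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, int], List[CachedStep]] = {}

    def record(self, task_id: int, rollout_id: int,
               context_tokens: List[int], action_indices: List[int]) -> None:
        # Copy the lists so later mutation by the runner cannot corrupt the cache.
        step = CachedStep(list(context_tokens), list(action_indices))
        self._store.setdefault((task_id, rollout_id), []).append(step)

    def steps(self, task_id: int, rollout_id: int) -> List[CachedStep]:
        # Unknown keys return an empty list rather than raising.
        return self._store.get((task_id, rollout_id), [])

    def clear(self) -> None:
        self._store.clear()


cache = RolloutContextCache()
cache.record(task_id=0, rollout_id=0, context_tokens=[5, 7], action_indices=[3])
cache.record(task_id=0, rollout_id=0, context_tokens=[5, 7, 3], action_indices=[1])
print(len(cache.steps(0, 0)))  # 2
```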

### 3. Rollout Generator

`RolloutGenerator` handles the generation of rollouts for RL training with Dynamic Sampling:
* Path: `ript/algos/rl_optimizers/rollout_generator.py`
* Manages environment interactions
* Gathers rollouts across environments
* Supports early stopping to improve efficiency
* Handles distributed rollout generation
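Dynamic sampling with early stopping can be sketched as follows, assuming each context is rolled out K times with binary success rewards and that contexts whose K rollouts all succeed or all fail are dropped: their leave-one-out advantages are all zero, so they contribute no gradient. `collect_with_dynamic_sampling` and `rollout_fn` are hypothetical stand-ins; the real `RolloutGenerator` also handles batched environments and distributed gathering, which are omitted here.

```python
from typing import Callable, List, Tuple


def collect_with_dynamic_sampling(
    contexts: List[int],
    rollout_fn: Callable[[int, int], int],  # hypothetical: (context, rollout index) -> 0/1 success
    k: int,
    target_groups: int,
) -> Tuple[List[Tuple[int, List[int]]], int]:
    """Keep only contexts with mixed outcomes; stop once target_groups are kept."""
    kept: List[Tuple[int, List[int]]] = []
    episodes = 0
    for ctx in contexts:
        rewards = [rollout_fn(ctx, i) for i in range(k)]
        episodes += k
        # All-success or all-failure groups have zero leave-one-out advantage: skip them.
        if 0 < sum(rewards) < k:
            kept.append((ctx, rewards))
        # Early stopping: no need to roll out the remaining contexts.
        if len(kept) >= target_groups:
            break
    return kept, episodes


# Context 0 always succeeds, context 1 always fails, others alternate.
def toy_rollout(ctx: int, i: int) -> int:
    return {0: 1, 1: 0}.get(ctx, i % 2)


kept, episodes = collect_with_dynamic_sampling([0, 1, 2, 3], toy_rollout, k=4, target_groups=1)
print(kept, episodes)  # [(2, [0, 1, 0, 1])] 12
```

Only 12 of the 16 possible episodes are simulated here, which is the efficiency gain early stopping aims for.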

### 4. RL Optimizer

`RLOptimizer` implements the Leave-One-Out PPO (LOOP) algorithm:
* Path: `ript/algos/rl_optimizers/rl_optimizer.py`
* Processes generated rollouts
* Computes rewards and advantages with Leave-One-Out Advantage Estimation
* Applies PPO updates to the policy
* Collects and returns optimization metrics
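The two ingredients named above can be sketched in plain Python: the leave-one-out advantage for a group of K rollouts from the same context, `A_k = R_k - mean(R_j for j != k)`, and a standard PPO clipped surrogate that consumes those advantages. This is a minimal sketch, not `RLOptimizer` itself: the function names and the clipping value `clip_eps=0.2` are assumptions, and the real optimizer operates on batched tensors across GPUs.

```python
import math
from typing import List


def leave_one_out_advantages(rewards: List[float]) -> List[float]:
    """A_k = R_k minus the mean reward of the other K-1 rollouts in the group."""
    k = len(rewards)
    if k < 2:
        raise ValueError("need at least 2 rollouts per group")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


def ppo_clipped_loss(new_logp: List[float], old_logp: List[float],
                     advantages: List[float], clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate averaged over samples (to be minimized)."""
    losses = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped objectives.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)


adv = leave_one_out_advantages([1, 0, 0, 1])
print(adv)  # ≈ [0.667, -0.667, -0.667, 0.667]
```

With binary rewards, a group whose rollouts all share the same outcome yields all-zero advantages, which is why such groups can be filtered out before the PPO update.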

### How to add a new VLA model?

1. Implement the `ModelAdapter` for the new model following the current `ModelAdapter` interface.
2. Adapt `LiberoRunner_rl` to the new model so that it caches the model context needed for action log probability computation. For example, `LiberoRunner_rl` caches the context tokens and output action indices for QueST.
3. Add the new model and RIPT config following the existing `quest_rl.yaml` and `openvla_oft_rl.yaml` format.