# Fidelity-Aware Data Composition for Robust Robot Generalization

This repository contains the official implementation for "Fidelity-Aware Data Composition for Robust Robot Generalization." Our work focuses on developing robust and generalizable robotic policies by leveraging strategic data composition. The primary training data, including tri-view robot videos, is sourced from the [AgiBotWorld-Beta](https://huggingface.co/datasets/agibot-world/AgiBotWorld-Beta) dataset.

## System Requirements

To successfully run the experiments, please ensure your environment meets the following specifications:

-   **GPU:** NVIDIA GPUs with Ampere architecture or newer (e.g., RTX 30 Series, A100).
-   **NVIDIA Driver:** A version compatible with CUDA 12.6.
-   **OS:** Linux x86-64.
-   **Glibc:** Version 2.31 or higher (e.g., as shipped with Ubuntu 22.04 and later).
-   **Python:** Version 3.10.
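
The requirements above can be checked programmatically before installing anything. The following is a minimal sketch, not part of this repository: the function names `check_requirements` and `detect_gpu_capability` are hypothetical, and the GPU check assumes that "Ampere or newer" corresponds to a CUDA compute capability of 8.0 or higher. The GPU check is skipped when PyTorch is not installed yet.

```python
import platform
import sys


def check_requirements(python_version, libc, gpu_capability=None):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) != (3, 10):
        problems.append(
            f"Python 3.10 required, found {python_version[0]}.{python_version[1]}"
        )
    libc_name, libc_version = libc
    if libc_name != "glibc":
        problems.append(f"glibc required, found {libc_name or 'unknown libc'}")
    else:
        major_minor = tuple(int(part) for part in libc_version.split(".")[:2])
        if major_minor < (2, 31):
            problems.append(f"glibc >= 2.31 required, found {libc_version}")
    # Assumption: Ampere and newer GPUs report compute capability >= 8.0.
    if gpu_capability is not None and tuple(gpu_capability) < (8, 0):
        problems.append(
            f"compute capability >= 8.0 required, found {tuple(gpu_capability)}"
        )
    return problems


def detect_gpu_capability():
    """Return (major, minor) of GPU 0, or None if PyTorch or a GPU is missing."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    issues = check_requirements(
        sys.version_info, platform.libc_ver(), detect_gpu_capability()
    )
    print("\n".join(issues) if issues else "Environment looks compatible.")
```

The sketch does not verify the NVIDIA driver version; `nvidia-smi` reports that separately.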

---

## 1. Data Preparation

Our model is trained on the AgiBotWorld-Beta dataset. This is a gated dataset on Hugging Face: you must accept its terms and share your contact information before you can download it.

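Once access is granted, authenticate with your Hugging Face token and use `huggingface-cli` for a resumable download of the dataset:

```bash
# Authenticate once so the CLI can access the gated dataset
huggingface-cli login

# Download the dataset; an interrupted download can be resumed by re-running this command
huggingface-cli download \
  --resume-download \
  --repo-type dataset agibot-world/AgiBotWorld-Beta \
  --local-dir ./AgiBotWorld-Beta
```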

Alternatively, you can load the dataset directly within a Python script using the `datasets` library, which will handle downloading and caching:

```python
from datasets import load_dataset

# This will download and cache the dataset upon first run
dataset = load_dataset("agibot-world/AgiBotWorld-Beta")
```

> **Note:** The full dataset is 43.8 TB. A smaller 7 GB sample is available for preliminary inspection and development.
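
Given these sizes, it may be worth confirming that the target disk has enough free space before starting a download. The sketch below is illustrative only: the helper `has_room` and the 10% safety margin are hypothetical, and the byte counts assume decimal units (1 TB = 10^12 bytes).

```python
import shutil

# Sizes quoted in the note above, assuming decimal units.
FULL_DATASET_BYTES = int(43.8e12)  # 43.8 TB
SAMPLE_BYTES = int(7e9)            # 7 GB


def has_room(path, required_bytes, headroom=0.10):
    """Return True if the filesystem holding `path` has enough free space.

    `headroom` is an extra safety margin, expressed as a fraction of
    `required_bytes`.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes * (1 + headroom)


if __name__ == "__main__":
    for label, size in (("sample", SAMPLE_BYTES), ("full dataset", FULL_DATASET_BYTES)):
        verdict = "fits" if has_room(".", size) else "does not fit"
        print(f"{label}: {verdict} in the current directory's filesystem")
```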

---

## 2. Installation

We recommend using Conda for environment management.

### a. Create and Activate Conda Environment

First, create a Conda environment with Python 3.10 and activate it:

```bash
# Create a new conda environment named 'mvaug_env'
conda create -n mvaug_env python=3.10 -y

# Activate the environment
conda activate mvaug_env
```

### b. Install Dependencies

With the environment activated, install the required packages using `pip`. The following command includes an extra index URL to fetch the correct PyTorch build compatible with CUDA 12.6:

```bash
pip install -U "mvaug[cu126]" \
  --extra-index-url https://nvidia-cosmos.github.io/cosmos-dependencies/cu126_torch201/simple
```

---

## 3. Running Distributed Jobs

We provide a shell script, `cluster_train.sh`, that handles the details of launching distributed jobs with `torchrun`. It is used for training and can be adapted for multi-node inference.

### a. Pre-Run Configuration

Before launching a job, you must edit the `cluster_train.sh` script and replace the following placeholders with your system-specific paths:

-   `<CONDA_PATH>`: The absolute path to your Conda installation (e.g., `$HOME/miniconda3` or `$HOME/anaconda3`).
-   `<ENV_NAME>`: The name of the Conda environment you created in step 2 (e.g., `mvaug_env`).
-   `<PROJECT_ROOT_PATH>`: The absolute path to the root directory of this project.
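
A forgotten placeholder is an easy mistake to make here. As a rough sketch (the helper `unreplaced_placeholders` is hypothetical and not shipped with this repository), the script text can be scanned for any of the three placeholders that are still present:

```python
from pathlib import Path

# The three placeholders listed above.
PLACEHOLDERS = ("<CONDA_PATH>", "<ENV_NAME>", "<PROJECT_ROOT_PATH>")


def unreplaced_placeholders(script_text):
    """Return the placeholders that still appear verbatim in the script text."""
    return [name for name in PLACEHOLDERS if name in script_text]


if __name__ == "__main__":
    script = Path("cluster_train.sh")
    if script.exists():
        leftover = unreplaced_placeholders(script.read_text())
        print("Still to replace:", leftover if leftover else "nothing")
```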

### b. Multi-Node Configuration

The script is designed for multi-node execution and is configured through environment variables. When a variable is unset, the script falls back to the single-node default listed below.

**Key Environment Variables:**

| Variable      | Description                                             | Default Value |
| ------------- | ------------------------------------------------------- | ------------- |
| `WORLD_SIZE`  | The total number of nodes in the distributed job.       | `1`           |
| `RANK`        | The rank of the current node, from 0 to `WORLD_SIZE-1`. | `0`           |
| `MASTER_ADDR` | The IP address of the node with `RANK=0`.               | `127.0.0.1`   |
| `MASTER_PORT` | The network port on the master node for communication.  | `13742`       |
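
To make the role of these variables concrete, the sketch below shows one plausible way they could be turned into a `torchrun` command line. This is an assumption for illustration, not a description of what `cluster_train.sh` actually does: the function `torchrun_args` and the `gpus_per_node` parameter are hypothetical, and the mapping of `WORLD_SIZE` to `--nnodes` follows the table's description of it as a node count.

```python
import os

# Defaults taken from the table above.
DEFAULTS = {
    "WORLD_SIZE": "1",
    "RANK": "0",
    "MASTER_ADDR": "127.0.0.1",
    "MASTER_PORT": "13742",
}


def torchrun_args(env, gpus_per_node):
    """Build a torchrun argument list from environment-style settings.

    `env` is any mapping (for example `os.environ`); missing keys fall back
    to DEFAULTS. `gpus_per_node` is a hypothetical parameter of this sketch.
    """
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    nnodes, node_rank = int(cfg["WORLD_SIZE"]), int(cfg["RANK"])
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"RANK={node_rank} must be in [0, {nnodes - 1}]")
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--nproc_per_node={gpus_per_node}",
        f"--master_addr={cfg['MASTER_ADDR']}",
        f"--master_port={cfg['MASTER_PORT']}",
    ]


if __name__ == "__main__":
    print(" ".join(torchrun_args(os.environ, gpus_per_node=8)))
```

Validating `RANK` against `WORLD_SIZE` up front surfaces a misconfigured node immediately instead of leaving the job waiting at rendezvous.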

---

## 4. Training

The main entry point for training is `scripts/train_mvaug.py`. This script uses a YAML file for base configuration and allows command-line overrides for quick experiments.

### a. Configuration System

All training parameters are defined in YAML files located in the `configs/` directory. You must specify which configuration to use via the `--config_file` argument.

Any parameter in the YAML file can be overridden from the command line using dot notation. For example, to change the learning rate defined in `optimizer.lr`, you can pass the argument `--optimizer.lr 0.0005`.
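
The dot-notation mechanism can be pictured as walking a nested dictionary one key at a time. The following is a minimal sketch of that idea, assuming the YAML file has already been loaded into a plain `dict`; the functions `parse_value` and `apply_overrides` are hypothetical and the training script's own parser may behave differently (for example in how it infers value types).

```python
import ast
import copy


def parse_value(text):
    """Interpret a CLI string as a Python literal, else keep it as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config, argv):
    """Return a copy of `config` with dotted overrides applied.

    `argv` is a flat list of flag/value pairs, such as
    ["--optimizer.lr", "0.0005"].
    """
    result = copy.deepcopy(config)
    for flag, raw_value in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a --dotted.key flag, got {flag!r}")
        *parents, leaf = flag[2:].split(".")
        node = result
        for key in parents:
            # Descend into the nested section, creating it if it is missing.
            node = node.setdefault(key, {})
        node[leaf] = parse_value(raw_value)
    return result


if __name__ == "__main__":
    base = {"optimizer": {"lr": 0.001}, "train_dataloader": {"batch_size": 32}}
    print(apply_overrides(base, ["--optimizer.lr", "0.0005"]))
```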

### b. Usage Examples

All training jobs should be launched using the `cluster_train.sh` script to ensure the distributed environment is set up correctly.

#### Single-Node Training

```bash
# Make the script executable
chmod +x cluster_train.sh

# Launch training with a specific config file
./cluster_train.sh scripts/train_mvaug.py --config_file configs/my_experiment_config.yaml

# Launch training while overriding batch size and learning rate
./cluster_train.sh scripts/train_mvaug.py \
  --config_file configs/my_experiment_config.yaml \
  --train_dataloader.batch_size 64 \
  --optimizer.lr 1e-4
```

#### Multi-Node Training

Set the environment variables described in Section 3 on every node, then run the same command on each. Only `RANK` differs between nodes.

**On the master node (Rank 0, IP: 192.168.1.101):**

```bash
export WORLD_SIZE=2 RANK=0 MASTER_ADDR=192.168.1.101
./cluster_train.sh scripts/train_mvaug.py --config_file configs/dist_train.yaml
```

**On the worker node (Rank 1):**

```bash
export WORLD_SIZE=2 RANK=1 MASTER_ADDR=192.168.1.101
./cluster_train.sh scripts/train_mvaug.py --config_file configs/dist_train.yaml
```

---

## 5. Inference

We provide `scripts/infer_mvaug.py` for generating video outputs from a trained model. The script performs chunk-wise processing, enabling inference on arbitrarily long videos.
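
Chunk-wise processing means the video is split into windows that are generated one after another, so memory use depends on the window size rather than the video length. The sketch below illustrates the indexing only. It assumes fixed-length chunks with an optional number of overlapping frames; the function `chunk_ranges` and the parameters `chunk_len` and `overlap` are hypothetical, and the actual chunking scheme in `scripts/infer_mvaug.py` is not documented here.

```python
def chunk_ranges(num_frames, chunk_len, overlap=0):
    """Yield (start, end) frame indices, end exclusive, covering the video.

    Consecutive chunks share `overlap` frames; the final chunk may be shorter.
    """
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk_len > 0 and 0 <= overlap < chunk_len")
    start = 0
    while start < num_frames:
        end = min(start + chunk_len, num_frames)
        yield start, end
        if end == num_frames:
            break
        # Step back by `overlap` frames so the next chunk reuses them as context.
        start = end - overlap


if __name__ == "__main__":
    for start, end in chunk_ranges(num_frames=10, chunk_len=4, overlap=1):
        print(f"frames [{start}, {end})")
```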

### a. Configuration

Inference configuration is handled directly within the `prepare_model_and_data` function in `scripts/infer_mvaug.py`. **You must modify this script** before running inference to set the correct paths.

Please update the following placeholder variables:

-   **`config_file`**: Path to the `.yaml` configuration file used for training the model.
-   **`root_dir`**: Path to the root directory where your experiment outputs are saved.
-   **`model_dir`**: The specific path to the model checkpoint directory (e.g., `<EXPERIMENT_TIMESTAMP>/<MODEL_STEP>`).
-   **Validation Data Paths**: The paths within `trainer.args.data['val']` (e.g., `jsonl_path_list`, `video_folder_list`).
-   **`save_dir`**: The root directory where the generated videos will be saved.

### b. Running Inference

The script is executed directly and uses `fire` to parse command-line arguments that control the generation process.

#### Key Arguments

| Argument                | Type     | Default     | Description                                                                                                                               |
| ----------------------- | -------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `merge_view_into_width` | `bool`   | `False`     | If `True`, concatenates the three camera views horizontally into a single video. Otherwise, saves them as separate files.                 |
| `save_gt`               | `bool`   | `False`     | If `True` and `merge_view_into_width` is `True`, saves the ground-truth video for comparison.                                              |
| `relight_type`          | `str`    | `relight_0` | Specifies the relighting effect to apply to the initial frame.                                                                            |
| `steps`                 | `int`    | `4`         | The number of DDIM inference steps to use for generation.                                                                                 |
| `multiprocess`          | `bool`   | `False`     | Set to `True` for multi-GPU inference. See the advanced section below.                                                                    |
| `relight_candidates`    | `list`   | `None`      | A list of `relight_type` strings to be used in multi-GPU inference, with one string per GPU process.                                      |

#### Example Usage

To run inference with 10 inference steps and all other settings at their defaults:

```bash
python scripts/infer_mvaug.py --steps 10
```

To run inference and save the output as a single wide video:

```bash
python scripts/infer_mvaug.py --steps 10 --merge_view_into_width True
```

### c. Advanced: Multi-GPU Inference

The script can run different inference tasks (e.g., applying different relighting effects) in parallel across multiple GPUs. Launch this mode with `torchrun`; each GPU process is assigned one `relight_type` from the `relight_candidates` list.

**Example:**

To run 4 different relighting experiments in parallel on 4 GPUs:

```bash
torchrun --nproc_per_node=4 scripts/infer_mvaug.py \
  --multiprocess True \
  --steps 10 \
  --relight_candidates "['relight_0', 'relight_1', 'relight_2', 'vignette']"
```

This command launches four processes. The process on GPU 0 will use `'relight_0'`, GPU 1 will use `'relight_1'`, and so on. The generated videos will be saved into separate subdirectories named after their respective `relight_type`.
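
The per-GPU assignment described above can be sketched as an index lookup keyed on the process's local rank. This is an illustration under assumptions rather than the script's actual code: the helpers `relight_for_rank` and `output_dir` are hypothetical, and the sketch relies on the `LOCAL_RANK` environment variable that `torchrun` sets for each process it launches.

```python
import os
from pathlib import Path


def relight_for_rank(candidates, local_rank):
    """Pick the relight type for one GPU process (one candidate per process)."""
    if not 0 <= local_rank < len(candidates):
        raise IndexError(
            f"local rank {local_rank} has no entry in {len(candidates)} candidates"
        )
    return candidates[local_rank]


def output_dir(save_dir, relight_type):
    """Return the subdirectory, named after the relight type, for the outputs."""
    return Path(save_dir) / relight_type


if __name__ == "__main__":
    candidates = ["relight_0", "relight_1", "relight_2", "vignette"]
    # torchrun sets LOCAL_RANK; default to 0 when run as a plain script.
    rank = int(os.environ.get("LOCAL_RANK", "0"))
    chosen = relight_for_rank(candidates, rank)
    print(f"rank {rank} -> {chosen} -> {output_dir('outputs', chosen)}")
```

With this scheme the number of processes passed to `--nproc_per_node` should match the length of `relight_candidates`, which is why the sketch raises an error for a rank without an entry.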