# Learning Generalizable Task Progress Estimators via Test-Time Adaptation

This repository contains code for training and evaluating task progress estimators that use test-time adaptation (abbreviated TTT in the code and configs) to generalize to unseen environments and robot embodiments.

------------------------------------------------------------------------------

## Project Structure

    VLM-ADAPT/
    ├── checkpoints/       # Model checkpoints
    ├── datasets/          # Evaluation datasets (videos, descriptions)
    ├── results/           # Output evaluation logs and metrics
    ├── src/               # Source code and config files
    │   ├── configs/
    │   ├── evaluate.py
    │   ├── train.py
    │   ├── gvl/           # GVL baseline scripts (Gemini, GPT-4o)
    │   ├── metaworld/     # Offline RL experiments in Meta-World
    │   └── models/
    ├── requirements.txt
    └── README.md

------------------------------------------------------------------------------

## Setup

Install all requirements:

    pip install -r requirements.txt

------------------------------------------------------------------------------

## Training

All key parameters are specified in `src/configs/config.py`. The following settings reflect the best-performing configuration:

- batch_sampling: "dissimilarity"
    Samples mini-batches of sub-trajectories that are maximally different in representation space, mitigating temporal shortcut bias.

- lambda_self: 0.5
    Weight for the self-supervised (adaptation) loss relative to the supervised (progress) loss.

- num_epochs: 5
    Number of meta-training epochs.

- base_lr: 1e-4
    Learning rate for the optimizer during meta-training.

- projection_dim: 64
    Size of the projection for adapting CLIP features (dimension of adaptation and prediction heads).
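
As an illustration only, the settings above could be grouped as shown in this sketch. `TrainConfig` and `total_loss` are hypothetical names, not taken from `src/configs/config.py`, and the additive weighting is an assumption based on the description of `lambda_self`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container mirroring the settings listed above."""
    batch_sampling: str = "dissimilarity"
    lambda_self: float = 0.5
    num_epochs: int = 5
    base_lr: float = 1e-4
    projection_dim: int = 64


def total_loss(progress_loss: float, self_loss: float, cfg: TrainConfig) -> float:
    """Assumed objective: supervised loss plus weighted self-supervised loss."""
    return progress_loss + cfg.lambda_self * self_loss


cfg = TrainConfig()
print(total_loss(progress_loss=0.2, self_loss=0.4, cfg=cfg))
```

With `lambda_self = 0.5`, the self-supervised term contributes half as much per unit of loss as the supervised term.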

Launch training:

    cd src
    python train.py --config configs/config.py
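
The `"dissimilarity"` batch sampling mentioned in the settings can be pictured with a greedy farthest-point selection over sub-trajectory embeddings. This is a minimal NumPy sketch under that assumption; `sample_dissimilar` is a hypothetical helper, and the sampler in this repository may use a different criterion.

```python
from typing import List

import numpy as np


def sample_dissimilar(embeddings: np.ndarray, batch_size: int, seed: int = 0) -> List[int]:
    """Greedy farthest-point selection (assumed stand-in for the real sampler).

    Each new pick maximizes its distance to the closest already-selected item.
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    batch_size = min(batch_size, n)
    chosen = [int(rng.integers(n))]  # random starting sub-trajectory
    # Distance from every candidate to its nearest selected item.
    min_dist = np.linalg.norm(embeddings - embeddings[chosen[0]], axis=1)
    while len(chosen) < batch_size:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return chosen


# Toy example: 100 sub-trajectory embeddings of dimension 64.
feats = np.random.default_rng(1).normal(size=(100, 64))
batch = sample_dissimilar(feats, batch_size=8)
print(batch)
```

Greedy selection only approximates a maximally different batch, but it is cheap and avoids picking near-duplicate neighbouring sub-trajectories.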

------------------------------------------------------------------------------

## Evaluation

Evaluation parameters are also set in `src/configs/config.py`:

- ttt_lr: 0.1
    Learning rate for test-time adaptation (TTT) updates.

- ttt_epochs: 1
    Number of adaptation steps per test sequence.

- model_dirs: ["../checkpoints/your_model_checkpoint.pth"]
    List of checkpoint paths for evaluation.

- projection_dims: [64]
    Projection dimension(s) for evaluation (should match training).

Evaluation workflow:

- The model is loaded and frozen except for the adaptation module.
- For each test video, the adaptation module is updated online using a self-supervised loss (reconstructing its own features after a random projection).
- After adaptation, the model predicts a progress value for each timestep.
- Results are saved in `results/` as `.csv` or `.json`.
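
The adaptation step in this workflow can be sketched as follows. This is a toy NumPy version, not the code in `evaluate.py`: it assumes a linear adaptation module instead of an MLP, a sigmoid output, and that each video starts from the trained weights. All names (`recon_loss`, `adapt_and_predict`, `P`, `W0`, `v`) are hypothetical.

```python
import numpy as np


def recon_loss(W, z, x):
    """Mean squared error when reconstructing x from projected features z."""
    return float(np.mean(np.sum((z @ W - x) ** 2, axis=1)))


def adapt_and_predict(x, P, W_init, v, ttt_lr=0.1, ttt_epochs=1):
    """Adapt a copy of W on one video, then predict per-frame progress.

    x: (T, d) frame features; P: (d, k) frozen random projection;
    W_init: (k, d) adaptation weights; v: (d,) frozen prediction head.
    """
    W = W_init.copy()  # assumption: every video starts from the trained weights
    z = x @ P          # frozen random projection of the features
    for _ in range(ttt_epochs):
        residual = z @ W - x
        grad = 2.0 * z.T @ residual / x.shape[0]  # gradient of recon_loss w.r.t. W
        W -= ttt_lr * grad
    logits = (z @ W) @ v
    progress = 1.0 / (1.0 + np.exp(-logits))  # assumed squashing to [0, 1]
    return W, progress


rng = np.random.default_rng(0)
T, d, k = 20, 16, 8
x = rng.normal(size=(T, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)  # unit-norm rows keep the toy step stable
P = rng.normal(size=(d, k)) / np.sqrt(d)
W0 = rng.normal(size=(k, d)) * 0.01
v = rng.normal(size=d)

W1, progress = adapt_and_predict(x, P, W0, v)
print(recon_loss(W0, x @ P, x), "->", recon_loss(W1, x @ P, x))
```

Only `W` changes during adaptation; `P` and `v` stay fixed, mirroring the "frozen except for the adaptation module" step.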

Launch evaluation (from the `src` directory):

    python evaluate.py --config configs/config.py

------------------------------------------------------------------------------

## Method Overview

- The frozen CLIP encoder extracts features for each frame and the task description.
- Features are concatenated and projected.
- An adaptation module (MLP) is meta-trained to update its own weights at test time using a self-supervised reconstruction loss.
- A prediction head (MLP) estimates progress (0=start, 1=completion) for each frame.
- At test time, only the adaptation module updates online for each test video, enabling adaptation to new environments and tasks.
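
The data flow in the list above can be traced with a shape-level sketch. Random arrays stand in for CLIP outputs, and the feature size of 512, the two-layer ReLU MLPs, and the sigmoid output are all assumptions made for illustration; none of the variable names come from `src/models/`.

```python
import numpy as np

rng = np.random.default_rng(0)
T, clip_dim, projection_dim = 12, 512, 64  # clip_dim=512 is an assumption


def mlp(x, w1, w2):
    """Two-layer MLP with ReLU; a stand-in for the real modules."""
    return np.maximum(x @ w1, 0.0) @ w2


def scaled(rows, cols):
    """Random weights scaled to keep activations in a moderate range."""
    return rng.normal(size=(rows, cols)) / np.sqrt(rows)


# Random stand-ins for the frozen CLIP frame and text features.
frame_feats = rng.normal(size=(T, clip_dim))
text_feat = rng.normal(size=(clip_dim,))

# 1. Concatenate each frame feature with the task-description feature.
joint = np.concatenate([frame_feats, np.tile(text_feat, (T, 1))], axis=1)

# 2. Project the concatenated features down to projection_dim.
h = joint @ scaled(2 * clip_dim, projection_dim)

# 3. Adaptation module: the only part that would be updated at test time.
adapted = mlp(h, scaled(projection_dim, projection_dim),
              scaled(projection_dim, projection_dim))

# 4. Prediction head: one progress value per frame (0=start, 1=completion).
logits = mlp(adapted, scaled(projection_dim, projection_dim),
             scaled(projection_dim, 1)).squeeze(-1)
progress = 1.0 / (1.0 + np.exp(-logits))

print(joint.shape, h.shape, adapted.shape, progress.shape)
```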

------------------------------------------------------------------------------

## GVL (Gemini & GPT-4o) Baselines

The GVL baselines use large generative models to estimate task progress in zero-shot and few-shot settings.

1.  **Set up your API Key**: Create a file named `.env` inside the `src/gvl/` directory and add your Google API key.
    ```
    # In src/gvl/.env
    GOOGLE_API_KEY="your_api_key_here"
    ```

2.  **Run the Scripts**: Navigate to the `gvl` directory and run the evaluation scripts.
    ```bash
    cd src/gvl/
    
    # Run zero-shot evaluation
    python gemini_zero_shot.py
    
    # Run few-shot evaluation
    python gemini_few_shot.py
    ```
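
To sanity-check the `.env` file before running the scripts, a small standard-library parser such as the one below can help. How the GVL scripts actually load the key is not documented here, so `read_env` is a hypothetical helper, and the demo writes a temporary file rather than touching `src/gvl/.env`.

```python
import tempfile
from pathlib import Path


def read_env(path):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demo on a temporary file with the same layout as the example above.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text('# In src/gvl/.env\nGOOGLE_API_KEY="your_api_key_here"\n')
    parsed = read_env(env_file)

print("GOOGLE_API_KEY" in parsed)
```

Pointing `read_env` at `src/gvl/.env` and checking that `GOOGLE_API_KEY` is present and non-empty catches quoting or naming mistakes early.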

------------------------------------------------------------------------------

## Offline RL in Meta-World MT-10

These experiments test the effectiveness of using our trained progress estimator as a reward signal for an offline RL agent.

1.  **Configure the Pipeline**: Open the relevant configuration file (e.g., `src/metaworld/config_rl.py`) and set the `checkpoint_path` variable to point to your trained model checkpoint.

2.  **Run the Offline RL Pipeline**: From the `src` directory, execute the main pipeline script, passing the appropriate config file.
    ```bash
    cd src
    python -m metaworld.run_pipeline
    ```
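
One way to picture "progress as a reward signal" is to relabel each transition of an offline trajectory with a reward derived from the per-frame progress estimates. The sketch below shows two assumed variants; `progress_rewards` is a hypothetical helper, and the pipeline in `src/metaworld/` may define rewards differently.

```python
import numpy as np


def progress_rewards(progress, mode="delta"):
    """Turn per-frame progress estimates into per-transition rewards.

    mode="delta": reward_t = progress[t + 1] - progress[t]
    mode="value": reward_t = progress[t + 1]
    Both variants are assumptions made for illustration.
    """
    progress = np.asarray(progress, dtype=float)
    if progress.ndim != 1 or progress.size < 2:
        raise ValueError("need a 1-D sequence of at least two progress values")
    if mode == "delta":
        return np.diff(progress)
    if mode == "value":
        return progress[1:]
    raise ValueError(f"unknown mode: {mode}")


# Toy trajectory with one regression in estimated progress (0.35 -> 0.3).
rewards = progress_rewards([0.0, 0.1, 0.35, 0.3, 0.8, 1.0])
print(rewards)
```

With the `"delta"` variant, steps that move the task backwards receive a negative reward, and the rewards over a trajectory sum to the total change in estimated progress.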

------------------------------------------------------------------------------

## Logging

- Training and evaluation metrics are logged to Weights & Biases if enabled.

------------------------------------------------------------------------------

## Data

- Example evaluation datasets: `datasets_smpl/`
- Results for all experiments: `results/`
