# TRACE Deployment Gate Artifact

This artifact contains the code to reproduce the "Deployment Gate on DomainNet" experiment (Section 6.3, Table 3) from the ICLR 2026 paper: "TRACE: Theoretical Risk Attribution under Covariate-shift Effects".

The experiment tests whether a single, interpretable score can decide when to replace a source model `Q` (trained on `real`) with a candidate model `tilde Q` (fine-tuned on `sketch`), so that harmful updates are flagged before deployment.

## Directory Structure

```
ICLR_ActiveSelection_Artifact/
├── README.md                 <- This file
├── environment.yml           <- Conda environment definition
├── requirements.txt          <- Pip requirements
├── data/
│   └── download_domainnet.sh <- Script to download and set up DomainNet
├── configs/
│   └── domainnet_gate.yaml   <- Config for the deployment gate experiment
├── src/
│   └── ...                   <- All Python source code
├── run_deployment_gate.sh    <- Main script to run the experiment
└── post_processing/
    └── compute_gate_metrics.py <- Script to generate the final table from results
```

## Setup

### 1. Create Conda Environment

We recommend using Conda to manage dependencies.

```bash
# Create and activate the conda environment
conda env create -f environment.yml
conda activate trace
```

Alternatively, use `pip` in a virtual environment:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### 2. Download DomainNet Dataset

The experiment runs on the DomainNet dataset. The provided script downloads the two domains it needs (`real` and `sketch`).

```bash
bash data/download_domainnet.sh
```

This places the data under `data/domainnet/`.
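
Before starting a multi-hour run, it can help to confirm that the download actually produced data for both domains. The following is a minimal sketch, not part of the artifact. It assumes one subdirectory per domain directly under `data/domainnet/` (that is, `data/domainnet/real` and `data/domainnet/sketch`), and `missing_domains` is a hypothetical helper name. If the download script uses a different layout, adjust the paths.

```python
from pathlib import Path


def missing_domains(root, domains=("real", "sketch")):
    """Return the domains that have no non-empty directory under `root`.

    Assumption: each domain lives in its own subdirectory of `root`.
    """
    root = Path(root)
    missing = []
    for name in domains:
        domain_dir = root / name
        # A domain counts as present only if its directory exists and has content.
        if not domain_dir.is_dir() or not any(domain_dir.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_domains("data/domainnet")
    print("missing domains:", ", ".join(missing) if missing else "none")
```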


## Running the Experiment

A single script runs the entire experiment. It performs the following steps:
1.  Train the source model `Q` on the `real` domain.
2.  Fine-tune 20 candidate models (`tilde Q`) on the `sketch` domain with different hyperparameters to simulate various updates.
3.  For each candidate, compute the gate scores (`TRACE-W`, `TRACE-MMD`, and `-MSP`) and the true harm `|ΔR|`.
4.  Save all results to a single CSV file, `gate_metrics.csv`, in a timestamped run directory under `outputs/`.

```bash
bash run_deployment_gate.sh
```

**Note**: This experiment requires a GPU with at least 11 GB of VRAM and may take several hours to complete.
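
To make step 3 more concrete, here is a minimal sketch of the two simplest quantities: the `-MSP` baseline score and the harm `|ΔR|`. It is an illustration under stated assumptions, not the code in `src/`. It assumes `-MSP` is the negative maximum softmax probability averaged over evaluation examples, and that `|ΔR|` is the absolute change in 0-1 error between `Q` and `tilde Q` on the same labeled evaluation set. All function names are hypothetical. `TRACE-W` and `TRACE-MMD` are defined in the paper and are not reproduced here.

```python
import math


def neg_msp(logits):
    """Negative maximum softmax probability for one example's logits."""
    m = max(logits)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp(z - m) for z in logits]
    return -max(exps) / sum(exps)


def mean_neg_msp(batch_logits):
    """Average -MSP over a batch (assumed aggregation, see lead-in)."""
    return sum(neg_msp(row) for row in batch_logits) / len(batch_logits)


def error_rate(preds, labels):
    """Empirical 0-1 risk: fraction of misclassified examples."""
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)


def abs_risk_change(preds_old, preds_new, labels):
    """|ΔR| under the assumption that R is the 0-1 error on a shared eval set."""
    return abs(error_rate(preds_new, labels) - error_rate(preds_old, labels))
```

With this sign convention, a less confident model receives a higher (less negative) score, so a larger value points toward a riskier update.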

## Reproducing the Results Table

After the experiment finishes, generate the AUROC/AUPRC table from the paper (Table 3) with the `compute_gate_metrics.py` script.

The script reads the `gate_metrics.csv` file produced by the main experiment and calculates the AUROC and AUPRC of each gate score at several harm thresholds. It prints the results and saves a LaTeX-formatted table to `outputs/deployment_gate_results.tex`.
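
The metric computation described above can be sketched with the standard library alone. This is an illustrative reimplementation, not the shipped script. It assumes a candidate counts as harmful when its harm value exceeds the threshold and that a higher gate score should indicate more harm. The column names (`"TRACE-W"`, `"harm"`) and the threshold value in the usage note are hypothetical, so check them against the header of your `gate_metrics.csv`.

```python
import csv


def auroc(scores, labels):
    """Probability that a harmful candidate outscores a harmless one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return float("nan")  # undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """AUPRC as average precision: sum of precision times recall increment."""
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        return float("nan")
    tp = fp = 0
    ap = 0.0
    # Walk the distinct score values from high to low so ties are grouped.
    for thr in sorted(set(scores), reverse=True):
        new_tp = sum(1 for s, y in zip(scores, labels) if s == thr and y)
        new_fp = sum(1 for s, y in zip(scores, labels) if s == thr and not y)
        tp += new_tp
        fp += new_fp
        ap += (new_tp / n_pos) * (tp / (tp + fp))
    return ap


def load_rows(path):
    """Read the results CSV into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def gate_metrics(rows, score_cols, harm_col, thresholds):
    """Map (score column, threshold) to (AUROC, AUPRC)."""
    out = {}
    for t in thresholds:
        labels = [float(r[harm_col]) > t for r in rows]
        for col in score_cols:
            scores = [float(r[col]) for r in rows]
            out[(col, t)] = (auroc(scores, labels), average_precision(scores, labels))
    return out
```

For example, `gate_metrics(load_rows(path), ["TRACE-W"], "harm", [0.1])` returns one `(AUROC, AUPRC)` pair for that score and threshold. The shipped script remains the reference for the numbers reported in the paper.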

```bash
python post_processing/compute_gate_metrics.py outputs/<RUN_DIR>/gate_metrics.csv
```

Replace `<RUN_DIR>` with the timestamped output directory created by `run_deployment_gate.sh`.
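
If several runs have accumulated under `outputs/`, a small helper can locate the most recent results file. The sketch below is hypothetical and not part of the artifact. It assumes the timestamped directory names sort lexicographically in chronological order (for example, a `YYYYMMDD-HHMMSS` pattern). If the actual naming scheme differs, sort by modification time instead.

```python
from pathlib import Path


def latest_metrics_csv(outputs_dir="outputs", name="gate_metrics.csv"):
    """Return the results CSV from the run directory with the greatest name, or None.

    Assumption: run directory names are timestamps that sort chronologically.
    """
    candidates = [p for p in Path(outputs_dir).glob(f"*/{name}") if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.parent.name)
```

The returned path can then be passed to `compute_gate_metrics.py` in place of the `<RUN_DIR>` placeholder.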
