# ACT-ViT

## Table of Contents

- [Installation](#installation)
- [Handling Datasets](#handling-datasets)
  - [Generating Raw Datasets](#generating-raw-datasets)
  - [Preprocess Raw Datasets](#preprocess-raw-datasets)
- [Running ACT-ViT](#running-act-vit)

# Installation

First, create the conda environment:
```bash
conda env create -f ACT_ViT_env.yml
```
Then activate it:
```bash
conda activate ACT_ViT
```

# Handling Datasets

Download the unfiltered version of the TriviaQA dataset from https://nlp.cs.washington.edu/triviaqa/.
You need the following files:
- data/triviaqa-unfiltered/unfiltered-web-train.json
- data/triviaqa-unfiltered/unfiltered-web-dev.json
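
Before generating the raw datasets, it can help to confirm that both files are where the scripts will look for them. The following is a minimal sketch, not part of the repository: `check_triviaqa_files` is a hypothetical helper, and it assumes the `data/` paths above are relative to the repository root.

```bash
# Hypothetical helper (not part of the repository): check that both TriviaQA
# files are present. $1 is the directory containing data/, assumed here to be
# the repository root.
check_triviaqa_files() {
  local root="${1:-.}"
  local missing=0
  local f
  for f in \
    data/triviaqa-unfiltered/unfiltered-web-train.json \
    data/triviaqa-unfiltered/unfiltered-web-dev.json
  do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $root/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run the check against the current directory.
if check_triviaqa_files .; then
  echo "TriviaQA files found"
else
  echo "Download the missing files before continuing" >&2
fi
```

If the files live elsewhere, pass that directory as the first argument instead of `.`.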

## Generating Raw Datasets

1. **Make the script executable:**
   ```bash
   chmod +x ./scripts/generate_DC_raw_datasets.sh
   ```

2. **Run the script:**
   ```bash
   ./scripts/generate_DC_raw_datasets.sh [BASE_RAW_DATA_DIR]
   ```

Replace `[BASE_RAW_DATA_DIR]` with the path where the raw datasets should be saved.

## Preprocess Raw Datasets

1. **Make the script executable:**
   ```bash
   chmod +x ./scripts/preprocess_raw_datasets_ACT.sh
   ```

2. **Run the script:**
   ```bash
   ./scripts/preprocess_raw_datasets_ACT.sh [BASE_RAW_DATA_DIR] [BASE_PRE_PROCESSED_DATA_DIR] [NUM_PARALLEL_JOBS]
   ```
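
Judging by the placeholder names, `[BASE_RAW_DATA_DIR]` is presumably the same directory used in the generation step, `[BASE_PRE_PROCESSED_DATA_DIR]` is where the preprocessed output goes, and `[NUM_PARALLEL_JOBS]` controls how many jobs run at once. The sketch below chains both steps with shared variables so the two scripts cannot drift apart on paths. The directory locations and the job count of 4 are illustrative assumptions, not values taken from the repository.

```bash
# Assumed locations and job count; adjust to your setup.
BASE_RAW_DATA_DIR="$PWD/datasets/raw"
BASE_PRE_PROCESSED_DATA_DIR="$PWD/datasets/preprocessed"
NUM_PARALLEL_JOBS=4

# Creating the directories up front is an assumption; it is harmless if the
# scripts create them on their own.
mkdir -p "$BASE_RAW_DATA_DIR" "$BASE_PRE_PROCESSED_DATA_DIR"

if [ -x ./scripts/generate_DC_raw_datasets.sh ] &&
   [ -x ./scripts/preprocess_raw_datasets_ACT.sh ]; then
  # Preprocess only if raw dataset generation succeeded.
  ./scripts/generate_DC_raw_datasets.sh "$BASE_RAW_DATA_DIR" &&
    ./scripts/preprocess_raw_datasets_ACT.sh \
      "$BASE_RAW_DATA_DIR" "$BASE_PRE_PROCESSED_DATA_DIR" "$NUM_PARALLEL_JOBS"
else
  echo "Scripts not found or not executable; run from the repository root after the chmod steps." >&2
fi
```

Quoting the variables keeps paths with spaces intact, and the `&&` between the two calls stops the pipeline if generation fails.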

# Running ACT-ViT
To run ACT-ViT on all 15 LLM-dataset combinations jointly, execute the following:
```bash
wandb sweep ./sweeps/ACT/HD/ACT_ViT_sweep.yaml
```
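
Note that `wandb sweep` only registers the sweep with Weights & Biases and prints a sweep identifier together with a matching `wandb agent` command; the runs themselves start once an agent is launched. A minimal sketch of that second step follows. `run_sweep_agent` is a hypothetical wrapper, and the identifier in the example is a placeholder, not a real sweep.

```bash
# Hypothetical wrapper: start one agent for an already-created sweep.
# $1 is the sweep identifier printed by `wandb sweep`.
run_sweep_agent() {
  if [ -z "${1:-}" ]; then
    echo "usage: run_sweep_agent <sweep identifier from 'wandb sweep'>" >&2
    return 2
  fi
  wandb agent "$1"
}

# Example (placeholder identifier; copy the real one from the sweep output):
# run_sweep_agent my-entity/my-project/abc123xy
```

Several agents can be started for the same sweep, for example one per GPU or machine, and each pulls its next configuration from the sweep.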
