# Anonymous Reproducibility Package (ICLR'26)

This repository contains the anonymous code package accompanying the ICLR'26 submission "BasePrompt: Prompting for RNA Fitness Prediction". It provides code to reproduce the RNAGym evaluation with EVO models.


## Data
The data is already prepared, so you can skip this step.

- Data Source (RNAGym): https://github.com/MarksLab-DasLab/RNAGym
- Direct Download (official): https://marks.hms.harvard.edu/rnagym/fitness_prediction/fitness_processed_assays.zip
- Place all RNAGym CSVs under `data/DMS_RNAGym_substitutions` (our configs point here by default).

Download and prepare the data (bash):
```bash
mkdir -p data
cd data

# 1) Download from the official RNAGym source
wget "https://marks.hms.harvard.edu/rnagym/fitness_prediction/fitness_processed_assays.zip" -O fitness_processed_assays.zip

# 2) Unzip to a temporary folder
unzip fitness_processed_assays.zip -d rnagym_tmp

# 3) Create the expected folder for this repo
mkdir -p DMS_RNAGym_substitutions

# 4) Move all CSVs under the expected folder
find rnagym_tmp/assays -type f -name "*.csv" -exec mv {} DMS_RNAGym_substitutions/ \;

# 5) (Optional) Remove temporary files
rm -rf rnagym_tmp fitness_processed_assays.zip

# 6) Confirm files exist
ls -la DMS_RNAGym_substitutions | head

cd ..
```

Notes:
- Data source and download method credit: RNAGym (MarksLab-DasLab), see https://github.com/MarksLab-DasLab/RNAGym.
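
After preparing the data, a quick sanity check can confirm that the expected folder exists and contains CSV files. The following is a minimal sketch; `count_csvs` and `check_rnagym_data` are hypothetical helpers introduced here for illustration and are not part of this repository. It assumes a `find` that supports `-maxdepth` (GNU and BSD both do).

```bash
# Hypothetical helpers (not part of this repository).

# Print the number of *.csv files directly under the given directory.
count_csvs() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f -name "*.csv" | wc -l | tr -d ' '
}

# Check that the data directory exists and holds at least one CSV.
# Defaults to the path the configs point to.
check_rnagym_data() {
  local dir="${1:-data/DMS_RNAGym_substitutions}"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  local n
  n="$(count_csvs "$dir")"
  if [ "$n" -eq 0 ]; then
    echo "no CSV files found in $dir" >&2
    return 1
  fi
  echo "found $n CSV files in $dir"
}
```

Run `check_rnagym_data` from the repository root; it returns a non-zero status if the folder is missing or empty.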

## Quick Start

- Environment:
  - `uv sync`
  - `uv pip install evo-model --no-build-isolation`

- EVO1 8K base on RNAGym:
  - Baseline: `uv run -m scripts.run conf/rnagym/evo1/evo-1-8k-base_rnagym.py`
  - BasePrompt: `uv run -m scripts.run conf/rnagym/evo1/evo-1-8k-base_rnagym_baseprompt.py`

- EVO1.5 8K base on RNAGym:
  - Baseline: `uv run -m scripts.run conf/rnagym/evo1/evo-1-5-8k-base_rnagym.py`
  - BasePrompt: `uv run -m scripts.run conf/rnagym/evo1/evo-1-5-8k-base_rnagym_baseprompt.py`

- EVO2 on RNAGym:
  - Install [EVO2](https://github.com/ArcInstitute/evo2?tab=readme-ov-file#setup) first.
  - EVO2 7B
    - Baseline: `uv run -m scripts.run conf/rnagym/evo2/evo2-7b_rnagym.py`
    - BasePrompt: `uv run -m scripts.run conf/rnagym/evo2/evo2-7b_rnagym_baseprompt.py`
  - EVO2 7B base
    - Baseline: `uv run -m scripts.run conf/rnagym/evo2/evo2-7b-base_rnagym.py`
    - BasePrompt: `uv run -m scripts.run conf/rnagym/evo2/evo2-7b-base_rnagym_baseprompt.py`
  - EVO2 40B
    - Baseline: `uv run -m scripts.run conf/rnagym/evo2/evo2-40b_rnagym.py`
    - BasePrompt: `uv run -m scripts.run conf/rnagym/evo2/evo2-40b_rnagym_baseprompt.py`
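
Every model above pairs a baseline config with a `_baseprompt` counterpart, so the two runs can be launched together. The sketch below is one possible convenience wrapper; `run_pair` and the `DRY_RUN` variable are hypothetical and not part of this repository. It assumes the naming pattern seen in the commands above (`foo.py` and `foo_baseprompt.py`).

```bash
# Hypothetical helper (not part of this repository): run the baseline
# config and its BasePrompt counterpart back to back.
# With DRY_RUN=1 the commands are printed instead of executed.
run_pair() {
  local baseline="$1"
  # Derive the BasePrompt config path: foo.py -> foo_baseprompt.py
  local baseprompt="${baseline%.py}_baseprompt.py"
  local cfg
  for cfg in "$baseline" "$baseprompt"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "uv run -m scripts.run $cfg"
    else
      uv run -m scripts.run "$cfg" || return 1
    fi
  done
}
```

For example, `DRY_RUN=1 run_pair conf/rnagym/evo1/evo-1-8k-base_rnagym.py` prints the two EVO1 8K commands without running them; drop `DRY_RUN=1` to execute them.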

## Anonymity (ICLR Supplementary)
- All configs are anonymized.
- Keep `use_wandb = False` during review. To use WandB after review, replace the placeholders:
  - `entity = "ANONYMOUS"` → your org/user
  - `wandb_proj_name = "ANON_PROJECT"` → your project name
- Data paths and local environment variables must not contain identifying information.
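
To guard against accidentally enabling WandB during review, the configs can be scanned before each run. This is a minimal sketch under two assumptions: that configs live under `conf/` (as in the commands above) and that each one sets `use_wandb` with a plain `use_wandb = ...` assignment at the start of a line. `check_wandb_disabled` is a hypothetical helper, not part of this repository.

```bash
# Hypothetical helper (not part of this repository): fail if any *.py
# config under the given directory sets use_wandb = True.
check_wandb_disabled() {
  local dir="${1:-conf}"
  local hits
  # grep exits non-zero when nothing matches, which is the good case here.
  hits="$(grep -rnE --include='*.py' \
    '^[[:space:]]*use_wandb[[:space:]]*=[[:space:]]*True' "$dir" || true)"
  if [ -n "$hits" ]; then
    echo "WandB is enabled in:" >&2
    echo "$hits" >&2
    return 1
  fi
  echo "use_wandb is not enabled in any config under $dir"
}
```

Calling `check_wandb_disabled` from the repository root prints the offending files and lines to stderr and returns a non-zero status if any config enables WandB.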
