# RADAR: Robust And Data Aware Reasoning Benchmark
The **Robust And Data Aware Reasoning (RADAR)** benchmark is designed to evaluate the ability of language models to demonstrate **data-awareness**, that is, to recognize, reason over, and appropriately handle complex data artifacts such as:

- Missing data  
- Bad values  
- Outliers  
- Inconsistent formatting  
- Inconsistent multi-column logic  

The full dataset includes **53 tasks** grounded in real-world data tables. The tasks span different data artifact types and table sizes (measured by token count and number of columns). In total, RADAR provides **2,980 unique query-table task instances**.

In addition, each task comes with a suite of expert-written perturbation functions.
An example can be found in `radar/tasks/funcs/influenza_like_illness.py`.
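To make the idea of a perturbation function concrete, here is a minimal sketch that injects two of the artifact types listed above (missing data and bad values) into a clean table. It is **not** the RADAR implementation: the table representation (a list of row dicts), the function names `inject_missing_values` and `inject_bad_values`, the `ili_pct` column, and the `-999` sentinel are all hypothetical. See `radar/tasks/funcs/influenza_like_illness.py` for the real functions.

```python
import random
from copy import deepcopy
from typing import Any

# Hypothetical table representation: one dict per row, mapping column -> value.
Table = list[dict[str, Any]]


def _replace_values(table: Table, column: str, new_value: Any, fraction: float, seed: int) -> Table:
    """Return a copy of `table` with `fraction` of the rows' `column` set to `new_value`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    perturbed = deepcopy(table)  # never mutate the clean table
    n_rows = round(len(perturbed) * fraction)
    for idx in rng.sample(range(len(perturbed)), n_rows):  # distinct row indices
        perturbed[idx][column] = new_value
    return perturbed


def inject_missing_values(table: Table, column: str, fraction: float, seed: int = 0) -> Table:
    """Hypothetical perturbation: blank out some values (missing data)."""
    return _replace_values(table, column, None, fraction, seed)


def inject_bad_values(table: Table, column: str, bad_value: Any, fraction: float, seed: int = 0) -> Table:
    """Hypothetical perturbation: overwrite some values with an invalid sentinel (bad values)."""
    return _replace_values(table, column, bad_value, fraction, seed)


if __name__ == "__main__":
    clean = [{"week": w, "ili_pct": 1.0 + w / 10} for w in range(1, 11)]
    noisy = inject_missing_values(clean, "ili_pct", fraction=0.2, seed=42)
    print(sum(row["ili_pct"] is None for row in noisy))  # 2 of the 10 rows are now missing
```

Returning a perturbed copy and seeding the RNG keeps the clean table available as ground truth and makes each perturbed instance reproducible.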

## Getting Started

### Script installation

```bash
source install.sh
```

### Step by step installation
1. Create and activate the Conda environment:

   ```bash
   conda env create -f environment.yaml
   conda activate radar
   ```

2. For development, we use [poetry](https://python-poetry.org/docs/) on top of conda.
   After installing poetry and activating the conda environment, run:

   ```bash
   poetry install
   ```

   To add a new package, run `poetry add <package>`. This installs the package into the active environment and records it in `pyproject.toml`.
   It also creates or updates the `poetry.lock` file, which you should commit to the repo.


### Checking that you can run an LLM endpoint
```bash
python radar/run_llm.py
```

### Download RADAR
Download the RADAR archive and unzip it into a `radar-benchmark` folder. Then set the environment variable `DATASET_FOLDER` to the path of `radar-benchmark` (or set it directly in the `.env` file).
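If scripts cannot find the data, a quick check of how `DATASET_FOLDER` resolves can help. The sketch below reads the environment variable first and falls back to a simple `KEY=VALUE` parse of a `.env` file. The helper name `resolve_dataset_folder` and this lookup order are assumptions for illustration; the project itself may load `.env` differently (for example with a dedicated library).

```python
from __future__ import annotations

import os
from pathlib import Path


def resolve_dataset_folder(env_file: str | Path = ".env") -> Path:
    """Hypothetical helper: find DATASET_FOLDER in the environment, then in a .env file."""
    value = os.environ.get("DATASET_FOLDER")
    if value is None:
        env_path = Path(env_file)
        if env_path.is_file():
            for line in env_path.read_text().splitlines():
                line = line.strip()
                # Skip blanks, comments, and lines that are not KEY=VALUE.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, raw = line.partition("=")
                if key.strip() == "DATASET_FOLDER":
                    value = raw.strip().strip('"').strip("'")  # drop optional quotes
                    break
    if value is None:
        raise RuntimeError("DATASET_FOLDER is not set; point it at your radar-benchmark folder")
    folder = Path(value).expanduser()
    if not folder.is_dir():
        raise FileNotFoundError(f"DATASET_FOLDER={folder} is not a directory")
    return folder


if __name__ == "__main__":
    print(resolve_dataset_folder())
```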

## Running Baselines
* `direct_prompting.ipynb` shows how to run the direct prompting baseline (you only need to define your own LLM call function).
* `code_agent.ipynb` shows how to run the code agent baseline (you only need to define your own LLM call function).
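Both notebooks expect you to supply your own LLM call function. The sketch below assumes the simplest possible shape, a function that takes a prompt string and returns the model's reply as a string, and shows how a direct-prompting run might wire it up. The `LLMCall` type, the prompt template in `build_direct_prompt`, and `fake_llm` are hypothetical; check the notebooks for the exact signature and prompt they use.

```python
from typing import Callable

# Hypothetical signature: prompt text in, model reply out.
LLMCall = Callable[[str], str]


def build_direct_prompt(table_csv: str, question: str) -> str:
    """Hypothetical direct-prompting template: the whole table is placed in the prompt."""
    return (
        "You are given a data table in CSV format.\n\n"
        f"Table:\n{table_csv.strip()}\n\n"
        f"Question: {question}\n"
        "Answer with a single value."
    )


def run_direct_prompting(llm_call: LLMCall, table_csv: str, question: str) -> str:
    """Send one query-table instance to the model and return its trimmed answer."""
    reply = llm_call(build_direct_prompt(table_csv, question))
    if not isinstance(reply, str):
        raise TypeError(f"LLM call must return str, got {type(reply).__name__}")
    return reply.strip()


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real endpoint, handy for testing the plumbing."""
    return " 42 \n" if "Question:" in prompt else ""


if __name__ == "__main__":
    table = "week,ili_pct\n1,1.2\n2,\n3,1.5\n"
    print(run_direct_prompting(fake_llm, table, "What is the mean ili_pct?"))  # 42
```

Swapping `fake_llm` for a function that calls your actual endpoint is the only change needed; keeping the call function separate from prompt construction lets the same plumbing be tested offline.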
