# Anonymous Code Submission

Code for the paper "Instance-Level Costs for Nuanced Classifier Evaluation".

## Structure

```
src/
├── core/           # Core utilities
│   ├── metrics.py  # Evaluation metrics (accuracy, NEC, weighted_accuracy)
│   ├── seed.py     # Reproducibility utilities
│   ├── weights.py  # Sample weighting strategies
│   └── logging.py  # Wandb logging wrapper
├── data/           # Data loading and preprocessing
│   ├── __init__.py # Dataset dispatcher
│   ├── jigsaw.py   # Jigsaw toxicity loader
│   └── preprocess_*.py  # Preprocessing scripts for each dataset
├── models/         # Model implementations
│   ├── tfidf.py    # TF-IDF + logistic regression
│   ├── text_embed.py   # Transformer embeddings + classifier
│   ├── image_embed.py  # ResNet embeddings + classifier
│   ├── tabular.py  # HistGradientBoosting for tabular data
│   └── *_finetune.py   # End-to-end fine-tuning models
├── tasks/          # Task runners
│   ├── classify.py     # Classification experiments
│   └── delta_regress.py  # Delta regression experiments
└── runners/        # CLI entry points
    └── run_experiment.py  # Main experiment runner

scripts/            # Experiment scripts and result summarization
configs/            # Hydra configuration files
tests/              # Unit tests
```
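`src/core/metrics.py` lists a `weighted_accuracy` metric, but its exact definition is not given in this README. Below is a minimal sketch of one common form: the share of total per-instance weight that falls on correctly classified examples. It assumes nonnegative weights with a positive sum and is not necessarily the repo's implementation.

```python
import numpy as np


def weighted_accuracy(y_true, y_pred, weights):
    """Share of total instance weight on correctly classified examples.

    Sketch only: assumes nonnegative per-instance weights with a positive sum.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    weights = np.asarray(weights, dtype=float)
    if y_true.shape != y_pred.shape or y_true.shape != weights.shape:
        raise ValueError("y_true, y_pred and weights must have the same shape")
    total = weights.sum()
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    correct = (y_true == y_pred).astype(float)
    return float((weights * correct).sum() / total)


# A mistake on a high-weight instance costs more than one on a low-weight instance.
print(weighted_accuracy([1, 0, 1, 0], [1, 0, 0, 0], [1.0, 1.0, 2.0, 0.5]))
```

With uniform weights this reduces to plain accuracy. The same quantity is available as `sklearn.metrics.accuracy_score(y_true, y_pred, sample_weight=weights)`.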

## Requirements

Key dependencies (see `requirements.txt` for the full list):
- Python 3.9+
- PyTorch >= 2.0
- Transformers >= 4.40
- scikit-learn >= 1.6
- pandas >= 2.2
- numpy >= 2.0
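To check an existing environment against the minimum versions listed above, a small standard-library script like the following can be used. It is a sketch, not part of the repo: the pip distribution names (e.g. `torch`, `scikit-learn`) are assumptions, and the version parser only compares the leading numeric components, ignoring suffixes such as `rc1` or `+cu118`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the list above; pip distribution names are assumed.
MINIMUMS = {
    "torch": "2.0",
    "transformers": "4.40",
    "scikit-learn": "1.6",
    "pandas": "2.2",
    "numpy": "2.0",
}


def release_tuple(v):
    """Leading numeric components of a version string, e.g. '2.2.3' -> (2, 2, 3)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # stop at suffixes such as '0rc1' or '1+cu118'
            break
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare release tuples after zero-padding them to the same length."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return a list of human-readable problems (empty if everything is fine)."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append(f"Python {sys.version.split()[0]} < 3.9")
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} < {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```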

## Installation

```bash
# Create virtual environment
python -m venv venv && source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

## Data Preparation

### Jigsaw Toxic Comment Classification
1. Download `train.csv` from [Kaggle](https://www.kaggle.com/datasets/julian3833/jigsaw-toxic-comment-classification-challenge)
2. Place it at `data/jigsaw/train.csv`
3. Run: `python -m src.data.preprocess_jigsaw`

### Turkey Injury Classification
1. Download from [Zenodo](https://zenodo.org/records/8115942)
2. Extract to `data/turkey/`
3. Run: `python -m src.data.preprocess_turkey`

### NHANES Hypertension
1. Download the XPT files from [CDC NHANES 2013-2014](https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/overview.aspx?BeginYear=2013):
   - `DEMO_H.xpt` (demographics), `BPX_H.xpt` (blood pressure), `BMX_H.xpt` (body measures)
2. Place them in `data/nhanes/raw/`
3. Run: `python -m src.data.preprocess_nhanes`
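The three NHANES component files share the respondent identifier `SEQN`. As an illustration only (not necessarily what `preprocess_nhanes` does), they can be loaded with pandas and joined on that column; keeping only respondents present in all three files (an inner join) is an assumption of this sketch.

```python
from functools import reduce

import pandas as pd

# The three files listed above (expected under data/nhanes/raw/).
NHANES_FILES = ["DEMO_H.xpt", "BPX_H.xpt", "BMX_H.xpt"]


def load_xpt(path):
    """Read one SAS transport (XPT) file into a DataFrame."""
    return pd.read_sas(path, format="xport")


def merge_on_seqn(frames):
    """Inner-join NHANES component tables on the respondent ID column SEQN."""
    return reduce(lambda left, right: left.merge(right, on="SEQN", how="inner"), frames)
```

Usage from the repository root would be something like `merge_on_seqn([load_xpt(f"data/nhanes/raw/{name}") for name in NHANES_FILES])`. Note that `read_sas` returns numeric columns, including `SEQN`, as floats.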

### iNaturalist Wild/Cultivated
1. Download images from the iNaturalist API
2. Label the images with Gemini: `python scripts/label_inaturalist_gemini.py`
3. Run: `python -m src.data.preprocess_inaturalist`

## Usage

### Running a Single Experiment

```bash
# Classification with TF-IDF on Jigsaw
python -m src.runners.run_experiment \
    --dataset jigsaw \
    --model tfidf \
    --method classification \
    --weighting absdelta \
    --seed 42

# Fine-tuning RoBERTa on Jigsaw
python -m src.runners.run_experiment \
    --dataset jigsaw \
    --model roberta_finetune \
    --method classification \
    --weighting none \
    --seed 42

# Image classification on Turkey
python -m src.runners.run_experiment \
    --dataset turkey \
    --model resnet50 \
    --method classification \
    --weighting absdelta \
    --seed 42
```
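To sweep several weighting strategies and seeds with the CLI shown above, a small driver script can build the commands. This is a hedged sketch: only the flag names come from the examples above, the dataset/model pairs and seed values are illustrative, and `--run` is a flag of this sketch, not of `run_experiment`. It should be launched from the repository root so that `-m src.runners.run_experiment` resolves. For the paper's results, use the provided script under "Reproducing Main Results" instead.

```python
import itertools
import shlex
import subprocess
import sys

# Illustrative grid; only the flag names come from the examples above.
PAIRS = [("jigsaw", "tfidf"), ("turkey", "resnet50")]
WEIGHTINGS = ["none", "absdelta", "alpha_balanced"]
SEEDS = [42, 43, 44]


def build_commands(pairs, weightings, seeds, method="classification"):
    """One run_experiment invocation per (dataset, model, weighting, seed)."""
    commands = []
    for (dataset, model), weighting, seed in itertools.product(pairs, weightings, seeds):
        commands.append([
            sys.executable, "-m", "src.runners.run_experiment",
            "--dataset", dataset,
            "--model", model,
            "--method", method,
            "--weighting", weighting,
            "--seed", str(seed),
        ])
    return commands


if __name__ == "__main__":
    # Dry run by default; pass --run (a flag of this sketch only) to execute.
    execute = "--run" in sys.argv[1:]
    for cmd in build_commands(PAIRS, WEIGHTINGS, SEEDS):
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)
```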

### Weighting Strategies

| Strategy | Description |
|----------|-------------|
| `none` | Uniform weights (baseline) |
| `absdelta` | Weight by \|delta\| (distance from the boundary) |
| `alpha_balanced` | Class balancing combined with \|delta\| weighting |
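The table does not define `delta` or the exact form of `alpha_balanced`. The sketch below assumes `delta` is a per-instance signed score measured from the labeling threshold, so the binary label is `delta > 0` and `|delta|` is the distance from the boundary. The class-balanced variant simply multiplies `|delta|` by inverse-class-frequency factors; what "alpha" refers to in the repo is not stated here, so treat that function as hypothetical. Rescaling weights to mean 1 and adding a small `eps` (so instances with `delta == 0` keep a nonzero weight) are also choices of this sketch.

```python
import numpy as np


def absdelta_weights(delta, eps=1e-8):
    """|delta| weights rescaled to mean 1 (rescaling and eps floor are assumptions)."""
    w = np.abs(np.asarray(delta, dtype=float)) + eps
    return w / w.mean()


def class_balanced_absdelta_weights(delta, y, eps=1e-8):
    """|delta| times inverse class frequency, rescaled to mean 1 (hypothetical form)."""
    y = np.asarray(y)
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    balance = len(y) / (len(classes) * counts)  # per-class factor n / (K * n_c)
    w = (np.abs(np.asarray(delta, dtype=float)) + eps) * balance[inverse]
    return w / w.mean()


def weights_for(strategy, delta, y):
    """Dispatch on the strategy names used in the table above."""
    if strategy == "none":
        return np.ones(len(y))
    if strategy == "absdelta":
        return absdelta_weights(delta)
    if strategy == "alpha_balanced":
        return class_balanced_absdelta_weights(delta, y)
    raise ValueError(f"unknown weighting strategy: {strategy!r}")
```

Under the class-balanced variant, each class contributes the same total weight when its instances have equal `|delta|`, so a rare class is not drowned out by large-margin examples of the majority class.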

### Reproducing Main Results

```bash
# Run all P1 experiments (baseline comparison)
bash scripts/run_p1_experiments.sh

# Summarize results
python scripts/summarize_p1_results.py

# Generate paper figures
python scripts/generate_paper_plots.py
```

## Models

| Model | Type | Description |
|-------|------|-------------|
| `tfidf` | Text | TF-IDF + logistic regression |
| `roberta` | Text | RoBERTa embeddings + logistic regression |
| `roberta_finetune` | Text | End-to-end RoBERTa fine-tuning |
| `resnet50` | Image | ResNet-50 embeddings + logistic regression |
| `resnet_finetune` | Image | End-to-end ResNet fine-tuning |
| `histgbm` | Tabular | HistGradientBoostingClassifier |
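Per-instance weights reach the classical models at training time through scikit-learn's `sample_weight`. A minimal sketch for a `tfidf`-style model follows; the vectorizer settings and `max_iter` are placeholders rather than the repo's configuration, and `fit_weighted_tfidf` is a hypothetical helper. For the `histgbm` row, `HistGradientBoostingClassifier.fit(X, y, sample_weight=w)` accepts the weights directly.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def fit_weighted_tfidf(texts, labels, sample_weight=None, seed=42):
    """TF-IDF + logistic regression; sample_weight is routed to the classifier step."""
    model = make_pipeline(
        TfidfVectorizer(),
        LogisticRegression(max_iter=1000, random_state=seed),
    )
    fit_params = {}
    if sample_weight is not None:
        # make_pipeline names each step after its lowercased class name.
        fit_params["logisticregression__sample_weight"] = np.asarray(sample_weight, dtype=float)
    model.fit(texts, labels, **fit_params)
    return model
```

Routing the weights only to the classifier step leaves the TF-IDF vocabulary and IDF statistics unweighted, which matches how `TfidfVectorizer.fit` works (it takes no sample weights).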

## Testing

```bash
pytest tests/ -v
```

## License

Anonymous submission; a license will be added upon acceptance.
