
# Fair TTE Prediction: Medical Imaging Datasets and Codebase Usage

This repository documents the codebase and provides access information for the three medical imaging datasets used to benchmark fairness in time-to-event (TTE) prediction: **AREDS**, **MIMIC-CXR**, and **ADNI**.

---

## 📊 Dataset Overview

| Dataset     | Task                  | Modality       | Size (images/scans) | Censoring Rate | Mean TTE    |
|-------------|-----------------------|----------------|---------------------|----------------|-------------|
| AREDS       | Late AMD              | Retinal Fundus | 129,708             | 83.9%          | 4.4 years   |
| MIMIC-CXR   | In-hospital Mortality | Chest X-ray    | 269,360             | 61.7%          | 488.6 days  |
| ADNI        | Alzheimer’s Disease   | Brain MRI      | 2,227               | 63.2%          | 35.9 months |

Subgroup statistics based on sensitive attributes (age, sex, race) are provided in the full paper.
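
The censoring rate and mean TTE columns can be derived from per-sample `(duration, event)` pairs. The sketch below is illustrative only: the function name `summarize_tte` and the toy values are hypothetical, and it assumes the mean is taken over all samples (censored and uncensored), which may differ from the definition used in the paper.

```python
from statistics import mean


def summarize_tte(durations, events):
    """Return (censoring_rate, mean_tte) for a set of samples.

    durations: observed times (time to event or time to censoring).
    events:    1 if the event was observed, 0 if the sample was censored.
    """
    if not durations or len(durations) != len(events):
        raise ValueError("durations and events must be non-empty and equally long")
    n_censored = sum(1 for e in events if e == 0)
    # Assumption: the mean is computed over all samples, not events only.
    return n_censored / len(events), mean(durations)


# Toy values in years; not taken from any of the datasets.
durations = [2.0, 5.5, 4.0, 6.5]
events = [1, 0, 0, 0]

rate, mean_tte = summarize_tte(durations, events)
print(f"censoring rate: {rate:.1%}, mean TTE: {mean_tte:.1f} years")
```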

---

## 📁 Dataset Descriptions

### AREDS

- **Description:** Clinical trial dataset for studying age-related macular degeneration (AMD) via retinal fundus photography.
- **Modality:** Color fundus images (left and right eyes).
- **TTE Outcome:** Time to diagnosis of late AMD from image acquisition.
- **Size:** 129,708 images.
- **Access:** [AREDS on dbGaP](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000001.v3.p1) (controlled access).

### MIMIC-CXR

- **Description:** Chest X-ray dataset linked to clinical events including in-hospital mortality.
- **Modality:** Chest X-ray (JPG format).
- **TTE Outcome:** Time from X-ray study to recorded in-hospital mortality or 1-year censoring.
- **Size:** 269,360 images.
- **Access:** via PhysioNet (credentialed access):
  - [MIMIC-IV](https://physionet.org/content/mimiciv/3.1/)
  - [MIMIC-CXR-JPG](https://physionet.org/content/mimic-cxr-jpg/2.1.0/)

### ADNI

- **Description:** Longitudinal neuroimaging dataset for Alzheimer’s disease research.
- **Modality:** Brain MRI scans.
- **TTE Outcome:** Time from MRI scan to Alzheimer’s diagnosis or last follow-up.
- **Size:** 2,227 scans.
- **Access:** [ADNI at LONI](https://adni.loni.usc.edu) (requires registration and Data Use Agreement).
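
The ADNI outcome above (time from scan to diagnosis, or to last follow-up when no diagnosis is recorded) follows the usual right-censoring pattern. A minimal sketch of that labeling rule is shown here; the function name `tte_from_scan`, the argument names, and the day-to-month conversion are assumptions for illustration, not the repository's actual preprocessing.

```python
from datetime import date

# Assumption: months are approximated by the average month length in days.
DAYS_PER_MONTH = 30.4375


def tte_from_scan(scan_date, last_followup, diagnosis_date=None):
    """Return (duration_in_months, event) for one scan.

    event is 1 when a diagnosis date is available, otherwise 0 (censored
    at the last follow-up).
    """
    if diagnosis_date is not None:
        end, event = diagnosis_date, 1
    else:
        end, event = last_followup, 0
    if end < scan_date:
        raise ValueError("end date precedes the scan date")
    return (end - scan_date).days / DAYS_PER_MONTH, event


# Hypothetical subjects: one diagnosed, one censored at last follow-up.
diagnosed = tte_from_scan(date(2010, 1, 1), date(2013, 1, 1), date(2012, 1, 1))
censored = tte_from_scan(date(2010, 1, 1), date(2013, 1, 1))
print(diagnosed, censored)
```

Scans acquired after the diagnosis date raise an error in this sketch; a real pipeline would more likely filter them out earlier.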

---

## ⚖️ Fairness Attributes

Each dataset supports subgroup analysis using one or more of the following sensitive attributes:
- **Age**
- **Sex**
- **Race** (AREDS, MIMIC-CXR only)
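
Subgroup analysis generally means computing a metric separately for each group and comparing the results. The sketch below uses Harrell's concordance index as a simple stand-in for the benchmark's own metrics; the function names and the toy data are hypothetical, and the repository's evaluation code may differ.

```python
def concordance_index(durations, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when i has an observed event and
    durations[i] < durations[j]. Ties in risk count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(durations)):
        if not events[i]:
            continue
        for j in range(len(durations)):
            if durations[i] < durations[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def groupwise_concordance(durations, events, risks, groups):
    """Compute the C-index separately within each subgroup."""
    scores = {}
    for g in sorted(set(groups)):
        idx = [k for k, v in enumerate(groups) if v == g]
        scores[g] = concordance_index(
            [durations[k] for k in idx],
            [events[k] for k in idx],
            [risks[k] for k in idx],
        )
    return scores


# Toy data: higher risk should correspond to a shorter time to event.
durations = [1, 2, 3, 4, 1, 2, 3, 4]
events = [1, 1, 0, 1, 1, 1, 1, 0]
risks = [0.9, 0.7, 0.4, 0.1, 0.2, 0.8, 0.5, 0.3]
groups = ["F"] * 4 + ["M"] * 4

scores = groupwise_concordance(durations, events, risks, groups)
gap = max(scores.values()) - min(scores.values())
print(scores, gap)
```

The gap between the best and worst subgroup is one common way to summarize a performance disparity; it is shown here only as an example.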

---

## 📌 Notes

- Images from the final visits used to define TTE labels are excluded to prevent label leakage.
- Details on image selection, preprocessing, and TTE label construction are provided in the associated paper and appendix.
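
The final-visit exclusion can be sketched as a per-subject filter. This example assumes that the visit defining the label is each subject's latest visit and that records carry `subject_id` and `visit` fields; both are illustrative assumptions, and the actual selection rules are those described in the paper.

```python
def drop_final_visits(records):
    """Remove every image taken at a subject's last recorded visit.

    records: list of dicts with 'subject_id', 'visit' (sortable), and 'image'.
    Subjects with a single visit contribute no images after filtering.
    """
    last_visit = {}
    for r in records:
        sid = r["subject_id"]
        if sid not in last_visit or r["visit"] > last_visit[sid]:
            last_visit[sid] = r["visit"]
    return [r for r in records if r["visit"] < last_visit[r["subject_id"]]]


# Hypothetical records; file names are placeholders.
records = [
    {"subject_id": "A", "visit": 0, "image": "a0.png"},
    {"subject_id": "A", "visit": 2, "image": "a2.png"},
    {"subject_id": "A", "visit": 4, "image": "a4.png"},
    {"subject_id": "B", "visit": 0, "image": "b0.png"},
    {"subject_id": "B", "visit": 1, "image": "b1.png"},
]

kept = drop_final_visits(records)
print([r["image"] for r in kept])
```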

---

## 📜 License

Please refer to the original dataset licenses for terms of use.

---

For more details, see the paper and `Appendix C: Dataset Details`.


---

## 🛠️ Installation

To set up the environment for running the benchmark and reproducing experiments:

```bash
conda env create -f environment.yml
conda activate fairtte
```

---

## 🚀 Usage

To run experiments using the benchmark code:

```bash
python script/train.py \
  --surv_model DeepHit \
  --fair_model GroupDRO \
  --dataset mimiccxr \
  --sensitive_attribute sex \
  --metric ctd \
  --pretrained \
  --gpu 0 \
  --hparams_seed 0 \
  --seed 0 \
  --num_workers 8 \
  --shift x \
  --group_shift 0
```

Available options:

| Argument                | Description                             | Choices                                                                                 |
| ----------------------- | --------------------------------------- | --------------------------------------------------------------------------------------- |
| `--surv_model`          | TTE model to use                        | `DeepHit`, `NnetSurv`, `PMF`                                                            |
| `--fair_model`          | Fairness algorithm to apply             | `None`, `Regularization`, `GroupDRO`, `DomainInd`, `Reweighting`, `DomainIndAggregated` |
| `--dataset`             | Dataset to use                          | `mimiccxr`, `areds`, `adni`                                                             |
| `--sensitive_attribute` | Group attribute for fairness evaluation | `sex`, `age`, `race` (`race` for `areds` and `mimiccxr` only)                           |
| `--metric`              | Evaluation metric                       | `ctd`, `brier`, `auc`                                                                   |
| `--pretrained`          | Whether to use a pretrained image model | (flag, no value needed)                                                                 |
| `--gpu`                 | GPU ID to use                           | e.g., `0`                                                                               |
| `--hparams_seed`        | Seed for hyperparameter selection       | Any integer                                                                             |
| `--seed`                | Random seed for reproducibility         | Any integer                                                                             |
| `--num_workers`         | Number of workers for data loading      | e.g., `8`                                                                               |
| `--shift`               | Distribution shift type                 | `None`, `x`, `y`, `d`                                                                   |
| `--group_shift`         | Specific group for shift simulation     | `None`, `0`, `1`                                                                        |
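
To run the same configuration across several datasets, attributes, and seeds, the commands can be generated programmatically. The sketch below only builds and prints the command lines using flags from the table; the helper `build_command` and the chosen sweep values are hypothetical, and it assumes that omitted flags such as `--shift` and `--group_shift` fall back to defaults in `script/train.py`.

```python
import itertools
import shlex

DATASETS = ["mimiccxr", "areds", "adni"]
ATTRIBUTES = ["sex", "age", "race"]
SEEDS = [0, 1, 2]


def build_command(dataset, attribute, seed):
    """Build one training command from the documented flags."""
    return [
        "python", "script/train.py",
        "--surv_model", "DeepHit",
        "--fair_model", "GroupDRO",
        "--dataset", dataset,
        "--sensitive_attribute", attribute,
        "--metric", "ctd",
        "--pretrained",
        "--gpu", "0",
        "--hparams_seed", "0",
        "--seed", str(seed),
        "--num_workers", "8",
    ]


commands = [
    build_command(d, a, s)
    for d, a, s in itertools.product(DATASETS, ATTRIBUTES, SEEDS)
    # Race is listed as available for AREDS and MIMIC-CXR only.
    if not (d == "adni" and a == "race")
]

for cmd in commands:
    print(shlex.join(cmd))
```

To execute the runs rather than print them, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.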

---

## 📬 Contact

For questions or contributions, please open an issue in this GitHub repository.
