# Fair TTE Prediction: Medical Imaging Datasets

This repository provides detailed documentation and access information for three medical imaging datasets used to benchmark fairness in time-to-event (TTE) prediction: **AREDS**, **MIMIC-CXR**, and **ADNI**.

---

## 📊 Dataset Overview

| Dataset   | Target Event          | Modality       | Size           | Censoring Rate | Mean TTE    |
|-----------|-----------------------|----------------|----------------|----------------|-------------|
| AREDS     | Late AMD              | Retinal Fundus | 129,708 images | 83.9%          | 4.4 years   |
| MIMIC-CXR | In-hospital Mortality | Chest X-ray    | 269,360 images | 61.7%          | 488.6 days  |
| ADNI      | Alzheimer’s Disease   | Brain MRI      | 2,227 scans    | 63.2%          | 35.9 months |

Subgroup statistics based on sensitive attributes (age, sex, race) are provided in the full paper.
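The censoring rate and mean TTE columns can be recomputed from per-sample labels. The sketch below is a minimal illustration, assuming each sample carries a duration and a binary event indicator (1 = event observed, 0 = censored) and that the mean is taken over all samples, observed and censored alike. The paper may aggregate differently, and `summarize_tte` is a hypothetical helper, not part of any released code.

```python
from statistics import mean
from typing import Sequence, Tuple


def summarize_tte(durations: Sequence[float], events: Sequence[int]) -> Tuple[float, float]:
    """Return (censoring_rate, mean_tte) for a set of TTE labels.

    Assumption: events[i] == 1 means the event was observed and 0 means the
    sample is censored; mean_tte averages over ALL samples (observed and
    censored), in whatever time unit the durations use.
    """
    if not durations or len(durations) != len(events):
        raise ValueError("durations and events must be non-empty and of equal length")
    # Fraction of samples whose event was not observed during follow-up.
    censoring_rate = sum(1 for e in events if e == 0) / len(events)
    return censoring_rate, mean(durations)


# Toy example: one observed event and three censored samples (in years).
rate, mean_tte = summarize_tte([1.0, 2.0, 3.0, 6.0], [1, 0, 0, 0])
print(f"censoring rate = {rate:.1%}, mean TTE = {mean_tte:.1f} years")
```

Note that the three datasets report TTE in different units (years, days, months), so durations should be converted before any cross-dataset comparison.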

---

## 📁 Dataset Descriptions

### AREDS

- **Description:** Clinical trial dataset for studying age-related macular degeneration (AMD) via retinal fundus photography.
- **Modality:** Color fundus images (left and right eyes).
- **TTE Outcome:** Time from image acquisition to diagnosis of late AMD.
- **Size:** 129,708 images.
- **Access:** [AREDS on dbGaP](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000001.v3.p1) (controlled access).

### MIMIC-CXR

- **Description:** Chest X-ray dataset linked to MIMIC-IV clinical records, including in-hospital mortality.
- **Modality:** Chest X-ray (JPG format).
- **TTE Outcome:** Time from X-ray study to recorded in-hospital mortality or 1-year censoring.
- **Size:** 269,360 images.
- **Access:** PhysioNet (credentialed access):
  - [MIMIC-IV](https://physionet.org/content/mimiciv/3.1/)
  - [MIMIC-CXR-JPG](https://physionet.org/content/mimic-cxr-jpg/2.1.0/)

### ADNI

- **Description:** Longitudinal neuroimaging dataset for Alzheimer’s disease research.
- **Modality:** Brain MRI scans.
- **TTE Outcome:** Time from MRI scan to Alzheimer’s diagnosis or last follow-up.
- **Size:** 2,227 scans.
- **Access:** [ADNI at LONI](https://adni.loni.usc.edu) (requires registration and Data Use Agreement).
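All three outcomes follow the same pattern: the clock starts at image acquisition and stops either at the event (late AMD, death, or Alzheimer’s diagnosis) or at censoring. The sketch below illustrates that construction under stated assumptions: per-image acquisition, event, and last-follow-up dates are available, and durations are measured in days. `TTELabel` and `make_tte_label` are hypothetical names, and dataset-specific censoring rules from the paper are not reproduced here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TTELabel:
    duration_days: int  # time from image acquisition to event or censoring
    event: int          # 1 = event observed, 0 = censored


def make_tte_label(image_date: date, event_date: Optional[date], last_followup: date) -> TTELabel:
    """Build a (duration, event) label for one image (hypothetical helper)."""
    if event_date is not None:
        # Event observed: the clock stops at the event date.
        if event_date < image_date:
            raise ValueError("event precedes image acquisition")
        return TTELabel((event_date - image_date).days, 1)
    # No event recorded: censor at the last follow-up.
    if last_followup < image_date:
        raise ValueError("last follow-up precedes image acquisition")
    return TTELabel((last_followup - image_date).days, 0)


print(make_tte_label(date(2020, 1, 1), date(2020, 3, 1), date(2021, 1, 1)))
print(make_tte_label(date(2020, 1, 1), None, date(2021, 1, 1)))
```

The first call returns an observed event after 60 days; the second is censored after 366 days (2020 is a leap year).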

---

## ⚖️ Fairness Attributes

Each dataset supports subgroup analysis using the following sensitive attributes:
- **Age**
- **Sex**
- **Race** (AREDS and MIMIC-CXR only)
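Subgroup analysis generally means evaluating a TTE model separately within each attribute group and comparing the results. As one illustration, which is not necessarily the fairness metric used in the paper, the sketch below computes Harrell’s concordance index (C-index) per group and reports the gap between the best and worst group. Tied survival times are skipped for simplicity, age would first need to be binned into groups, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple


def harrell_c_index(durations: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Harrell's C-index: a higher risk score should mean an earlier event."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        if durations[i] == durations[j]:
            continue  # simplification: pairs with tied times are skipped
        # a = sample with the shorter time, b = sample with the longer time
        a, b = (i, j) if durations[i] < durations[j] else (j, i)
        if events[a] != 1:
            continue  # comparable only if the earlier sample had an observed event
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs in this group")
    return concordant / comparable


def subgroup_c_index(
    durations: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
    groups: Sequence[Hashable],
) -> Tuple[Dict[Hashable, float], float]:
    """C-index per subgroup and the max-min gap across subgroups."""
    members: Dict[Hashable, List[int]] = {}
    for idx, g in enumerate(groups):
        members.setdefault(g, []).append(idx)
    scores = {
        g: harrell_c_index(
            [durations[i] for i in idx], [events[i] for i in idx], [risks[i] for i in idx]
        )
        for g, idx in members.items()
    }
    return scores, max(scores.values()) - min(scores.values())


# Toy data: the model ranks the "F" group correctly and the "M" group backwards.
scores, gap = subgroup_c_index(
    durations=[1, 2, 1, 2], events=[1, 1, 1, 1], risks=[0.9, 0.1, 0.1, 0.9], groups=["F", "F", "M", "M"]
)
print(scores, gap)
```

A gap near zero means the model discriminates equally well across groups; a large gap flags a subgroup where its risk ranking is less reliable.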

---

## 📌 Notes

- Images acquired at the final visit used to define each TTE label are excluded to prevent label leakage.
- Details on image selection, preprocessing, and TTE label construction are provided in the associated paper and appendix.
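The leakage rule in the first note can be applied as a simple filter over image records. This is a minimal sketch, assuming each record has hypothetical `patient_id` and `visit` fields with comparable visit identifiers (for example visit numbers or dates), and that the visit defining a patient’s label is that patient’s last recorded visit; the exact selection procedure is described in the paper.

```python
from typing import Any, Dict, List


def drop_final_visit_images(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove images acquired at each patient's final (label-defining) visit."""
    # First pass: find the latest visit for every patient.
    final_visit: Dict[Any, Any] = {}
    for r in records:
        pid = r["patient_id"]
        if pid not in final_visit or r["visit"] > final_visit[pid]:
            final_visit[pid] = r["visit"]
    # Second pass: keep only images from earlier visits.
    return [r for r in records if r["visit"] != final_visit[r["patient_id"]]]


records = [
    {"patient_id": "p1", "visit": 0, "image": "p1_v0.jpg"},
    {"patient_id": "p1", "visit": 1, "image": "p1_v1.jpg"},
    {"patient_id": "p1", "visit": 2, "image": "p1_v2.jpg"},
    {"patient_id": "p2", "visit": 0, "image": "p2_v0.jpg"},
]
kept = drop_final_visit_images(records)
print([r["image"] for r in kept])
```

Patients with a single visit (such as `p2` above) contribute no images under this rule, since their only visit is also the one that defines the label.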

---

## 📜 License

Please refer to the original dataset licenses for terms of use.

---

For more details, see the paper, in particular Appendix C: Dataset Details.

---


## 📬 Contact

For questions or contributions, please open an issue in this GitHub repository.
