This script processes subject data from the OAI (Osteoarthritis Initiative) dataset. For each time point (month), it extracts clinical treatments,
clinical information, physical activity data, x-ray image paths, Kellgren-Lawrence (KL) Grades from radiographic assessments, medication usage data, and (optionally)
clinical outcomes. The resulting data is compiled into a CSV file named `dataset.csv`.

### CSV Structure

The output CSV file (`dataset.csv`) will have the following structure:

1. **src_subject_id**: The unique identifier for each subject.
2. **sex**: The subject's sex.
3. **ageyears**: The subject's age in years.
4. **cohort**: The cohort classification for the subject (`Progression`, `Incidence`, or `Control`).

For each time point (month), the following columns are added, where `<month>` is the month code (e.g., `00`, `12`, `24`):

- **\<month\>_xray**: Path to the first `.jpg` file found in the subject's directory for the given month.
- **\<month\>_KLGrade**: The Kellgren-Lawrence Grade for the given month, indicating the severity of osteoarthritis as observed in x-ray images. This is extracted for specific months where x-ray data is available.

For each clinical treatment type, the following columns are added:

- **\<month\>_\<treatment_name\>**: Indicates if a treatment occurred within the past 12 months for the given time point.
    - `1`: The treatment was recorded as occurring within the past 12 months.
    - `0`: The treatment was not recorded as occurring within the past 12 months.
    - `-1`: Missing or unrecorded data for the treatment.

For clinical information and physical activity data, the following columns are added:

- **\<month\>_BMI**: The Body Mass Index of the subject at the given time point.
- **\<month\>_L_WOMAC_Disability**: Left knee WOMAC Disability score.
- **\<month\>_R_WOMAC_Disability**: Right knee WOMAC Disability score.
- **\<month\>_L_WOMAC_Pain**: Left knee WOMAC Pain score.
- **\<month\>_R_WOMAC_Pain**: Right knee WOMAC Pain score.
- **\<month\>_L_WOMAC_Stiffness**: Left knee WOMAC Stiffness score.
- **\<month\>_R_WOMAC_Stiffness**: Right knee WOMAC Stiffness score.
- **\<month\>_L_WOMAC_Total**: Left knee WOMAC Total score.
- **\<month\>_R_WOMAC_Total**: Right knee WOMAC Total score.
- **\<month\>_KOOS**: Knee injury and Osteoarthritis Outcome Score.

Physical activity data from the PASE (Physical Activity Scale for the Elderly):

- **\<month\>_PASE1** to **\<month\>_PASE6**: Reflect different types of physical activities.
- **\<month\>_PASE1HR** to **\<month\>_PASE6HR**: Corresponding number of hours per day for each PASE activity.

For medication usage data from the Medication Inventory Form (MIF), the following columns are added:

- **\<month\>_LIDOCAINE**: Usage frequency of Lidocaine.
- **\<month\>_VOLTAREN**: Usage frequency of Diclofenac Sodium (Voltaren).

Both medication columns take the following values:

- The frequency code from the MIF dataset.
- `-1`: Medication was not used or data is missing.

Optionally, clinical outcomes related to joint replacements are included if the `OUTCOMES` variable is set to `True`:

- **\<month\>_L_KneeReplacement**: Indicates if a left knee replacement occurred at or before the given time point.
- **\<month\>_R_KneeReplacement**: Indicates if a right knee replacement occurred at or before the given time point.
- **\<month\>_L_HipReplacement**: Indicates if a left hip replacement occurred at or before the given time point.
- **\<month\>_R_HipReplacement**: Indicates if a right hip replacement occurred at or before the given time point.

All four replacement columns take the following values:

- `1`: The replacement occurred.
- `0`: The replacement did not occur.

### Time Points (Months)

The time points are based on the study visits and are mapped as follows:

- **00**: Baseline
- **12**: 12-month follow-up
- **18**: 18-month follow-up
- **24**: 24-month follow-up
- **30**: 30-month follow-up
- **36**: 36-month follow-up
- **48**: 48-month follow-up
- **60**: 60-month follow-up
- **72**: 72-month follow-up
- **84**: 84-month follow-up
- **96**: 96-month follow-up
- **108**: 108-month follow-up
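
The month codes above act as prefixes for the per-time-point columns. The following is a minimal sketch of that naming pattern; `MONTHS` and `month_columns` are hypothetical names chosen for illustration and are not taken from the script itself.

```python
# Month codes as listed in the "Time Points (Months)" section.
MONTHS = ["00", "12", "18", "24", "30", "36", "48", "60", "72", "84", "96", "108"]


def month_columns(month, names):
    """Return '<month>_<name>' labels for one time point."""
    return [f"{month}_{name}" for name in names]


# Column names taken from the CSV structure described earlier.
print(month_columns("24", ["xray", "KLGrade", "BMI"]))
# ['24_xray', '24_KLGrade', '24_BMI']
```

Keeping the month codes as strings preserves the leading zero in `00`, which would be lost if they were stored as integers.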

### X-Ray Features

For specific time points, x-ray features are extracted:

- **KLGrade**: The Kellgren-Lawrence Grade (KL Grade) is a commonly used system to classify the severity of osteoarthritis using x-ray images. Grades range from 0 (no OA) to 4 (severe OA). The script extracts KL Grades for both left and right knees where available.

KL Grades are extracted for the following months (based on `MONTH_ORDER`):

- **00**: Baseline
- **12**: 12 months
- **24**: 24 months
- **36**: 36 months
- **48**: 48 months
- **72**: 72 months
- **96**: 96 months
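
When consuming the `<month>_KLGrade` columns, it can help to check that each value falls in the 0 to 4 range before using it. The sketch below is one way to do that; `KL_MONTHS` and `parse_kl_grade` are hypothetical helpers for illustration, and treating out-of-range values (including `-1`) as missing is an assumption about how a consumer might want to handle them.

```python
# Months for which KL Grades are extracted, as listed above.
KL_MONTHS = ["00", "12", "24", "36", "48", "72", "96"]


def parse_kl_grade(value):
    """Return the KL grade as an int in 0..4, or None if missing or invalid."""
    try:
        grade = int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None
    return grade if 0 <= grade <= 4 else None


kl_columns = [f"{m}_KLGrade" for m in KL_MONTHS]
print(kl_columns[0], kl_columns[-1])  # 00_KLGrade 96_KLGrade
print(parse_kl_grade("3"))   # 3
print(parse_kl_grade("-1"))  # None
```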

### Treatments Tracked

The following treatments are tracked for each applicable time point (months not in `["00", "18", "30", "60", "84", "108"]`):

- **L_Arthroscopy**: Left arthroscopy.
- **R_Arthroscopy**: Right arthroscopy.
- **L_Meniscectomy**: Left meniscectomy.
- **R_Meniscectomy**: Right meniscectomy.
- **L_Hyl_Injection**: Left hyaluronic acid injection.
- **R_Hyl_Injection**: Right hyaluronic acid injection.
- **L_Steroid_Injection**: Left steroid injection.
- **R_Steroid_Injection**: Right steroid injection.
- **NSAIDS**: Non-steroidal anti-inflammatory drugs usage.
- **NSAIDRX**: Prescription NSAIDs usage.
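
The set of time points that carry treatment columns follows directly from the exclusion list above. This is a small sketch of that filter; the variable names are illustrative assumptions and may differ from those used in the script.

```python
MONTHS = ["00", "12", "18", "24", "30", "36", "48", "60", "72", "84", "96", "108"]

# Months for which treatments are not tracked, as stated above.
TREATMENT_EXCLUDED = ["00", "18", "30", "60", "84", "108"]

TREATMENTS = [
    "L_Arthroscopy", "R_Arthroscopy",
    "L_Meniscectomy", "R_Meniscectomy",
    "L_Hyl_Injection", "R_Hyl_Injection",
    "L_Steroid_Injection", "R_Steroid_Injection",
    "NSAIDS", "NSAIDRX",
]

treatment_months = [m for m in MONTHS if m not in TREATMENT_EXCLUDED]
treatment_columns = [f"{m}_{t}" for m in treatment_months for t in TREATMENTS]

print(treatment_months)        # ['12', '24', '36', '48', '72', '96']
print(len(treatment_columns))  # 60
```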

### Clinical Information

For each applicable time point, the following clinical information is extracted:

- **\<month\>_BMI**: Body Mass Index at the given time point.
- **\<month\>_L_WOMAC_Disability**: Left knee WOMAC Disability score.
- **\<month\>_R_WOMAC_Disability**: Right knee WOMAC Disability score.
- **\<month\>_L_WOMAC_Pain**: Left knee WOMAC Pain score.
- **\<month\>_R_WOMAC_Pain**: Right knee WOMAC Pain score.
- **\<month\>_L_WOMAC_Stiffness**: Left knee WOMAC Stiffness score.
- **\<month\>_R_WOMAC_Stiffness**: Right knee WOMAC Stiffness score.
- **\<month\>_L_WOMAC_Total**: Left knee WOMAC Total score.
- **\<month\>_R_WOMAC_Total**: Right knee WOMAC Total score.
- **\<month\>_KOOS**: Knee injury and Osteoarthritis Outcome Score.

Physical activity data from the PASE (Physical Activity Scale for the Elderly):

- **\<month\>_PASE1** to **\<month\>_PASE6**: Different types of physical activities.
- **\<month\>_PASE1HR** to **\<month\>_PASE6HR**: Number of hours per day spent on each PASE activity.

### Medication Usage (MIF Data)

Medication usage is extracted from the Medication Inventory Form (MIF) for specific medications:

- **\<month\>_LIDOCAINE**: Usage frequency of Lidocaine.
- **\<month\>_VOLTAREN**: Usage frequency of Diclofenac Sodium (Voltaren).

Medications are tracked only for time points whose `MONTH_ORDER` code is not in `["07", "09", "11"]`, i.e., every time point except the 60-, 84-, and 108-month visits.
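
The relationship between the two-digit codes and the month values can be sketched as below. This assumes each code is the zero-padded position of the time point in the list from the "Time Points (Months)" section, which is consistent with `07`, `09`, and `11` corresponding to 60, 84, and 108 months. The names `VISIT_CODE_TO_MONTH` and `MIF_EXCLUDED_CODES` are hypothetical.

```python
MONTHS = ["00", "12", "18", "24", "30", "36", "48", "60", "72", "84", "96", "108"]

# Assumption: each code is the zero-padded index of the month in MONTHS.
VISIT_CODE_TO_MONTH = {f"{i:02d}": month for i, month in enumerate(MONTHS)}

# Codes for which medication data is not tracked, as stated above.
MIF_EXCLUDED_CODES = ["07", "09", "11"]

mif_months = [
    month
    for code, month in VISIT_CODE_TO_MONTH.items()
    if code not in MIF_EXCLUDED_CODES
]

print([VISIT_CODE_TO_MONTH[c] for c in MIF_EXCLUDED_CODES])  # ['60', '84', '108']
print(mif_months)
# ['00', '12', '18', '24', '30', '36', '48', '72', '96']
```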

### Clinical Outcomes (Optional)

Clinical outcomes from `Outcomes99.txt` are included in the dataset as additional columns if the `OUTCOMES` variable is set to `True`. These outcomes include:

- **\<month\>_L_KneeReplacement**
- **\<month\>_R_KneeReplacement**
- **\<month\>_L_HipReplacement**
- **\<month\>_R_HipReplacement**

Each replacement is mapped to a time point based on the visit month after the replacement occurred.
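
Because each outcome column indicates whether a replacement occurred at or before a time point, the flag stays `1` from the mapped visit onward. The sketch below illustrates that cumulative encoding only. `replacement_flags` is a hypothetical helper, and how the script derives the visit month from `Outcomes99.txt` is not shown here.

```python
MONTHS = ["00", "12", "18", "24", "30", "36", "48", "60", "72", "84", "96", "108"]


def replacement_flags(first_month, name):
    """Return {'<month>_<name>': 0 or 1} for every time point.

    first_month is the month code the replacement was mapped to,
    or None if no replacement was recorded (assumed convention).
    """
    flags = {}
    for month in MONTHS:
        occurred = first_month is not None and int(month) >= int(first_month)
        flags[f"{month}_{name}"] = 1 if occurred else 0
    return flags


flags = replacement_flags("36", "L_KneeReplacement")
print(flags["24_L_KneeReplacement"])  # 0
print(flags["36_L_KneeReplacement"])  # 1
print(flags["96_L_KneeReplacement"])  # 1
```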

### Example Columns

For a subject with ID `9001` who has data recorded for the 24-month and 36-month time points, the CSV might include:

| src_subject_id | sex | ageyears | cohort     | 24_xray               | 24_KLGrade | 24_L_Arthroscopy | 24_BMI | 24_PASE1 | ... | 36_xray               | 36_KLGrade | 36_L_Meniscectomy | 36_BMI | 36_PASE1 | ... | 24_LIDOCAINE | 36_VOLTAREN | ... |
|----------------|-----|----------|------------|-----------------------|------------|------------------|--------|----------|-----|-----------------------|------------|-------------------|--------|----------|-----|--------------|--------------|-----|
| 9001           | M   | 65       | Progression | path/to/24m_xray.jpg | 2          | 0                | 27.5   | 10       | ... | path/to/36m_xray.jpg | 3          | 1                 | 27.8   | 12       | ... | 1            | 2            | ... |
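
A consumer of `dataset.csv` will usually want to treat `-1` as missing rather than as a real value. The following is a minimal sketch using only the standard library. The in-memory sample stands in for the real file, its values are made up for illustration (including the second subject), and `load_rows` is a hypothetical helper. It assumes missing values are written exactly as the string `-1`.

```python
import csv
import io

MISSING = "-1"  # sentinel described in this README


def load_rows(handle):
    """Read CSV rows as dicts, replacing the -1 sentinel with None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({k: (None if v == MISSING else v) for k, v in row.items()})
    return rows


# Illustrative sample only; replace with open("dataset.csv", newline="").
sample = io.StringIO(
    "src_subject_id,sex,ageyears,cohort,24_L_Arthroscopy,24_BMI\n"
    "9001,M,65,Progression,0,27.5\n"
    "9002,F,70,Incidence,-1,-1\n"
)

rows = load_rows(sample)
print(rows[0]["24_BMI"])            # 27.5
print(rows[1]["24_L_Arthroscopy"])  # None
```

All values are returned as strings, so numeric columns still need an explicit conversion after the missing-value check.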

### Notes

- The script processes the data for each subject across multiple time points, extracting relevant clinical, imaging, and medication data.
- Missing or unrecorded data is indicated by `-1`, or the value may be omitted from the dataset for that time point.
- The `OUTCOMES` variable controls whether clinical outcomes from `Outcomes99.txt` are included.
- The script assumes that the directory structure and file naming conventions are consistent across the dataset.
- KL Grades are extracted for specific months where x-ray data is available.
- Clinical measures such as WOMAC scores and KOOS provide insight into the subjects' joint function and quality of life.
- BMI and physical activity data from PASE provide insight into the subjects' lifestyle and health status, which may be relevant to osteoarthritis progression.
- Medication usage data from the MIF provides information on specific medications that may influence the condition or its treatment.
- The x-ray image paths included in the dataset can be used for image analysis or linking with imaging data.
- The script is designed to be flexible, allowing for adjustments in variables such as `OUTCOMES` to include or exclude certain data as needed.