# CXCL13 as a Prognostic Biomarker in Muscle-Invasive Bladder Cancer (TCGA-BLCA)

Reproducible pipeline to (1) preprocess TCGA PanCancer Atlas BLCA data with Python and (2) run survival analyses with R.
Outputs include Kaplan–Meier (OS/PFS) curves, multivariable Cox models, and forest plots.

## 📂 Repository Layout (current)

```
.
├── CXCL13_expression.R
├── CXCL13_highrisk_samples.csv          # generated by R script
├── CXCL13_lowrisk_samples.csv           # generated by R script
├── km_os_CXCL13_annot.png               # generated by R script
├── km_os_CXCL13_risktable.png           # generated by R script
├── km_os_CXCL13_combined.png            # generated by R script
├── km_dfs_CXCL13_annot.png              # generated by R script
├── km_dfs_CXCL13_risktable.png          # generated by R script
├── km_dfs_CXCL13_combined.png           # generated by R script
├── Rplots.pdf                           # auto-created by R (can be ignored)
├── data/
│   ├── raw/                             # put cBioPortal downloads here
│   │   ├── data_mrna_seq_v2_rsem.txt
│   │   ├── data_clinical_patient.txt
│   │   └── data_clinical_sample.txt
│   └── processed/                       # made by Python scripts
│       ├── blca_tcga_CXCL13_enrichment.csv
│       └── tcga_pancanceraltas_with_CXCL13subgroups.csv
├── plots/                               # forest plots + duplicate KMs (png)
│   ├── forestplot_os_multivariable.png
│   ├── forestplot_pfs_multivariable.png
│   ├── km_os_CXCL13_annot.png
│   └── km_dfs_CXCL13_annot.png
├── py/
│   ├── generic_gener.py                 # builds CXCL13 labels (enriched/not)
│   └── cleanerCLINSHEET_subgroup.py     # merges labels + clinical
├── requirements.txt
└── README.md
```

## 🔧 Requirements

### Python (≥ 3.10)

Create a venv and install deps:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### R (≥ 4.1)

Install packages:

```r
install.packages(c("survival","survminer","ggplot2","ggpubr"))
```

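Optional, Ubuntu 22.04 (jammy) only: point R at Posit Package Manager to get prebuilt binaries. Set this **before** calling `install.packages()`: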

```r
options(repos = c(RSPM = "https://packagemanager.posit.co/cran/__linux__/jammy/latest"))
```

## 🚀 How to Run

> Run all commands from the **repo root**.

### 1) Python preprocessing

Place the three cBioPortal files in `data/raw/`:

* `data_mrna_seq_v2_rsem.txt`
* `data_clinical_patient.txt`
* `data_clinical_sample.txt`
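
Before running the scripts, it can help to confirm that the three files are where the pipeline expects them. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_raw_files` is hypothetical, and it assumes the current working directory is the repo root.

```python
from pathlib import Path

# File names come from the list above. RAW_DIR assumes the repo root
# is the current working directory.
RAW_DIR = Path("data/raw")
REQUIRED = (
    "data_mrna_seq_v2_rsem.txt",
    "data_clinical_patient.txt",
    "data_clinical_sample.txt",
)


def missing_raw_files(raw_dir: Path = RAW_DIR) -> list[str]:
    """Return the required file names that are absent from raw_dir."""
    return [name for name in REQUIRED if not (raw_dir / name).is_file()]


# Hypothetical usage: report anything that still needs to be downloaded.
missing = missing_raw_files()
print("Missing:", ", ".join(missing) if missing else "none")
```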

Run the scripts:

```bash
# Step 1: generate CXCL13 labels (75th percentile split)
python py/generic_gener.py

# Step 2: merge clinical + labels
python py/cleanerCLINSHEET_subgroup.py
```
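
To make the "75th percentile split" in Step 1 concrete, here is a minimal standard-library sketch of such a split. It is an illustration under assumptions, not the code in `py/generic_gener.py`: the function name, the label strings, the strict `>` comparison, and the toy expression values are all made up for the example.

```python
from statistics import quantiles


def label_by_percentile(expr: dict[str, float], pct: int = 75) -> dict[str, str]:
    """Label a sample 'enriched' if its value lies above the pct-th percentile."""
    # method="inclusive" interpolates linearly between observed values,
    # the same rule NumPy's percentile uses by default.
    cutoff = quantiles(expr.values(), n=100, method="inclusive")[pct - 1]
    # Assumption: samples exactly at the cutoff count as not enriched.
    return {
        sample: "enriched" if value > cutoff else "not_enriched"
        for sample, value in expr.items()
    }


# Toy values only (not real TCGA data): eight samples with expression 1..8.
toy_expr = {f"S{i}": float(i) for i in range(1, 9)}
toy_labels = label_by_percentile(toy_expr)
print([s for s, lab in toy_labels.items() if lab == "enriched"])
```

With a 75th percentile cutoff, roughly the top quarter of samples ends up in the enriched group, so the two groups are unbalanced by design.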

This creates:

```
data/processed/blca_tcga_CXCL13_enrichment.csv
data/processed/tcga_pancanceraltas_with_CXCL13subgroups.csv   ← consumed by R
```
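
The merge in Step 2 has to line up sample-level labels with patient-level clinical rows. The sketch below shows one way this could look, assuming standard TCGA barcodes (the first 12 characters identify the patient) and cBioPortal clinical files whose metadata lines start with `#`. The function names, the `CXCL13_group` column, and the toy IDs are hypothetical. The actual logic in `py/cleanerCLINSHEET_subgroup.py` may differ.

```python
import csv
import io


def patient_id(sample_id: str) -> str:
    """TCGA barcodes: the first 12 characters identify the patient."""
    return sample_id[:12]


def read_clinical(handle) -> list[dict[str, str]]:
    """Parse a tab-separated clinical table, skipping '#' metadata lines."""
    rows = (line for line in handle if not line.startswith("#"))
    return list(csv.DictReader(rows, delimiter="\t"))


def merge_labels(clinical, labels, id_col="PATIENT_ID", label_col="CXCL13_group"):
    """Inner-join clinical rows with per-sample labels on the patient ID."""
    by_patient = {patient_id(s): lab for s, lab in labels.items()}
    return [
        {**row, label_col: by_patient[row[id_col]]}
        for row in clinical
        if row[id_col] in by_patient
    ]


# Toy input only (made-up IDs, not real TCGA records).
toy_clinical = io.StringIO(
    "#Patient Identifier\tOverall Survival Status\n"
    "PATIENT_ID\tOS_STATUS\n"
    "TCGA-XX-0001\t1:DECEASED\n"
    "TCGA-XX-0002\t0:LIVING\n"
)
toy_labels = {"TCGA-XX-0001-01": "enriched"}
merged = merge_labels(read_clinical(toy_clinical), toy_labels)
print(merged)
```

An inner join drops patients without an expression label, which keeps the downstream survival tables free of missing group values.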

### 2) R survival analysis

```bash
Rscript CXCL13_expression.R
```

This writes the KM figures, risk tables, and sample-list CSVs to the **repo root** (see the layout above) and the **forest plots** to `plots/`.
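
For readers new to the method, the Kaplan–Meier curves are product-limit estimates of survival probability. The following is a small illustrative sketch of that estimator in plain Python, with made-up follow-up times. It is not how `CXCL13_expression.R` computes its curves (the R script relies on the `survival` package), and the function name is hypothetical.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # events + censored at t
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data only: four patients, the second one censored at time 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(toy_curve)
```

Censored patients never lower the curve directly; they only shrink the at-risk count used at later event times.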

## 📊 Outputs (where to look)

* **KM curves & risk tables (root):**
  `km_os_CXCL13_annot.png`, `km_dfs_CXCL13_annot.png`,
  `km_os_CXCL13_risktable.png`, `km_dfs_CXCL13_risktable.png`,
  `km_*_combined.png`
* **Forest plots (plots/):**
  `plots/forestplot_os_multivariable.png`,
  `plots/forestplot_pfs_multivariable.png`
* **Sample lists (root):**
  `CXCL13_highrisk_samples.csv`, `CXCL13_lowrisk_samples.csv`

> Note: `Rplots.pdf` is an R default graphics device artifact; it’s safe to ignore or delete.


## 📄 License & Data

* Code: MIT
* Data: Downloaded by user from [cBioPortal](https://www.cbioportal.org/) (not redistributed here).

## ✉️ Citation

> “CXCL13 as a Prognostic Biomarker of Survival Outcomes in Muscle-Invasive Bladder Cancer (TCGA-BLCA).”


