Ovarian Cancer Subtype Classification Challenge

Problem statement
You are given hematoxylin and eosin (H&E) histopathology image patches of ovarian carcinoma spanning five common subtypes:
- HGSC: High-Grade Serous Carcinoma
- CC: Clear-Cell Ovarian Carcinoma
- EC: Endometrioid Carcinoma
- LGSC: Low-Grade Serous Carcinoma
- MC: Mucinous Carcinoma

The task is to build a model that predicts the subtype label for each image in the test set. Robust solutions will typically combine careful data processing, augmentation, feature engineering, model training, and ensembling.

Files provided
- train.csv: CSV with two columns: id, label. Each row corresponds to one training image and its ground-truth label.
- train_images/: Folder containing all training images. The id column in train.csv exactly matches a filename inside this folder.
- test_data.csv: CSV with one column: id. Each row corresponds to one test image that requires a prediction.
- test_images/: Folder containing all test images. The id column in test_data.csv exactly matches a filename inside this folder.
- sample_submission.csv: Example of a valid submission file with randomly assigned labels. Use it as a template for the required format.
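
A minimal sketch of reading the training index, assuming (as the descriptions above state) that each id in train.csv is the exact filename inside train_images/. The function name load_training_index is hypothetical, and only the Python standard library is used.

```python
import csv
from pathlib import Path


def load_training_index(csv_path, image_dir):
    """Return a list of (image_path, label) pairs read from train.csv.

    Hypothetical helper: assumes the columns are named `id` and `label`
    and that `id` is the exact filename inside `image_dir`.
    """
    image_dir = Path(image_dir)
    pairs = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            path = image_dir / row["id"]
            # Fail early if the table and the folder disagree.
            if not path.is_file():
                raise FileNotFoundError(f"{path} is listed in {csv_path} but missing")
            pairs.append((path, row["label"]))
    return pairs
```

A call such as load_training_index("train.csv", "train_images") would then give the pairs to feed into whatever image loader you choose. The same pattern applies to test_data.csv and test_images/, minus the label column.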

Important notes
- To avoid label leakage, filenames are randomized and reveal no class information.
- Images vary in appearance and may exhibit domain and staining differences. Expect class imbalance.
- No external data is required, but you may use any standard ML/DL techniques.
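
Since class imbalance is expected, it may help to inspect the label distribution before training. The sketch below computes inverse-frequency class weights, one common heuristic (the same formula scikit-learn uses for class_weight="balanced"). Nothing in this challenge requires it, and the function name is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to n_samples / (n_classes * class_count).

    Rare classes get weights above 1, common classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Toy labels, not the real training distribution.
weights = inverse_frequency_weights(["HGSC"] * 6 + ["MC"] * 2)
```

Weights like these could be passed to a weighted loss or used to drive a sampler. Whether that improves macro F1 on this data is something to check with validation.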

Submission format
- A CSV named submission.csv with exactly two columns (in this order): id, label
- Every id in test_data.csv must appear exactly once, spelled exactly as in that file, and no other ids may appear.
- The label must be one of: {CC, EC, HGSC, LGSC, MC}.
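
The format rules above can be checked before writing the file. This is a sketch only: write_submission and the predictions dictionary (id mapped to predicted label) are hypothetical names, and the checks mirror the three rules listed here rather than any official validator.

```python
import csv

VALID_LABELS = {"CC", "EC", "HGSC", "LGSC", "MC"}


def write_submission(test_csv, predictions, out_path="submission.csv"):
    """Write id,label rows in test_data.csv order after validating them.

    `predictions` is assumed to be a dict mapping each test id to a label.
    """
    with open(test_csv, newline="") as f:
        test_ids = [row["id"] for row in csv.DictReader(f)]

    missing = [i for i in test_ids if i not in predictions]
    extra = set(predictions) - set(test_ids)
    if missing or extra:
        raise ValueError(f"{len(missing)} ids missing, {len(extra)} unexpected ids")

    bad = {predictions[i] for i in test_ids} - VALID_LABELS
    if bad:
        raise ValueError(f"invalid labels: {sorted(bad)}")

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])  # column order matters
        for i in test_ids:
            writer.writerow([i, predictions[i]])
```

Iterating over the ids from test_data.csv, rather than over the dictionary, means each id is written once and in the original order.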

Evaluation metric
- Macro-averaged F1 score over the classes present in the test set.
  • For each class c present in the test set, F1_c = 2 · (precision_c · recall_c) / (precision_c + recall_c), with 0 used for any undefined precision/recall.
  • The final score is the unweighted mean of F1_c across the classes present in the test set.
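- Because every class contributes equally to the mean regardless of its frequency, rare subtypes count as much as common ones, and a model cannot score well by predicting only the majority class.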
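
For local validation, the metric as described can be sketched in plain Python. Two assumptions are labeled here and in the code: "classes present in the test set" is taken to mean the classes that occur among the true labels, and F1_c is set to 0 when precision_c + recall_c is 0. The organizers' scorer may differ in such edge cases.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes present in y_true."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Assumption: "present in the test set" means present among true labels.
    classes = sorted(set(y_true))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Undefined precision/recall are replaced by 0, as stated above.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # Assumption: F1 is 0 when both precision and recall are 0.
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: HGSC and CC each score 2/3, MC scores 0; EC is predicted
# but absent from y_true, so it adds no term of its own.
score = macro_f1(["HGSC", "HGSC", "CC", "MC"], ["HGSC", "CC", "CC", "EC"])
```

In the toy example the result is (2/3 + 2/3 + 0) / 3 = 4/9, which shows how a single missed rare class pulls the score down sharply.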


By submitting, you agree that your predictions were generated without access to any hidden labels. The organizers will score your submission using the macro F1 calculation described above.