Title: Quran Verse Surah Classification from Audio

Problem statement
Given an audio clip of a single Quran verse, predict which Surah (chapter, 1–114) the verse belongs to. Each clip contains exactly one verse, and all clips are recited by the same reciter. This is a 114-class audio classification problem: variable verse durations and strong inter-class similarity call for careful audio processing, feature engineering, and robust modeling.

Why this is interesting
- Real-world audio: Natural variability in verse length, pauses, and prosody.
- High granularity: 114 classes with imbalanced sample counts (long Surahs contribute many more verses than short ones), which calls for robust methods and careful evaluation.
- Generalizable skills: Participants will practice audio preprocessing, augmentation, representation learning (e.g., log-mels, MFCCs), and multi-class classification.

Files provided
- train/
  - train.csv: Metadata with columns: id, audio_path, surah_id. Paths are relative to the dataset root. Audio files live under train/audio/ and have anonymized names.
  - audio/: MP3 audio files for training.
- test/
  - test.csv: Metadata with columns: id, audio_path. Paths are relative to the dataset root. Audio files live under test/audio/ and have anonymized names.
  - audio/: MP3 audio files for testing.
- sample_submission.csv: A template with columns: id, surah_id, filled with random valid labels.
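
The layout above can be read with a few lines of standard-library Python. This is a minimal sketch, not an official loader: the helper name load_metadata and the dataset_root argument are hypothetical, and it assumes only the column names and relative-path convention described in the file list.

```python
import csv
from pathlib import Path


def load_metadata(dataset_root, split):
    """Read <split>/<split>.csv and resolve audio paths against dataset_root.

    `split` is "train" or "test". Returns a list of dicts, one per clip.
    """
    root = Path(dataset_root)
    rows = []
    with open(root / split / f"{split}.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # audio_path is relative to the dataset root, not to the CSV file.
            row["audio_path"] = root / row["audio_path"]
            # Only train.csv carries the label column.
            if "surah_id" in row:
                row["surah_id"] = int(row["surah_id"])
            rows.append(row)
    return rows
```

For example, load_metadata("data", "train") would return one dict per training clip with id, audio_path and surah_id keys, where "data" stands in for wherever the dataset was unpacked.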

Target
- surah_id: Integer in [1, 114] indicating the Surah index of the verse.

Evaluation
- Metric: Macro-averaged F1 across all 114 Surah classes.
  - Each class receives equal weight regardless of how many samples it has, so the score rewards models that perform well on both frequent and rare Surahs.
  - The submission must contain predictions for all test ids, formatted as a CSV with headers: id,surah_id. The surah_id must be an integer in [1, 114].
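
For local validation, macro-averaged F1 can be approximated with the sketch below. It is not the official scorer: the function name macro_f1 is hypothetical, and giving a score of 0 to a class that has neither true nor predicted samples is an assumption about how such classes are treated.

```python
def macro_f1(y_true, y_pred, labels=range(1, 115)):
    """Unweighted mean of per-class F1 over `labels` (default: Surahs 1..114)."""
    labels = list(labels)
    tp = dict.fromkeys(labels, 0)  # true positives per class
    fp = dict.fromkeys(labels, 0)  # false positives per class
    fn = dict.fromkeys(labels, 0)  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Assumption: a class with no true and no predicted samples scores 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class contributes equally to the mean, a validation split that is missing some Surahs will pull this estimate down under the assumption above. Stratifying the split by surah_id is one way to reduce that effect.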

Submission format
- A CSV file with columns: id,surah_id
- id must exactly match the ids in test.csv; no extra or missing rows.
- surah_id must be an integer in [1, 114].
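
The rules above can be checked before uploading. The following is a hedged sketch using only the standard library. The function name validate_submission is hypothetical, and it does not check row order, since the description does not say whether order matters.

```python
import csv


def validate_submission(submission_path, test_csv_path):
    """Return a list of problems found; an empty list means no issue was detected."""
    with open(test_csv_path, newline="", encoding="utf-8") as f:
        expected_ids = [row["id"] for row in csv.DictReader(f)]
    with open(submission_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["id", "surah_id"]:
            return [f"unexpected header: {reader.fieldnames}"]
        rows = list(reader)

    problems = []
    submitted_ids = [row["id"] for row in rows]
    if len(set(submitted_ids)) != len(submitted_ids):
        problems.append("duplicate ids")
    missing = set(expected_ids) - set(submitted_ids)
    extra = set(submitted_ids) - set(expected_ids)
    if missing:
        problems.append(f"{len(missing)} missing ids")
    if extra:
        problems.append(f"{len(extra)} extra ids")
    for row in rows:
        value = row["surah_id"] or ""
        # Labels must be plain integers in [1, 114].
        if not (value.isascii() and value.isdigit() and 1 <= int(value) <= 114):
            problems.append(f"invalid surah_id {value!r} for id {row['id']}")
    return problems
```

Calling validate_submission("submission.csv", "test/test.csv") and confirming that it returns an empty list is a cheap sanity check before submitting.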

Data details
- Audio: High-quality MP3 with a consistent sample rate and bitrate across files.
- Duration: Verse durations vary widely.
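
Because durations vary widely, models that expect fixed-size inputs need each clip brought to a common length. One common option, shown here only as an illustrative sketch and not as a recommended pipeline, is to center-crop long clips and zero-pad short ones. The function name fix_length and the target_len parameter are hypothetical, and the waveform is assumed to be already decoded from MP3 into a 1-D NumPy array.

```python
import numpy as np


def fix_length(waveform, target_len):
    """Center-crop or zero-pad a 1-D waveform to exactly target_len samples."""
    waveform = np.asarray(waveform)
    n = waveform.shape[0]
    if n >= target_len:
        # Keep the middle target_len samples of a long clip.
        start = (n - target_len) // 2
        return waveform[start:start + target_len]
    # Split the zero padding between both ends of a short clip.
    pad_left = (target_len - n) // 2
    pad_right = target_len - n - pad_left
    return np.pad(waveform, (pad_left, pad_right), mode="constant")
```

Cropping discards part of a long verse, so alternatives such as sliding windows with pooled predictions may suit the longest clips better. Which trade-off works best here is an open question.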


Final data files for the competition
- train/train.csv
- train/audio/* (training audio files)
- test/test.csv
- test/audio/* (test audio files)
- sample_submission.csv

Rules and constraints
- The file names are anonymized to prevent label leakage. Do not rely on original file names.
