Competition: Bird Species Audio Identification

Problem statement
Participants must build a model that identifies the bird species from short audio recordings. You are provided with a training set of labeled audio clips and a test set of unlabeled audio clips. Your task is to predict the species label for each test clip.

Deliverables
- Submit a CSV file with two columns, id and label, containing one predicted species label for each test audio id.
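The sketch below shows one way to write the file so that it matches the format stated under Evaluation. It assumes predictions are held in a dict from test id to label, and that the test ids and valid labels have already been read from test.csv and labels.csv. The function name `write_submission` and the default `submission.csv` output path are hypothetical.

```python
import csv
from pathlib import Path


def write_submission(predictions, test_ids, valid_labels, out_path="submission.csv"):
    """Write an id,label CSV after checking it against the stated rules.

    predictions: dict mapping test id -> predicted label (assumed input shape).
    test_ids: ids in the order they appear in test.csv.
    valid_labels: set of labels read from labels.csv.
    """
    # Exactly one row per test id: nothing missing, nothing extra.
    missing = [i for i in test_ids if i not in predictions]
    if missing:
        raise ValueError(f"{len(missing)} test ids have no prediction, e.g. {missing[0]!r}")
    extra = set(predictions) - set(test_ids)
    if extra:
        raise ValueError(f"{len(extra)} predicted ids are not in test.csv")

    # Every label must come from labels.csv.
    bad = {lab for lab in predictions.values() if lab not in valid_labels}
    if bad:
        raise ValueError(f"labels not in labels.csv: {sorted(bad)[:5]}")

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])  # required header
        for i in test_ids:
            writer.writerow([i, predictions[i]])
    return Path(out_path)
```

Raising before anything is written means a malformed prediction set never produces a partial file.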

Data description
- train_audio/: folder of training audio clips in MP3 format. Filenames are anonymized as <id>.mp3 to prevent label leakage.
- test_audio/: folder of test audio clips in MP3 format. Filenames are anonymized as <id>.mp3 to prevent label leakage.
- train.csv: mapping from audio id to species label. Columns: id,label.
- test.csv: list of test audio ids. Column: id.
- labels.csv: list of all valid labels (species names). Column: label.
- sample_submission.csv: a sample submission with random valid labels. Columns: id,label.
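A minimal loading sketch for this layout follows. It assumes the files sit under a single root directory and that the id column holds the bare id, so the audio file is `<id>.mp3` in the matching folder. Labels are taken only from train.csv and checked against labels.csv. The function names are hypothetical.

```python
import csv
from pathlib import Path


def read_column(csv_path, column):
    """Return the values of one column from a CSV file with a header row."""
    with open(csv_path, newline="") as f:
        return [row[column] for row in csv.DictReader(f)]


def load_metadata(root="."):
    """Load train/test/labels CSVs and resolve each id to its audio file.

    Assumes train_audio/, test_audio/ and the CSV files live directly under root.
    """
    root = Path(root)
    with open(root / "train.csv", newline="") as f:
        train = [(row["id"], row["label"]) for row in csv.DictReader(f)]
    test_ids = read_column(root / "test.csv", "id")
    valid_labels = set(read_column(root / "labels.csv", "label"))

    # Labels come from train.csv only, never from file names.
    unknown = {lab for _, lab in train if lab not in valid_labels}
    if unknown:
        raise ValueError(f"train.csv has labels missing from labels.csv: {sorted(unknown)[:5]}")

    # Assumption: the id column holds the bare id, so the file is <id>.mp3.
    train_paths = {i: root / "train_audio" / f"{i}.mp3" for i, _ in train}
    test_paths = {i: root / "test_audio" / f"{i}.mp3" for i in test_ids}
    missing = [p for p in [*train_paths.values(), *test_paths.values()] if not p.exists()]
    if missing:
        raise FileNotFoundError(f"{len(missing)} audio files not found, e.g. {missing[0]}")

    return train, test_ids, valid_labels, train_paths, test_paths
```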

Important notes
- Audio durations and recording conditions vary; robust preprocessing is required (e.g., resampling, trimming/splitting, noise reduction).
- The species distribution is imbalanced, and because the metric is macro-averaged, every species counts equally. Consider stratified train/validation splits, class-balanced sampling, and data augmentation.
- Do not infer labels from file names; take labels only from the provided CSV files.
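Two of the suggestions above, stratified splitting and class-balanced sampling, can be sketched with the standard library alone. This assumes the training data is a list of (id, label) pairs; the function names, the 20% validation fraction, and the rule that single-clip species stay in training are illustrative choices, not part of the competition.

```python
import random
from collections import Counter, defaultdict


def stratified_split(items, val_fraction=0.2, seed=0):
    """Split (id, label) pairs so each species keeps roughly the same share in both parts.

    At least one clip per species stays in the training part.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item_id, label in items:
        by_label[label].append(item_id)

    train, val = [], []
    for label, ids in sorted(by_label.items()):
        ids = sorted(ids)
        rng.shuffle(ids)
        # Per-species validation count, capped so training keeps at least one clip.
        n_val = min(round(len(ids) * val_fraction), len(ids) - 1)
        val += [(i, label) for i in ids[:n_val]]
        train += [(i, label) for i in ids[n_val:]]
    return train, val


def class_balanced_weights(items):
    """Per-sample weights equal to 1 / (number of clips of that sample's species).

    Each species' weights sum to 1, so weighted sampling picks every species
    with equal probability regardless of how many clips it has.
    """
    counts = Counter(label for _, label in items)
    return [1.0 / counts[label] for _, label in items]
```

The weights can be passed to `random.choices(items, weights=..., k=...)`, or to a weighted sampler in a training framework (for example PyTorch's `WeightedRandomSampler`), to draw class-balanced batches.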

Evaluation
- Metric: Macro-averaged F1 score over the species present in the ground truth. This treats each species equally, regardless of frequency, encouraging strong performance on rare classes.
- Submission format: a CSV with header id,label and exactly one row per test id. Labels must be drawn from labels.csv.
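For local validation, the metric as described can be sketched as follows. Per-class F1 is written as 2·TP / (2·TP + FP + FN), which equals the harmonic mean of precision and recall, and only species present in the ground truth are averaged. A wrong prediction of a species absent from the ground truth still lowers the score through the true class's recall. This is an illustrative reimplementation, not the official scorer; it should agree with scikit-learn's `f1_score(y_true, y_pred, labels=sorted(set(y_true)), average="macro")`.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the species that appear in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted species gets a false positive
            fn[t] += 1  # true species gets a false negative
    classes = set(y_true)
    # Each ground-truth class has TP + FN >= 1, so the denominator is never 0.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(scores) / len(classes)


# Per-class F1: a = 0.5, b = 2/3, c = 0, so the macro average is 7/18.
print(macro_f1(["a", "a", "b", "c"], ["a", "b", "b", "a"]))
```

The example shows why rare classes matter: missing the single "c" clip costs a full third of the attainable score.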


Final files included in this competition
- train_audio/
- test_audio/
- train.csv
- test.csv
- labels.csv
- sample_submission.csv

