Competition: Urban Sound Event Classification (UrbanSound8K)

Problem statement
Build a model that classifies short urban audio clips (<= 4 seconds) into one of 10 sound event classes:
- air_conditioner
- car_horn
- children_playing
- dog_bark
- drilling
- engine_idling
- gun_shot
- jackhammer
- siren
- street_music

Your goal is to maximize macro-averaged F1 on the hidden test set.

Data description
The competition files you will use are:
- train.csv: CSV with columns [id, label]. Each row corresponds to one training audio clip.
- test.csv: CSV with column [id]. Each row corresponds to one test audio clip whose label you must predict.
- train_audio/: Directory containing all training audio clips referenced by train.csv ids.
- test_audio/: Directory containing all test audio clips referenced by test.csv ids.
- sample_submission.csv: A correctly formatted example submission.

Important details
- Audio files are WAV with original sampling rates/channels preserved; these may vary across clips. Durations are up to ~4 seconds.
- File names in train_audio/ and test_audio/ are anonymized and do not contain label information. Use the ids in the CSV files to locate the corresponding audio files.
- The dataset is class-imbalanced; some classes have fewer clips than others. Because the metric weights every class equally, performance on minority classes matters as much as on majority classes.
- The dataset retains its full scale (thousands of clips), so you are expected to build robust data pipelines; runtime is not restricted.
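Since sampling rates and channel counts vary across clips, a first preprocessing step is to normalize every clip to mono at a fixed rate and a fixed length. Below is a minimal sketch using only Python's standard `wave` module and NumPy. The 16 kHz target rate, the 16-bit PCM assumption, and the nearest-neighbor resampler are illustrative choices, not part of the competition spec; a real pipeline would use a proper audio library and band-limited resampling.

```python
import wave
import numpy as np

TARGET_SR = 16000      # assumed working sample rate (illustrative choice)
CLIP_SECONDS = 4.0     # clips are at most ~4 s per the data description

def load_wav(path):
    """Read a WAV file, downmix to mono, return (float32 samples in [-1, 1], sample rate)."""
    with wave.open(path, "rb") as wf:
        sr = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # Assumes 16-bit PCM; check wf.getsampwidth() in a real pipeline.
    audio = np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0
    if n_channels > 1:
        audio = audio.reshape(-1, n_channels).mean(axis=1)
    return audio, sr

def resample_nearest(audio, sr, target_sr=TARGET_SR):
    """Crude nearest-neighbor resampling; swap in a band-limited resampler for real use."""
    if sr == target_sr:
        return audio
    idx = np.round(np.arange(0, len(audio) * target_sr / sr) * sr / target_sr).astype(int)
    idx = np.clip(idx, 0, len(audio) - 1)
    return audio[idx]

def pad_or_trim(audio, target_len=int(TARGET_SR * CLIP_SECONDS)):
    """Zero-pad short clips and truncate long ones so every clip has the same length."""
    if len(audio) >= target_len:
        return audio[:target_len]
    return np.pad(audio, (0, target_len - len(audio)))
```

The fixed-length output makes it easy to batch clips into arrays for feature extraction (e.g. log-mel spectrograms) downstream.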

Submission format
Submit a CSV named submission.csv with exactly these columns and constraints:
- id: Must match exactly one id in test.csv.
- label: One of the 10 class names listed above, spelled exactly.
Every test id must appear exactly once. No extra or missing ids. The row order does not matter.
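A malformed submission (missing ids, duplicates, or misspelled labels) scores poorly or is rejected outright, so it is worth validating locally before submitting. The sketch below writes and checks a submission using only the standard library; the row layout follows the format above.

```python
import csv

CLASSES = {
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
}

def write_submission(rows, path="submission.csv"):
    """Write rows of {"id": ..., "label": ...} dicts in the required column order."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "label"])
        writer.writeheader()
        writer.writerows(rows)

def validate_submission(rows, test_ids):
    """Raise AssertionError unless every test id appears exactly once with a valid label."""
    sub_ids = [r["id"] for r in rows]
    assert len(sub_ids) == len(set(sub_ids)), "duplicate ids in submission"
    assert set(sub_ids) == set(test_ids), "ids do not match test.csv exactly"
    bad = sorted({r["label"] for r in rows} - CLASSES)
    assert not bad, f"unknown labels: {bad[:5]}"
```

Calling `validate_submission(rows, test_ids)` before `write_submission(rows)` catches the most common formatting mistakes; row order is irrelevant, so no sorting is needed.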

Evaluation metric
Submissions are evaluated using macro-averaged F1 across the 10 classes. For each class c, precision and recall are computed from the confusion matrix of your predictions versus the hidden ground truth; class-wise F1 is 2 * precision * recall / (precision + recall), defined as 0 when precision + recall is 0. The final score is the unweighted mean of the 10 class-wise F1 scores. This treats all classes equally and is robust to class imbalance.
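The metric definition above can be reproduced in a few lines for local validation. This is a plain-Python sketch of macro-averaged F1 (equivalent to scikit-learn's `f1_score` with `average="macro"` and an explicit class list); per-class F1 is set to 0 when precision + recall is 0, as specified.

```python
from collections import defaultdict

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores over the given class list."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the truth was t
            fn[t] += 1  # missed the true class t
    f1s = []
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(classes)
```

Note that a class with no true examples and no predictions contributes 0 to the mean, so an all-majority-class baseline scores far below its raw accuracy under this metric.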

Final competition files
- train.csv
- test.csv
- sample_submission.csv
- train_audio/ (directory)
- test_audio/ (directory)

Build strong audio ML pipelines and push the limits of urban sound recognition. Good luck!