Competition: Dysarthric and Control Speech Keyword Recognition (Noise-Reduced UASPEECH)

Problem statement
Participants are tasked with building a robust keyword recognition system that predicts the spoken keyword for each audio clip from noise-reduced recordings of both dysarthric and control speakers. Each .wav file contains a single utterance. Your goal is to predict the lexical token (e.g., C10, CW37, D0, LA) for every test audio file.

Why this is challenging and meaningful
- Data spans multiple speakers with diverse articulation patterns (dysarthric and control), microphone channels, and recording blocks/sessions, with a large vocabulary (digits, more than 100 common words, and additional tokens such as D0 and LA). Models must generalize across speakers and sessions while remaining robust to residual acoustic artifacts.

Files provided
- train.csv: training metadata with two columns: id,label. The id is the audio filename (no path), and label is the target class token (e.g., C10, CW37, D0, LA).
- test.csv: test metadata with one column: id. The id is the audio filename (no path).
- sample_submission.csv: a sample file showing the expected submission format.
- train_audio/: folder containing the training .wav files referenced by train.csv.
- test_audio/: folder containing the test .wav files referenced by test.csv.

Target definition
- label is the lexical token embedded in the original filenames: the token between the session/block marker and the microphone marker (e.g., in F03_B1_CW37_M5.wav the label is CW37). The data is curated so that every label in the test set also appears in training, so supervised learning faces no unseen classes at test time.
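For intuition, the token extraction described above can be sketched as below. This is illustrative only: the provided test files have been renamed to neutral ids, so this parser applies to original-style UASPEECH filenames, and the assumed SPEAKER_BLOCK_TOKEN_MIC pattern comes from the example given here.

```python
def label_from_original_name(filename: str) -> str:
    """Extract the lexical token from an original-style filename,
    e.g. 'F03_B1_CW37_M5.wav' -> 'CW37'.

    Assumes the SPEAKER_BLOCK_TOKEN_MIC.wav pattern shown in the
    example; not applicable to the renamed neutral test ids.
    """
    stem = filename.rsplit(".", 1)[0]          # drop the extension
    parts = stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"Unexpected filename format: {filename}")
    return parts[2]  # the token sits between the block and mic markers
```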

Evaluation metric
- Submissions must be a CSV with columns id,label containing exactly the same set of ids as test.csv (order is irrelevant). The competition metric is macro-averaged F1 over the set of true classes present in the test set. This rewards performance equally across common and rare keywords and is robust to class imbalance.
- Ties are broken by micro-averaged F1.
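The metric described above can be reproduced locally for validation. Below is a pure-Python sketch of macro F1 over the classes present in the true labels; it mirrors the stated definition but is not the official scorer.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the classes present in y_true.
    Pure-Python sketch of the stated metric (not the official scorer)."""
    classes = sorted(set(y_true))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)  # unweighted mean across classes
```

Because the average is unweighted, a rare keyword misclassified everywhere costs as much as a common one, which is why per-class balance matters in validation splits.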

Submission format
- CSV with header: id,label
- id: the filename of the test audio (e.g., audio_012345.wav)
- label: a predicted class token from the training label set (e.g., C10, CW37, D0, LA)
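A minimal writer for the submission format above, using only the standard library (the function name and parallel-list input are illustrative choices, not part of any provided tooling):

```python
import csv

def write_submission(ids, labels, path="submission.csv"):
    """Write predictions as id,label rows with the required header.
    Assumes `ids` and `labels` are parallel sequences of equal length."""
    if len(ids) != len(labels):
        raise ValueError("ids and labels must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])  # required header
        for file_id, label in zip(ids, labels):
            writer.writerow([file_id, label])
```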

Data notes
- Files have been renamed to neutral ids to prevent label leakage; only the CSVs map ids to labels for training.
- Audio is in .wav format. You are free to downsample or preprocess as desired. Non-speech segments or channel inconsistencies may exist; robust preprocessing is recommended.
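As one example of the robust preprocessing suggested above, leading and trailing non-speech can be trimmed with a crude frame-energy gate. This is a sketch only: the frame length and threshold are arbitrary assumptions, and real pipelines would typically use a proper VAD instead.

```python
def trim_silence(samples, frame_len=400, threshold=1e-3):
    """Trim leading/trailing low-energy frames from a list of float
    samples in [-1, 1]. Crude illustrative gate; frame_len and
    threshold are arbitrary assumptions, not tuned values."""
    n = len(samples)
    frames = [samples[i:i + frame_len] for i in range(0, n, frame_len)]
    energies = [sum(s * s for s in f) / len(f) for f in frames]
    voiced = [i for i, e in enumerate(energies) if e > threshold]
    if not voiced:
        return samples  # nothing above threshold; keep the clip intact
    start = voiced[0] * frame_len
    end = min(n, (voiced[-1] + 1) * frame_len)
    return samples[start:end]
```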

Final data files
- train.csv, test.csv, sample_submission.csv
- train_audio/, test_audio/

Scoring details
- Primary metric: macro F1 over true classes in the test set.
- If two submissions have identical macro F1, the one with higher micro F1 ranks higher.

Good luck! Build a model that can robustly recognize keywords across challenging dysarthric and control speech.