Amateur Radio Callsign Identification from 2-Meter FM Simplex Audio

Problem statement
Participants are tasked with building a system that identifies the transmitting station’s callsign from VHF FM audio clips recorded on the 2-meter national calling frequency (146.52 MHz) in Southern California. Each clip contains a single transmission event captured by an SDR receiver. The goal is multi-class audio classification: given an audio clip id, predict the most likely callsign label.

Why it’s challenging
- Real-world audio: varying SNR, intermittent speech, squelch tails, RFI, and distinct non-voice events (e.g., SWEEP, PIRATE). 
- Open-set feel with long-tail distribution: many callsigns with few examples each.
- Temporal and spectral nuance: FM voice characteristics, background noise, repeater artifacts, and operator microphones all interact.

Data description
- train.csv: id,callsign. Training labels for audio clips found in train_audio/.
- test.csv: id. Evaluation ids for audio clips found in test_audio/.
- train_audio/: WAV files for training (one per id, mono PCM).
- test_audio/: WAV files for testing (one per id, mono PCM).
- sample_submission.csv: Example of the required submission format with random valid labels.
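The layout above can be read with nothing beyond the Python standard library. A minimal sketch, assuming the file and directory names exactly as listed (the helper names `load_labels` and `read_clip` are illustrative, not part of the dataset):

```python
import csv
import wave
from pathlib import Path

def load_labels(csv_path):
    """Return {id: callsign} from a file with header id,callsign."""
    with open(csv_path, newline="") as f:
        return {row["id"]: row["callsign"] for row in csv.DictReader(f)}

def read_clip(audio_dir, clip_id):
    """Read one mono PCM WAV clip; return (sample_rate, raw_frames)."""
    with wave.open(str(Path(audio_dir) / f"{clip_id}.wav"), "rb") as w:
        return w.getframerate(), w.readframes(w.getnframes())
```

Feature extraction (spectrograms, MFCCs, etc.) is left to the participant; this sketch only covers the I/O contract implied by the file listing.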

Important notes
- The original recordings had filenames encoding timestamps and frequency; all audio files have therefore been renamed to opaque ids (clip_######.wav) to prevent label leakage. Do not assume any chronological or speaker identity information from the ids.
- Callsign labels include licensed operators (e.g., KN6OS, KE6TLT) and non-licensed classifications such as RFI, SWEEP, PIRATE, and WHISTLER. Treat all unique strings in callsign as class labels.
- Some classes may be very rare; approaches are expected to handle this class imbalance robustly.
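One common way to address the long-tail distribution is inverse-frequency class weighting during training. A minimal sketch (the weighting scheme is a standard heuristic, not something the task mandates):

```python
from collections import Counter

def inverse_freq_weights(labels):
    """Map each class to n_samples / (n_classes * count).

    Rare classes receive weights above 1, common classes below 1,
    so a weighted loss does not ignore tail callsigns.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * v) for c, v in counts.items()}
```

These weights can be passed to most loss functions (e.g. a weighted cross-entropy) or used to bias a sampler.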

Evaluation
- Primary metric: Macro-averaged F1 score across all callsign classes that appear in the test set. This treats each class equally, rewarding models that perform well on both head and tail classes.
- Submission format: CSV with header id,callsign. One row per id in test.csv, predicting exactly one label per id.

Files to use
- Training: train.csv and audio files in train_audio/.
- Evaluation: test.csv and audio files in test_audio/; submit predictions in id,callsign format.

Rules and clarifications
- External data and pretraining are allowed if they are publicly available and equally accessible to all participants. Make sure to cite sources and respect licenses (dataset is CC BY-NC 4.0).
- Do not attempt to infer labels from metadata outside the audio signal; ids are randomized and contain no semantic information.
- Multi-label predictions are not permitted; each clip corresponds to a single callsign label.

Reproducibility
- The provided prepare.py script performs a deterministic stratified split and ensures that every class that appears in the test set is present in the training set at least once. It also generates a sample_submission.csv following the required format.
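For intuition only, a deterministic stratified split with the same guarantee (every class retains at least one training example) can be sketched as follows. This is not the provided prepare.py, just an illustration of the idea:

```python
import random
from collections import defaultdict

def stratified_split(ids, labels, test_frac=0.2, seed=0):
    """Deterministic per-class split; every class keeps >= 1 train example."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, c in zip(ids, labels):
        by_class[c].append(i)
    train, test = [], []
    for c in sorted(by_class):               # sorted => deterministic order
        members = sorted(by_class[c])
        rng.shuffle(members)                 # seeded => reproducible shuffle
        n_test = min(int(len(members) * test_frac), len(members) - 1)
        test.extend(members[:n_test])
        train.extend(members[n_test:])       # at least one member stays here
    return train, test
```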


Metric details
Macro F1 is computed by: (1) calculating precision and recall per class over the test ids, (2) computing per-class F1 = 2 * precision * recall / (precision + recall), defined as 0 when precision + recall = 0, and (3) averaging the per-class F1 scores equally across all classes that appear in the test set.
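The three steps above can be written out in plain Python for transparency (equivalent to sklearn's `f1_score(..., average="macro")` when averaged over the classes present in `y_true`):

```python
from collections import defaultdict

def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the classes that appear in y_true."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    f1s = []
    for c in set(y_true):  # only classes present in the test set
        prec = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if (prec + rec) else 0.0)
    return sum(f1s) / len(f1s)
```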

Submission validity checks
- Header must be id,callsign.
- The set of ids must match test.csv exactly (no missing or extra ids).
- All ids must be unique and free of path characters.
- callsign values must be non-empty strings.
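The checks above can be run locally before submitting. A sketch using only the standard library, assuming the file formats stated in this document (the helper name `check_submission` is illustrative):

```python
import csv

def check_submission(sub_path, test_path):
    """Raise AssertionError if the submission violates any validity check."""
    with open(test_path, newline="") as f:
        expected = [row["id"] for row in csv.DictReader(f)]
    with open(sub_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        assert header == ["id", "callsign"], "header must be id,callsign"
        rows = list(reader)
    ids = [r[0] for r in rows]
    assert len(ids) == len(set(ids)), "duplicate ids"
    assert set(ids) == set(expected), "id set must match test.csv exactly"
    assert all("/" not in i and "\\" not in i for i in ids), "path chars in id"
    assert all(len(r) == 2 and r[1].strip() for r in rows), "empty callsign"
    return True
```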

Good luck and have fun building robust audio identification models that can handle realistic, noisy, and imbalanced ham radio transmissions!
