Competition: Music Emotion Prediction (Valence & Arousal)

Problem statement
Participants are challenged to build models that predict the emotional content of music tracks as continuous valence and arousal scores on a 1–9 scale. For each track you are given an audio clip and pre-computed acoustic features, and your task is to predict its mean valence and mean arousal. This is a multi-target regression problem grounded in affective computing for music.

What you are given
- train.csv: Metadata for the training set with the following columns:
  - song_id: unique identifier for a track (numeric)
  - audio_file: filename of the corresponding MP3 audio in the train_audio folder
  - feature_file: filename of the corresponding feature CSV in the train_features folder (openSMILE-style aggregated features)
  - valence_mean: ground-truth mean valence in [1, 9]
  - arousal_mean: ground-truth mean arousal in [1, 9]
- test.csv: Metadata for the test set with the following columns:
  - song_id
  - audio_file: filename of the corresponding MP3 audio in the test_audio folder
  - feature_file: filename of the corresponding feature CSV in the test_features folder
- train_audio/: MP3 audio files for training
- test_audio/: MP3 audio files for testing
- train_features/: Per-track feature CSV files for training
- test_features/: Per-track feature CSV files for testing
- sample_submission.csv: A valid example submission file
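
As a starting point, the metadata files can be read and each row linked to its audio and feature file. The snippet below is a minimal sketch using only the Python standard library; the function name load_metadata and the added keys audio_path and feature_path are hypothetical, and the directory names are taken from the list above. It does not parse the feature CSVs themselves, since their internal layout is not described here.

```python
import csv
from pathlib import Path


def load_metadata(csv_path, audio_dir, feature_dir):
    """Read a metadata CSV and attach full file paths to every track.

    Returns a list of dicts, one per row, with the original columns
    (as strings) plus two illustrative keys: audio_path and feature_path.
    """
    records = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # audio_file and feature_file are bare filenames inside their folders.
            row["audio_path"] = Path(audio_dir) / row["audio_file"]
            row["feature_path"] = Path(feature_dir) / row["feature_file"]
            records.append(row)
    return records
```

For the training set this would be called as load_metadata("train.csv", "train_audio", "train_features"), and for the test set with the corresponding test_ names.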

Target
For each row in test.csv, predict two continuous targets:
- valence_mean
- arousal_mean
Both are continuous values in [1, 9].

Submission format
Submit a CSV file with exactly these columns, keeping the header names unchanged:
- song_id: integer identifiers matching those in test.csv
- valence_mean: your predicted mean valence in [1, 9]
- arousal_mean: your predicted mean arousal in [1, 9]
Do not reorder or omit any test rows; every song_id in test.csv must appear exactly once.
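
One way to satisfy these constraints is to drive the output from test.csv itself, so that row order and coverage follow the test set by construction. The sketch below assumes predictions are held in a dict keyed by integer song_id and that song_id values in test.csv parse as integers; write_submission is a hypothetical helper name.

```python
import csv


def write_submission(test_csv, predictions, out_path):
    """Write a submission in the row order of test.csv.

    predictions: dict mapping song_id (int) -> (valence, arousal).
    """
    with open(test_csv, newline="") as f:
        song_ids = [int(row["song_id"]) for row in csv.DictReader(f)]

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["song_id", "valence_mean", "arousal_mean"])
        for song_id in song_ids:
            # Raises KeyError if a test track has no prediction,
            # so an incomplete submission is never written silently.
            valence, arousal = predictions[song_id]
            writer.writerow([song_id, valence, arousal])
```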

Evaluation
The leaderboard score is the average Concordance Correlation Coefficient (CCC) between your predictions and the ground truth for the two targets (valence_mean and arousal_mean):
- Compute CCC(valence)
- Compute CCC(arousal)
- Final score = (CCC(valence) + CCC(arousal)) / 2
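CCC rewards both precision (how strongly predictions correlate with the ground truth) and accuracy (how closely their mean and scale match it), so a systematic offset or a compressed prediction range lowers the score even when the correlation is high. Scores range from -1 to 1, with 1 indicating perfect agreement. During evaluation, predictions are clipped to [1, 9] and NaNs/Infs are handled conservatively.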
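
For local validation, the metric can be approximated with the standard definition of Lin's CCC, 2·cov(y, ŷ) / (var(y) + var(ŷ) + (mean(y) − mean(ŷ))²). The sketch below is not the official scorer: it assumes population (divide-by-n) variance and covariance, returns 0.0 in the degenerate case of a zero denominator, and does not attempt to reproduce the NaN/Inf handling, whose details are not specified here. The function names are hypothetical.

```python
def clip(values, lo=1.0, hi=9.0):
    """Clip predictions to the valid label range."""
    return [min(max(v, lo), hi) for v in values]


def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient with population statistics."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    # Assumption: treat the undefined 0/0 case as no agreement.
    return 2.0 * cov / denom if denom > 0 else 0.0


def mean_ccc(valence_true, valence_pred, arousal_true, arousal_pred):
    """Average of the two per-target CCCs, after clipping predictions."""
    ccc_valence = ccc(valence_true, clip(valence_pred))
    ccc_arousal = ccc(arousal_true, clip(arousal_pred))
    return (ccc_valence + ccc_arousal) / 2.0
```

Note how a constant offset is penalized: predictions [2, 3, 4] against targets [1, 2, 3] are perfectly correlated, yet their CCC is 4/7 because of the mean difference term in the denominator.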

Data split and integrity
- The split preserves the joint distribution of valence and arousal via stratification in a 2D-binned space.
- File names have been anonymized to avoid label leakage. Do not attempt to infer labels from filenames.
- All audio and feature files are included and aligned with the CSVs; there is a one-to-one correspondence between audio_file and feature_file for each song_id.
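
A local validation split can mirror the same idea by stratifying on a 2D grid over valence and arousal. The sketch below is illustrative only: the bin width, validation fraction, and seed are assumptions, not the values used to build the official split, and stratified_split_2d is a hypothetical name.

```python
import random
from collections import defaultdict


def stratified_split_2d(rows, valid_fraction=0.2, bin_width=2.0, seed=0):
    """Split tracks so each (valence, arousal) grid cell is sampled proportionally.

    rows: iterable of (song_id, valence, arousal) with labels in [1, 9].
    Returns (train_ids, valid_ids).
    """
    cells = defaultdict(list)
    for song_id, valence, arousal in rows:
        # Map each label to an integer bin index along its axis.
        key = (int((valence - 1.0) // bin_width), int((arousal - 1.0) // bin_width))
        cells[key].append(song_id)

    rng = random.Random(seed)
    train_ids, valid_ids = [], []
    for key in sorted(cells):  # sorted for a reproducible traversal order
        ids = cells[key]
        rng.shuffle(ids)
        n_valid = round(len(ids) * valid_fraction)
        valid_ids.extend(ids[:n_valid])
        train_ids.extend(ids[n_valid:])
    return train_ids, valid_ids
```

Very small cells may contribute no validation tracks under this rounding rule; coarser bins reduce that effect at the cost of looser stratification.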

Files you will find after running prepare.py (already done for this competition):
- train.csv, test.csv
- train_audio/, test_audio/
- train_features/, test_features/
- sample_submission.csv

Rules/pitfalls
- Values must be numeric and finite. Out-of-range values are clipped to [1, 9] for scoring but may harm your CCC.
- Check your submission for missing, extra, or duplicated song_id values; a submission with any mismatch against test.csv will be rejected.
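
These pitfalls can be caught before uploading with a small self-check. The following is a sketch, not the official validator: check_submission is a hypothetical helper, it compares song_id values as the raw strings found in each file, and it reports out-of-range values as warnings since they are clipped rather than rejected.

```python
import csv
import math

EXPECTED_HEADER = ["song_id", "valence_mean", "arousal_mean"]


def check_submission(submission_csv, test_csv):
    """Return a list of problems found; an empty list means none were detected."""
    with open(test_csv, newline="") as f:
        expected = [row["song_id"] for row in csv.DictReader(f)]
    with open(submission_csv, newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames
        rows = list(reader)

    if header != EXPECTED_HEADER:
        return [f"unexpected header: {header}"]

    problems = []
    got = [row["song_id"] for row in rows]
    if len(got) != len(set(got)):
        problems.append("duplicate song_id values")
    missing = set(expected) - set(got)
    extra = set(got) - set(expected)
    if missing:
        problems.append(f"missing song_id values: {sorted(missing)}")
    if extra:
        problems.append(f"extra song_id values: {sorted(extra)}")

    for row in rows:
        for col in ("valence_mean", "arousal_mean"):
            try:
                value = float(row[col])
            except (TypeError, ValueError):
                problems.append(f"song_id {row['song_id']}: {col} is not numeric")
                continue
            if not math.isfinite(value):
                problems.append(f"song_id {row['song_id']}: {col} is not finite")
            elif not 1.0 <= value <= 9.0:
                problems.append(f"warning: song_id {row['song_id']}: {col} outside [1, 9]")
    return problems
```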


Good luck and have fun pushing the state of music emotion recognition!
