Competition: LAV-DF Multi-Modal Deepfake Manipulation Detection

Problem statement
You are given short videos where audio and/or visuals may have been locally manipulated. Your task is to build a model that, for each video, predicts two probabilities:
- p_video_fake: probability that the visual stream contains any localized manipulation
- p_audio_fake: probability that the audio stream contains any localized manipulation

Unlike standard deepfake detection, manipulations here may be brief and localized. Strong solutions typically exploit both modalities and temporal structure, looking for cues such as frame-level facial artifacts, lip-sync inconsistencies, cross-modal misalignment, spectral anomalies, and transient artifacts.
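As one concrete example of a spectral feature a detector might inspect, the sketch below computes a log-magnitude STFT with NumPy. The waveform, sample rate, and frame parameters are all illustrative assumptions, not part of the competition data; real pipelines would typically use a dedicated audio library.

```python
import numpy as np

def log_spectrogram(waveform, frame_len=512, hop=256, eps=1e-8):
    """Log-magnitude STFT with a Hann window — a minimal sketch of the kind
    of spectral feature that can expose transient audio artifacts."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(waveform) - frame_len) // hop
    frames = np.stack([
        waveform[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)
    return np.log(mag + eps)

# Synthetic 1-second, 440 Hz tone at a hypothetical 16 kHz sample rate.
wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
spec = log_spectrogram(wave)
print(spec.shape)  # (61, 257) with the parameters above
```

A brief, localized manipulation would show up as an anomaly confined to a few rows (time frames) of this matrix, which is why temporal pooling that preserves peaks tends to outperform plain averaging here.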

Data description
- train_videos/: MP4 files for training.
- test_videos/: MP4 files for testing (no labels provided).
- train.csv: one row per training video with columns:
  - video_id: the file name within train_videos/ (e.g., vid_000123.mp4)
  - label_video_fake: 1 if visuals were manipulated, else 0
  - label_audio_fake: 1 if audio was manipulated, else 0
- sample_submission.csv: submission template with columns [video_id, p_video_fake, p_audio_fake]
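A minimal loader for train.csv, assuming exactly the columns listed above. The sample rows are hypothetical, for illustration only:

```python
import csv
import io

# Hypothetical rows mimicking train.csv's documented layout.
sample = """video_id,label_video_fake,label_audio_fake
vid_000123.mp4,1,0
vid_000124.mp4,0,0
vid_000125.mp4,1,1
"""

def load_labels(fh):
    """Map video_id -> (label_video_fake, label_audio_fake) as ints."""
    labels = {}
    for row in csv.DictReader(fh):
        labels[row["video_id"]] = (
            int(row["label_video_fake"]),
            int(row["label_audio_fake"]),
        )
    return labels

labels = load_labels(io.StringIO(sample))
print(labels["vid_000123.mp4"])  # (1, 0)
```

In practice you would pass `open("train.csv")` instead of the in-memory sample.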

Notes
- Filenames are randomized and do not reveal labels or grouping.
- Videos originate from groups in which one “original” clip may have multiple manipulated derivatives. Group integrity is enforced across train/test, preventing leakage.
- Classes are not balanced, but each label has both positive and negative examples in the train and test sets.

Submission format
Submit a CSV with the following columns:
- video_id: the file name in test_videos/ (e.g., vid_000987.mp4)
- p_video_fake: predicted probability in [0, 1]
- p_audio_fake: predicted probability in [0, 1]
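A sketch of a writer that produces this format, clipping probabilities to [0, 1] and rejecting NaN/Inf up front (since non-finite values fail validation). The example prediction tuple is hypothetical:

```python
import csv
import io
import math

def write_submission(fh, predictions):
    """Write (video_id, p_video_fake, p_audio_fake) rows, clipping each
    probability to [0, 1] and rejecting non-finite values."""
    writer = csv.writer(fh)
    writer.writerow(["video_id", "p_video_fake", "p_audio_fake"])
    for vid, p_v, p_a in predictions:
        for p in (p_v, p_a):
            if not math.isfinite(p):
                raise ValueError(f"non-finite prediction for {vid}")
        writer.writerow([vid, min(max(p_v, 0.0), 1.0),
                              min(max(p_a, 0.0), 1.0)])

buf = io.StringIO()
write_submission(buf, [("vid_000987.mp4", 0.83, 1.2)])  # 1.2 clips to 1.0
print(buf.getvalue())
```

Replace the `io.StringIO` buffer with `open("submission.csv", "w", newline="")` for a real submission.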

Evaluation
Mean Average Precision (mAP) across the two labels is used:
- For each label (video_fake and audio_fake), we compute Average Precision (AP) from the submitted probabilities and ground-truth binary labels.
- The final score is the mean of the two APs.
Implementation details:
- Predictions are clipped to [0, 1]. NaN/Inf are treated as invalid and will cause the submission to fail validation.
- The organizers guarantee that neither label is degenerate in the test set (each has at least one positive and one negative), so AP is always well defined and participants do not need to handle that case.
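For local validation, the metric can be sketched in a few lines of NumPy. This follows the standard ranking-based AP definition (sum of precision-at-k over positive ranks, divided by the number of positives); tie handling is ignored for simplicity and may differ slightly from the official scorer:

```python
import numpy as np

def average_precision(y_true, scores):
    """Ranking-based AP: mean of precision at the positive ranks."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(y_true)[order]
    precision_at_k = np.cumsum(y) / np.arange(1, len(y) + 1)
    # Recall rises by 1/n_pos at each positive rank, so
    # AP = sum_k P(k) * dR(k) = mean precision over positive ranks.
    return float(precision_at_k[y == 1].mean())

def mean_average_precision(v_labels, v_scores, a_labels, a_scores):
    """Final score: mean of the two per-label APs."""
    return 0.5 * (average_precision(v_labels, v_scores)
                  + average_precision(a_labels, a_scores))

# Perfect video ranking (AP = 1.0), imperfect audio ranking (AP = 0.5).
score = mean_average_precision([1, 0], [0.9, 0.1], [0, 1], [0.9, 0.1])
print(score)  # 0.75
```

Because AP is rank-based, only the ordering of your probabilities matters; monotone rescaling does not change the score.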


Files to train with
- train_videos/
- train.csv

Files to predict on
- test_videos/
- sample_submission.csv (for structure only)

Metric sanity checks
- Probabilities must be within [0, 1].
- video_id keys must exactly match the test_videos/ filenames.
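These checks are easy to automate before submitting. A minimal validator (function name and inputs are illustrative):

```python
import math

def validate_submission(rows, test_ids):
    """Enforce the sanity rules: finite probabilities in [0, 1] and an
    exact one-to-one match with the test_videos/ filenames."""
    seen = set()
    for vid, p_v, p_a in rows:
        for p in (p_v, p_a):
            if not math.isfinite(p) or not 0.0 <= p <= 1.0:
                raise ValueError(f"bad probability {p!r} for {vid}")
        seen.add(vid)
    if seen != set(test_ids):
        raise ValueError("video_id keys do not match test_videos/ filenames")

validate_submission([("vid_000987.mp4", 0.7, 0.2)], ["vid_000987.mp4"])
print("ok")
```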

Good luck! This challenge is designed to reward robust, multimodal modeling and careful handling of localized manipulations.