Title: Kinetics-Action: Video Action Recognition (Subset)

Problem statement
Build a model that recognizes human actions in short video clips. Each video belongs to exactly one class (single-label multi-class classification). You are provided with a curated subset of the Kinetics dataset, reorganized for this competition.

Data description
- train_videos/: Folder of training videos with anonymized filenames.
- test_videos/: Folder of test videos with anonymized filenames.
- train.csv: Training metadata with columns:
  - video_id: Unique ID for the video (matches the filename without extension).
  - filepath: Relative path to the video file under train_videos/.
  - label: Ground-truth action class name (string), one of the classes in this subset.
- test.csv: Test metadata with columns:
  - video_id: Unique ID for the video (matches the filename without extension).
  - filepath: Relative path to the video file under test_videos/.
- sample_submission.csv: Example submission file with columns:
  - video_id: As in test.csv.
  - label: Your predicted class for each video_id.
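A minimal sketch of loading the metadata with Python's standard csv module. It assumes train.csv and test.csv sit together in one directory. The helper names (load_metadata, _read_csv) and the data_dir argument are hypothetical. The sketch checks that the documented columns are present and builds a class-to-index mapping from the labels in train.csv, since those are the only valid predictions. It deliberately leaves filepath unresolved, because the description above does not say whether that path is relative to the data root or to the video folder.

```python
import csv
from pathlib import Path

# Column layouts as listed in the data description.
TRAIN_COLUMNS = ["video_id", "filepath", "label"]
TEST_COLUMNS = ["video_id", "filepath"]


def _read_csv(path, expected_columns):
    """Read a CSV into a list of dicts, failing loudly if a documented column is missing."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = set(expected_columns) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path}: missing columns {sorted(missing)}")
        return list(reader)


def load_metadata(data_dir):
    """Hypothetical helper: returns (train_rows, test_rows, label_to_idx)."""
    data_dir = Path(data_dir)
    train_rows = _read_csv(data_dir / "train.csv", TRAIN_COLUMNS)
    test_rows = _read_csv(data_dir / "test.csv", TEST_COLUMNS)
    # Map each class name seen in train.csv to a stable integer index (sorted order).
    classes = sorted({row["label"] for row in train_rows})
    label_to_idx = {label: i for i, label in enumerate(classes)}
    return train_rows, test_rows, label_to_idx
```

You can invert label_to_idx to turn a model's predicted indices back into class names for the submission file.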

Notes
- Filenames and paths are anonymized to avoid any label leakage; rely on the CSVs for labels and file locations.
- Videos are 5–15-second MP4 clips sourced from YouTube and cover a diverse set of human actions. The subset contains 400 classes with roughly 10–40 videos per class.

Evaluation
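- Submissions must be a CSV file with a header row and exactly two columns in this order: video_id,label. Include exactly one row for each video_id in test.csv.
- The score is macro-averaged F1 (the unweighted mean of per-class F1), computed on the hidden test labels. Every class carries the same weight regardless of how many test videos it has, so performance on rare classes counts as much as performance on frequent ones.
- Undefined cases are handled safely: when a class's precision or recall would require dividing by zero (for example, a class that is never predicted), that class's F1 is set to 0, and the overall score remains finite.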
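For local validation it can help to reproduce the metric. The following pure-Python sketch computes per-class F1 as 2·TP / (2·TP + FP + FN). This equals the harmonic mean of precision and recall whenever that mean is defined and yields 0 otherwise, which matches the zero-division rule above. Averaging over the union of true and predicted labels is an assumption that mirrors scikit-learn's default for average="macro". The exact implementation of the official scorer is not specified here, and the function name macro_f1 is illustrative.

```python
from collections import Counter


def macro_f1(y_true, y_pred, classes=None):
    """Unweighted mean of per-class F1, with F1 = 0 wherever it is undefined."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if classes is None:
        # Assumption: average over every label that occurs in either list.
        classes = sorted(set(y_true) | set(y_pred))

    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # the predicted class gets a false positive
            fn[true_label] += 1  # the true class gets a false negative

    per_class = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # 2TP / (2TP + FP + FN) equals 2PR / (P + R) when that is defined and
        # gives 0 otherwise; a zero denominator also counts as F1 = 0.
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    return sum(per_class) / len(per_class) if per_class else 0.0
```

For example, macro_f1(["a", "a", "b", "c"], ["a", "b", "b", "b"]) gives per-class F1 of 2/3, 1/2 and 0, so the score is 7/18 ≈ 0.389. The missed class "c" pulls the average down as much as any other class would.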

Data split
- Videos were split into train and test with a random 80/20 split stratified by class, so every class appears in both train and test.
- Filenames are anonymized by hashing, and any original class-revealing names are removed to avoid leakage.
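To hold out a local validation set that mimics this split, one option is to apply the same per-class 80/20 idea to train.csv. The sketch below groups rows by label and shuffles each group with a fixed seed. Whenever a class has two or more rows, it keeps at least one row on each side. The function name, the default seed, and the rounding rule are illustrative assumptions and do not reproduce the organizers' actual splitting code.

```python
import random
from collections import defaultdict


def stratified_split(rows, val_fraction=0.2, seed=0):
    """Split rows (dicts with a "label" key) per class into (train_part, val_part)."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row["label"]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_part, val_part = [], []
    for label in sorted(by_class):  # visit classes in a fixed order
        items = list(by_class[label])
        rng.shuffle(items)
        if len(items) >= 2:
            # Round the validation share, but keep at least one row on each side.
            n_val = min(max(round(len(items) * val_fraction), 1), len(items) - 1)
        else:
            n_val = 0  # a single-row class can only go to train
        val_part.extend(items[:n_val])
        train_part.extend(items[n_val:])
    return train_part, val_part
```

With only about 8–32 training videos per class, each class contributes just a handful of validation videos. A local macro-F1 estimate will therefore be noisy, and repeating the split with a few different seeds gives a steadier picture.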

Files to use
- Training: train.csv and train_videos/.
- Testing: test.csv and test_videos/.
- Submission format: sample_submission.csv as a template.

Important constraints
- Keep the column names and their order exactly as specified.
- Only classes appearing in train.csv are valid predictions.
- Do not infer labels from filenames or folder names; use the provided CSVs exclusively.
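The sketch below writes a submission and enforces the constraints above before it writes anything. Every video_id from test.csv must have a prediction, rows follow test.csv order under a video_id,label header, and every label must be one of the train.csv classes. The helper name write_submission and its arguments (a list of test IDs plus a dict mapping video_id to predicted label) are hypothetical.

```python
import csv


def write_submission(path, test_ids, predictions, valid_classes):
    """Write one video_id,label row per test ID, in the given order, after validation."""
    missing = [vid for vid in test_ids if vid not in predictions]
    if missing:
        raise ValueError(f"{len(missing)} test videos have no prediction, e.g. {missing[0]}")
    # Only classes that appear in train.csv are valid predictions.
    invalid = {predictions[vid] for vid in test_ids} - set(valid_classes)
    if invalid:
        raise ValueError(f"labels not in train.csv: {sorted(invalid)[:5]}")

    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["video_id", "label"])  # exact column names and order
        for vid in test_ids:
            writer.writerow([vid, predictions[vid]])
```

Passing test_ids in test.csv order (for example, [row["video_id"] for row in test_rows]) keeps the output aligned with sample_submission.csv.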
