Title: Multimodal Proctoring Events — Segment-level Classification Challenge

Overview
Design an automated proctoring model that detects exam-related behaviors from synchronized webcam, egocentric wearcam, and microphone streams. Your task is to classify each 10-second segment of the held-out subjects' sessions (the final segment of a session may be shorter) into one of seven classes:
- 0: no suspicious/cheating activity
- 1–6: six distinct behavior categories (anonymous labels)

This is a multimodal, imbalanced, segment-level classification problem. You are encouraged to leverage audio-visual fusion, temporal modeling, and robust feature engineering.

What you get
- train_segments.csv: Segment-level annotations for training
  - columns: subject_id, segment_start, segment_end, label
  - segment_start and segment_end are in seconds; segment_end is exclusive
  - labels are integers in [0, 6]
- test_data.csv: The segment keys you must predict for
  - columns: subject_id, segment_start, segment_end
- sample_submission.csv: An example submission with the correct format
- train/media/: Three media files per training subject
  - {subject_id}_audio.wav
  - {subject_id}_webcam.avi
  - {subject_id}_wearcam.avi
- test/media/: The corresponding three files per test subject
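
As a minimal sketch of how the media layout above might be resolved in code, the helper below builds the three file paths for one subject and reports any that are absent. The `root` directory, the function names, and the assumption that test files follow the same naming pattern as training files are illustrative, not part of the official tooling.

```python
from pathlib import Path

# Filename suffixes taken from the naming pattern listed for train/media/.
MEDIA_SUFFIXES = {
    "audio": "_audio.wav",
    "webcam": "_webcam.avi",
    "wearcam": "_wearcam.avi",
}

def media_paths(root, split, subject_id):
    """Return {modality: Path} for one subject.

    `root` is a hypothetical local data directory; `split` is "train" or "test".
    """
    media_dir = Path(root) / split / "media"
    return {
        name: media_dir / f"{subject_id}{suffix}"
        for name, suffix in MEDIA_SUFFIXES.items()
    }

def missing_media(root, split, subject_ids):
    """List (subject_id, modality) pairs whose expected file is not on disk."""
    missing = []
    for sid in subject_ids:
        for name, path in media_paths(root, split, sid).items():
            if not path.is_file():
                missing.append((sid, name))
    return missing
```

Running `missing_media` over the unique subject_id values of each CSV before feature extraction is one way to catch an incomplete download early.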

Important details
- Segmentation: Each subject session is partitioned into consecutive, non-overlapping 10-second segments from t=0 up to the end of the recording. The final segment may be shorter than 10 seconds. A segment’s label is defined by the ground-truth category with the largest temporal overlap; if no ground-truth interval overlaps the segment, the label is 0.
- Subject split: Train and test contain disjoint subjects to ensure generalization to unseen individuals and environments.
- Anonymization: Media filenames are normalized to avoid leaking identity or label information.
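
The segmentation and labeling rule above can be sketched as follows. This is an illustration under stated assumptions, not the organizers' labeling code: `intervals` is a hypothetical list of (start, end, category) ground-truth events, overlap is summed per category, and ties are broken toward the smaller category id because the rule above does not specify a tie-break.

```python
import math

def make_segments(duration, seg_len=10.0):
    """Split [0, duration) into consecutive segments; the last may be shorter."""
    n = math.ceil(duration / seg_len)
    return [(i * seg_len, min((i + 1) * seg_len, duration)) for i in range(n)]

def segment_label(seg_start, seg_end, intervals):
    """Return the category with the largest total overlap, or 0 if none overlaps.

    `intervals` is a hypothetical list of (start, end, category) tuples.
    """
    overlap = {}
    for start, end, category in intervals:
        shared = min(seg_end, end) - max(seg_start, start)
        if shared > 0:
            overlap[category] = overlap.get(category, 0.0) + shared
    if not overlap:
        return 0
    # Assumption: ties go to the smaller category id.
    return min(overlap, key=lambda c: (-overlap[c], c))
```

For a 25-second recording, `make_segments(25.0)` yields three segments, the last covering only seconds 20 to 25.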

Task
Given the media for each subject and the segment definitions, predict the behavior class for every row in test_data.csv. Your submission must contain exactly the same rows (same subject_id, segment_start, segment_end) with an additional integer label column.

Evaluation
- Metric: Macro-averaged F1 across 7 classes (0–6). Macro F1 weights all classes equally, so it rewards models that perform well on the minority behavior categories as well as on the majority “no activity” class.
- Submission format:
  - CSV header: subject_id, segment_start, segment_end, label
  - label must be an integer in [0, 6]
  - The set of (subject_id, segment_start, segment_end) keys must match test_data.csv exactly. Duplicates or missing rows invalidate the submission.
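
For local validation, the metric above can be approximated with a few lines of plain Python. This is a sketch rather than the official scorer: in particular, giving a class that appears in neither the truth nor the predictions an F1 of 0 is an assumption.

```python
def macro_f1(y_true, y_pred, n_classes=7):
    """Unweighted mean of per-class F1 over classes 0..n_classes-1."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); assumed 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes
```

The effect of class imbalance is easy to see on a toy two-class case: with true labels [0, 0, 0, 1] and an all-zero prediction, accuracy is 75% but `macro_f1(..., n_classes=2)` is 3/7, about 0.43, because the missed minority class contributes an F1 of 0.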


Deliverables
- One CSV file with exactly these columns: subject_id, segment_start, segment_end, label.
- Do not include any extra columns. Ensure numeric types are valid and no NaNs/Infs are produced in preprocessing or model outputs.
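
Before uploading, the format rules above can be checked with a short script. The sketch below uses only the standard library; the function name is hypothetical, and comparing segment times numerically (so that "10" and "10.0" count as the same key) is an assumption, since the official checker may be stricter.

```python
import csv
import io

KEY_COLS = ["subject_id", "segment_start", "segment_end"]

def _key(row):
    # Assumption: times are compared numerically, so "10" and "10.0" match.
    return (row["subject_id"], float(row["segment_start"]), float(row["segment_end"]))

def check_submission(sub_file, test_file):
    """Return a list of problems; an empty list means every check passed."""
    reader = csv.DictReader(sub_file)
    if reader.fieldnames != KEY_COLS + ["label"]:
        return [f"unexpected columns: {reader.fieldnames}"]
    problems, sub_keys = [], []
    for row_no, row in enumerate(reader, start=1):  # counts data rows only
        sub_keys.append(_key(row))
        try:
            valid = 0 <= int(row["label"]) <= 6
        except (TypeError, ValueError):
            valid = False  # e.g. "3.0", "nan", or an empty cell
        if not valid:
            problems.append(f"row {row_no}: invalid label {row['label']!r}")
    test_keys = {_key(row) for row in csv.DictReader(test_file)}
    if len(sub_keys) != len(set(sub_keys)):
        problems.append("duplicate keys in submission")
    if set(sub_keys) - test_keys:
        problems.append(f"{len(set(sub_keys) - test_keys)} keys not in test_data.csv")
    if test_keys - set(sub_keys):
        problems.append(f"{len(test_keys - set(sub_keys))} test keys missing")
    return problems

# Usage with in-memory CSVs; real use would pass open file handles instead.
test_csv = "subject_id,segment_start,segment_end\ns1,0,10\ns1,10,20\n"
good_csv = "subject_id,segment_start,segment_end,label\ns1,0.0,10.0,0\ns1,10.0,20.0,3\n"
print(check_submission(io.StringIO(good_csv), io.StringIO(test_csv)))  # prints []
```

Note that a label written as "3.0" is flagged here, which can happen when a label column is stored as float; casting it to an integer type before writing the CSV avoids this.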

Data files in this competition
- Training: train_segments.csv and train/media/
- Test: test_data.csv and test/media/
- Format helper: sample_submission.csv

Good luck building a robust multimodal proctoring event detector!
