# Dataset Organization

## 1. Overview

MIntRec2.0 contains 1,245 high-quality dialogues and 15,040 samples across the text, video, and audio modalities. Each sample is annotated with one of 30 fine-grained classes from a new intent taxonomy.

## 2. File Structure

### 2.1 Text Data
- text
    - train.tsv
    - dev.tsv
    - test.tsv

Text data are stored in the `text` folder. Each `.tsv` file in this folder contains the following columns:

- `Dialogue_id`: The dialogue index
- `Utterance_id`: The utterance index
- `Text`: The text of the utterance
- `Label`: Multimodal label
- `Start_timestamp`: The starting timestamp of an utterance
- `End_timestamp`: The ending timestamp of an utterance
- `Source`: The TV series source
- `speakername`: The identity of the speaker
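
As a rough illustration of how the columns listed above might be consumed, the following sketch reads one split file with Python's standard `csv` module. The helper name `read_split` and the constant `EXPECTED_COLUMNS` are hypothetical, and the sketch assumes the header row of each file spells the column names exactly as listed here.

```python
import csv
from pathlib import Path

# Column names as listed in this README. That the header row of the released
# files matches this spelling exactly is an assumption.
EXPECTED_COLUMNS = [
    "Dialogue_id", "Utterance_id", "Text", "Label",
    "Start_timestamp", "End_timestamp", "Source", "speakername",
]


def read_split(path):
    """Read one tab-separated split file and return a list of row dicts."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quotation marks inside utterances untouched.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = reader.fieldnames or []
        missing = [c for c in EXPECTED_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        return list(reader)
```

A call such as `read_split("text/train.tsv")` would then return one dictionary per utterance. All values are kept as strings, so any conversion of the ID and timestamp columns is left to the caller.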

### 2.2 Video Data
- video
    - dia0_utt0.mp4
    - dia0_utt1.mp4
    - dia0_utt2.mp4
    - ...

Each MP4 file is named according to the following convention: `Dialogue_id` + `_` + `Utterance_id` + `.mp4` (for example, `dia0_utt0.mp4`). Due to upload size limitations on OpenReview, we release 73 raw video instances as examples. The entire raw dataset, the extracted features, and the code will be released after publication.
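
The naming convention can be sketched as a small path helper. The function `video_path` is hypothetical, and it assumes the two IDs are bare indices that need the `dia` and `utt` prefixes seen in the example file names. If the `.tsv` files already store prefixed IDs, the prefixes in the format string should be dropped.

```python
from pathlib import Path


def video_path(video_dir, dialogue_id, utterance_id):
    """Build the expected clip path for one utterance.

    Assumption: IDs are bare indices (e.g. 0 and 1), and files follow the
    "dia<D>_utt<U>.mp4" pattern shown in the example listing.
    """
    return Path(video_dir) / f"dia{dialogue_id}_utt{utterance_id}.mp4"
```

For instance, `video_path("video", 0, 1)` points at `video/dia0_utt1.mp4`. Checking `.exists()` on the result is a simple way to detect utterances whose clip is not part of the released examples.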
