# Video Detail Description

## Task Description

This repository contains an evaluation dataset designed for assessing the performance of video models. The dataset includes human-generated detailed descriptions of videos, which have been used to generate several QA pairs with the help of GPT-3.5. The evaluation focuses on multiple dimensions of the responses generated by GPT-3.5.

- Questions: Each question in this dataset follows the format: "Please provide a detailed description of the video, focusing on the main subjects, their actions, the background scenes."

- Answers: The answers are extracted from the "human-generated detailed description" of the videos.

- GPT-3.5 Evaluation: The answers are evaluated using a prompt we designed, which rates the responses based on the aforementioned dimensions with `gpt-3.5-turbo-0613`.

## Groups & Tasks

### Tasks

- `vdd_499`: Given a question and a video, generate detail description of this video.
  
## Citation

```bibtex
@article{Maaz2023VideoChatGPT,
    title={Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models},
    author={Maaz, Muhammad and Rasheed, Hanoona and Khan, Salman and Khan, Fahad Shahbaz},
    journal={arXiv:2306.05424},
    year={2023}
}
```