This folder contains the supplementary materials for the paper【DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM】.

The contents of each file are described as follows:
【Concise-Generation-Example(bear-17).txt】and【Detailed-Generation-Example(bear-17).txt】Examples of the diverse generated texts in DTVLT, using bear-17 as the example. For other data files, please download the train and val_test data.
【Demo.mp4】A video example illustrating the characteristics of the proposed benchmark. Due to the file size limit on supplementary materials, we have sped up the demo playback and reduced the frame resolution.

We declare that we bear all responsibility in the event of any violation of rights, and we confirm the data license. Our work is licensed under CC BY-NC-SA 4.0. Users are free to use the dataset for research purposes.

        