The "Four Data Branch of SpeakerVid-5M" folder contains examples from each of the four data branches of the dataset.
The "Generated Results" folder includes the generation results of our baseline model, which adopts an autoregressive (AR) paradigm for audio-visual joint generation. 
The "Body Composition" folder provides examples of the four types of data in the dataset, categorized by body and viewpoint.
The "example_video_link.txt" file contains example links to our original video channel data.
Due to the size limit on supplementary materials, the provided videos have been compressed.