Model Video Visualizations

1. Model Visualization on AVSyncD Dataset

The videos are arranged from left to right as follows: KeyVID, KeyVID-Uniform, AVSyncD, and DynamiCrafter.

2. Open-Domain Generation Visualization with Audio Synchronization

Note: Please turn on the volume when playing the videos.

The first audio clip sounds like a hammer striking a wooden surface, and the second contains four hammer strikes on a metal object.

The results show that our model not only generates videos with the correct pattern of hammer strikes but also depicts the hammer hitting different objects according to the material implied by the sound.