Model Video Visualizations
1. Model Visualization on AVSyncD Dataset
The videos are arranged from left to right as follows: KeyVID, KeyVID-Uniform, AVSyncD, and DynamiCrafter.
2. Open-Domain Generation Visualization with Audio Synchronization
Note: Please turn on the volume when playing the videos.
The first audio clip sounds like a hammer striking a wooden surface, and the second like four hammer strikes on a metal object.
The results show that our model not only generates videos with the correct pattern of hammer strikes but also depicts the hammer hitting different objects depending on the material suggested by the sound.