We provide three demo videos showing qualitative results of COME.

1. gt_dome_come_3s_results.mp4 compares the ground truth, DOME generation with the official checkpoint, and COME generation. The task is to take a 4-frame 3D-Occ sequence as input and predict the next 6 frames (3 s).
 
2. gt_dome_come_8s_results.mp4 compares the ground truth, DOME generation with a reproduced checkpoint, and COME generation. The task is to take a 4-frame 3D-Occ sequence as input and predict the next 16 frames (8 s).
 
3. come_with_bev_layout_3s_results.mp4 shows COME generation with BEV layouts. The task is to take a 2-frame 3D-Occ sequence and 8-frame BEV layouts as input and predict the next 6 frames (3 s).
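
For quick reference, the three task settings can be summarized as a small Python sketch. This is illustrative only and not part of the COME codebase: the names `DemoSetting`, `DEMO_SETTINGS`, and `FRAME_RATE_HZ` are hypothetical, and the 2 Hz frame rate is an assumption inferred from the descriptions above (6 frames for 3 s, 16 frames for 8 s).

```python
from dataclasses import dataclass

# Assumed frame rate, inferred from the descriptions above:
# 6 frames cover 3 s and 16 frames cover 8 s, i.e. 2 frames per second.
FRAME_RATE_HZ = 2.0


@dataclass(frozen=True)
class DemoSetting:
    """Hypothetical container for one demo video's task setting."""

    video: str
    occ_input_frames: int       # 3D-Occ frames given as input
    predicted_frames: int       # future frames to predict
    bev_layout_frames: int = 0  # BEV layout frames given as input (0 if unused)

    @property
    def horizon_seconds(self) -> float:
        # Prediction horizon in seconds under the assumed frame rate.
        return self.predicted_frames / FRAME_RATE_HZ


DEMO_SETTINGS = [
    DemoSetting("gt_dome_come_3s_results.mp4", 4, 6),
    DemoSetting("gt_dome_come_8s_results.mp4", 4, 16),
    DemoSetting("come_with_bev_layout_3s_results.mp4", 2, 6, bev_layout_frames=8),
]

for s in DEMO_SETTINGS:
    print(
        f"{s.video}: {s.occ_input_frames} Occ frames in, "
        f"{s.predicted_frames} frames out ({s.horizon_seconds:g} s)"
    )
```

Under the assumed 2 Hz rate, the computed horizons match the 3 s and 8 s figures stated in the list.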
 