# Audio-video Samples of SpecMaskFoley
We provide Foley synthesis results from SpecMaskFoley, generated with the settings reported in Tab.1 of the paper.
- in-domain VGGSound-test
    - Videos are taken from "4. On VGGSound" in https://hkchengrex.com/MMAudio/video_main.html#. Readers can visit that page to compare SpecMaskFoley (ControlNet-based) with 8 different methods benchmarked in Tab.1.
    - All samples illustrate the quality of our proposed method, but we find the following ones especially interesting:
        - VGG_example2_StrikingAGolfBall_4OCcv5d6xsU_000029: SpecMaskFoley accurately captures the moment the ball is hit.
        - VGG_example5_PlayingAStringInstrument_V3aaPyUdIyo_000254: SpecMaskFoley accurately captures the moment when the man starts to speak.
        - VGG_example6_AGroupOfPeoplePlayingTambourines_WohJxQ1ll6w_000052: SpecMaskFoley reflects the change in volume and timbre when some people stop tapping.
- out-of-domain
    - Videos are taken from "1. Comparisons with Movie Gen Audio on Videos Generated by MovieGen" in https://hkchengrex.com/MMAudio/video_main.html. Readers can visit that page to compare SpecMaskFoley (ControlNet-based) with state-of-the-art models trained from scratch, such as MMAudio and MovieGen.
    - Note that, although our model fails on the last sample, SOTA methods also fail in this case. We hypothesize that this failure stems from the training data rather than from the algorithm.