Keywords: Diffusion Model, Audio Generation, Video to Audio Generation
Abstract: We study Neural Foley, the automatic generation of high-quality sound effects
synchronized with videos, enabling an immersive audio-visual experience. Despite
its wide range of applications, existing approaches encounter limitations
when it comes to simultaneously synthesizing high-quality and video-aligned
(i.e., semantically relevant and temporally synchronized) sounds. To overcome these
limitations, we propose FoleyCrafter, a novel framework that leverages a pretrained
text-to-audio model to ensure high-quality audio generation. FoleyCrafter
comprises two key components: a semantic adapter for semantic alignment and a
temporal adapter for precise audio-video synchronization. The semantic adapter
utilizes parallel cross-attention layers to condition audio generation on video features,
producing realistic sound effects that are semantically relevant to the visual
content. Meanwhile, the temporal adapter estimates time-varying signals from
the videos and subsequently synchronizes audio generation with those estimates,
leading to enhanced temporal alignment between audio and video. One notable
advantage of FoleyCrafter is its compatibility with text prompts, enabling the use
of text descriptions to achieve controllable and diverse video-to-audio generation
according to user intent. We conduct extensive quantitative and qualitative experiments
on standard benchmarks to verify the effectiveness of FoleyCrafter. Models
and code will be made available.
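
The parallel cross-attention idea behind the semantic adapter can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the two attention branches are combined, so the additive combination, the `video_scale` weight, the shared query projection, and every function and parameter name below are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_k, w_v):
    # Single-head scaled dot-product attention from queries to a context sequence.
    keys = context @ w_k
    values = context @ w_v
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values


def parallel_cross_attention(audio_latents, text_feats, video_feats, params, video_scale=1.0):
    # Assumption: both branches share one query projection and their outputs are summed.
    q = audio_latents @ params["w_q"]
    # Text branch, standing in for the pretrained text-to-audio cross-attention.
    text_out = cross_attention(q, text_feats, params["w_k_text"], params["w_v_text"])
    # Video branch, standing in for the added adapter layers.
    video_out = cross_attention(q, video_feats, params["w_k_video"], params["w_v_video"])
    return text_out + video_scale * video_out


def init_params(dim, rng):
    # Hypothetical random projection matrices, for demonstration only.
    names = ["w_q", "w_k_text", "w_v_text", "w_k_video", "w_v_video"]
    return {name: rng.standard_normal((dim, dim)) / np.sqrt(dim) for name in names}


rng = np.random.default_rng(0)
params = init_params(8, rng)
audio_latents = rng.standard_normal((5, 8))  # 5 audio latent tokens
text_feats = rng.standard_normal((3, 8))     # 3 text prompt tokens
video_feats = rng.standard_normal((4, 8))    # 4 video frame features
fused = parallel_cross_attention(audio_latents, text_feats, video_feats, params)
print(fused.shape)
```

In this sketch, setting `video_scale=0.0` reduces the layer to text-only cross-attention, which is one way a video branch could sit beside a text-conditioned model without disturbing its text pathway.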
Supplementary Material: zip
Primary Area: generative models
Submission Number: 6154