Keywords: audio-visual corresponding dataset, sound separation, audio-video retrieval
TL;DR: An Open Large-Scale Audio-Visual Dataset with High Correspondence
Abstract: Recent advances such as ChatGPT and Sora highlight the important role of large-scale data in advancing generative and comprehension tasks. However, the scarcity of comprehensive, large-scale audio-visual correspondence datasets poses a significant challenge to research in the audio-visual field. To address this gap, we introduce **AVSET-10M**, a high-correspondence audio-visual dataset comprising 10 million samples, with the following key attributes: (1) **High Audio-Visual Correspondence**: Through meticulous sample filtering, we ensure a strong correspondence between the audio and visual components of each entry. (2) **Comprehensive Categories**: AVSET-10M covers 527 unique audio categories, supporting diverse research needs. (3) **Large Scale**: With 10 million samples, AVSET-10M is one of the largest publicly available audio-visual correspondence datasets. We benchmark two critical tasks on AVSET-10M: audio-video retrieval and vision-queried sound separation. These benchmarks underscore the importance of precise audio-visual correspondence in advancing audio-visual research. For more information, please visit our demo page at https://avset-10m.github.io/.
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7550