Track: Track 2: Dataset Proposal Competition
Keywords: bioacoustics, large audio-language models, dataset-centric AI, AI for ethology
TL;DR: We present a large training and benchmark dataset for audio-text generative modeling focused on bioacoustics, along with a proposal to extend it.
Abstract: We introduce bioacoustic datasets for training and evaluating audio-language foundation models. The training dataset aggregates 22,000 hours of audio across 44 tasks and includes animal vocalizations, human speech, music, and environmental sounds from public sources. The benchmark dataset tests zero-shot transfer on species classification and detection, call-type classification, and other bioacoustic tasks. Most models for conservation, biodiversity monitoring, and ethology are predictive models trained on small datasets with limited species coverage. Our large-scale, cross-taxa multimodal datasets enable a transition to generative foundation models that handle novel data and tasks, exhibit in-context learning, and produce unconstrained output, capabilities that greatly benefit bioacoustics research. These datasets were used to train and evaluate AnonymousLM, the first audio-text language model for bioacoustics to demonstrate effective zero-shot transfer across species and tasks. We explore several possibilities for extending these datasets and furthering the use of generative models in bioacoustics.
Submission Number: 386