Track: Track 2: Dataset Proposal Competition
Keywords: bioacoustics, large audio-language models, dataset-centric AI, AI for ethology
TL;DR: We present a large training and benchmark dataset for audio-text generative modeling focused on bioacoustics, along with a proposal to extend it.
Abstract: We introduce bioacoustic datasets for training and evaluating audio-language foundation models. The training dataset aggregates 22,000 hours of audio across 44 tasks and includes animal vocalizations, human speech, music, and environmental sounds from public sources. The benchmark dataset tests zero-shot transfer on species classification and detection, call-type classification, and other bioacoustic tasks. Most models for conservation, biodiversity monitoring, and ethology are predictive models trained on small datasets with limited species coverage. Our large-scale, cross-taxa multimodal datasets enable a transition to generative foundation models that handle novel data and tasks, exhibit in-context learning, and produce unconstrained output, capabilities that greatly benefit bioacoustics research. These datasets were used to train and evaluate AnonymousLM, the first audio-text language model for bioacoustics to demonstrate effective zero-shot transfer across species and tasks. We explore several possibilities for extending these datasets and furthering the use of generative models in bioacoustics.
Submission Number: 386