OCEANS: A Global Underwater Bioacoustics Dataset

Published: 24 Sept 2025, Last Modified: 15 Oct 2025
Venue: NeurIPS2025-AI4Science Oral
License: CC BY 4.0
Track: Track 2: Dataset Proposal Competition
Keywords: marine bioacoustics, passive acoustic monitoring, biodiversity, open-world learning, novelty detection, active learning
Abstract: Marine ecosystems are among the most threatened on Earth, yet traditional survey methods remain expensive, labor-intensive, and limited in scale. Bioacoustics offers a powerful alternative: because sound propagates far underwater and many species, from cetaceans to crustaceans, rely on it for essential functions, passive acoustic monitoring (PAM) can provide broad, continuous baselines of ecosystem activity. Despite decades of PAM deployments, however, most underwater organisms lack verified acoustic signatures, leaving vast biodiversity undocumented. We introduce OCEANS (Open Collection of Ecological and Anthropogenic uNderwater Soundscapes): the first open-access, globally and taxonomically representative repository of raw underwater soundscapes designed for open-world active discovery. Unlike existing archives, OCEANS contains long-form recordings with overlapping signals, ambient noise, and explicit “unknown event” markers, paired with standardized metadata for large-scale machine learning. This resource enables AI systems not only to classify known calls but also to detect, cluster, and characterize novel phenomena. Beyond transforming marine bioacoustics, OCEANS provides a unique testbed for advancing discovery-oriented AI methods with broad relevance across the sciences, laying the foundation for more adaptive, expert-driven AI.
Submission Number: 269