Dolph2Vec: Self-Supervised Representations of Dolphin Vocalizations

ICLR 2026 Conference Submission 16655 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: animal communication, bioacoustics, machine learning for science, self-supervised learning
Abstract: Self-supervised learning (SSL) holds significant potential for bioacoustics and, more broadly, for interspecies communication research. Existing large-scale models (e.g., BioLingual, AVES, NatureLM) prioritize cross-species generalization but do not capture the fine-grained vocal complexity within individual species. In this work, we introduce a new, large-scale dataset consisting exclusively of dolphin vocalizations and present a custom pretrained model optimized for this domain. We pretrain the model on unlabeled vocalizations and benchmark it on a novel fine-grained classification task: signature whistle identification. Our approach outperforms general-purpose models in both supervised classification and unsupervised clustering. Beyond these performance metrics, we show that species-specific SSL enables the formulation and empirical testing of hypotheses about dolphin communication. This approach lays the groundwork for acoustic models that are both biologically meaningful and interpretable. We propose that SSL models tailored to individual species are key to advancing interspecies communication research: they capture the unique patterns in each species' vocal signals, complementing broader cross-species methods.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 16655