Keywords: bioacoustics, passive acoustic monitoring, convolutional neural networks, transformer models, embedding-based classification, embedding extraction, unsupervised repertoire discovery, single-species classifiers, ConvNext, EfficientNet, Perch, BirdNET, BirdSet, PANN, ViT, AST, PaSST, UMAP, HDBSCAN
Abstract: While machine learning and bioacoustics have advanced the study of animal communication and enabled passive acoustic monitoring across various taxa, these methods remain underused for gray wolves ($\textit{Canis lupus}$), a species with a distinctive and complex vocal repertoire and critical conservation relevance. Bioacoustics research on gray wolf vocalizations has often relied on small datasets, recordings of captive individuals, manually engineered acoustic features, and task-specific models. These constraints limit generalizability, ecological inference, and practical applications in conservation. To address these challenges, we have compiled what we believe to be the largest dataset of wild wolf vocalizations, comprising over 200,000 hours of field recordings and over 7,000 expert-verified vocalization clips from Yellowstone National Park. Leveraging this dataset, we are developing a machine learning research program that includes: (1) spectrogram-based classifiers for population monitoring using partial fine-tuning of state-of-the-art CNNs and image transformers, (2) embedding-based classification, (3) unsupervised clustering for vocal repertoire discovery, and (4) few-shot learning approaches for rare call types and for transfer to smaller datasets. Our ongoing work directly addresses current limitations in dataset scale, reliance on hand-crafted features, and underexplored unsupervised analysis, while supporting noninvasive, ecologically meaningful, and conservation-relevant applications.
Submission Number: 25