Keywords: Zebra Finch, BirdAVES, bioacoustic deep learning
TL;DR: This paper investigates and adapts the BirdAVES foundation model to recognize 173 zebra finch individuals in acoustic recordings from their natural environment, paving the way toward translating individual recognition into communication networks.
Abstract: Understanding who communicates with whom, when, and how is central to the ecology of group-living animals, yet individual-level acoustic identification of animals in their natural environment remains challenging. Zebra finches are a model species whose vocal behaviour has been studied predominantly indoors; here we address the outdoor setting and investigate bioacoustic deep learning for individual identification at scale as a key step toward building communication networks from field recordings. We fine-tune BirdAVES to recognize 173 zebra finch individuals from short (1-3 s) clips using a concise training recipe: two-phase training, weighted sampling and class-weighted cross-entropy for long-tailed class counts, and a supervised contrastive term that pulls same-individual embeddings together. On a real-world dataset (2,915 clips, 173 individuals), the selected model achieved macro-F1 = 0.733 (val) / 0.726 (test) and strong retrieval performance (Top-5 = 0.868, Top-10 = 0.893 on the test set). This enables conversion of hours of audio into “who-sang-when” timelines. We deliberately report top-k performance because it quantifies review effort and supports human-in-the-loop workflows by shrinking the number of clips an expert must audit. While a train–val/test gap reflects short windows and class imbalance, the embeddings are discriminative and immediately useful. Key next steps are to address the class imbalance in our data, to scale to a significantly larger set of individuals, and to translate individual recognition into communication or social networks.
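The abstract's training recipe combines a class-weighted cross-entropy with a supervised contrastive term. The sketch below illustrates how such a combined objective can be assembled in PyTorch; the function names, projection dimensionality, temperature, the weighting scheme for classes, and the mixing coefficient `lam` are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (assumptions, not the paper's code): class-weighted
# cross-entropy for long-tailed counts plus a SupCon-style term that
# pulls same-individual embeddings together.
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss: clips with the same label act as positives."""
    z = F.normalize(embeddings, dim=1)              # unit-norm embeddings
    sim = z @ z.T / temperature                     # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)          # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)   # avoid division by zero
    loss_per_anchor = -(log_prob * pos_mask).sum(dim=1) / pos_counts
    # Average only over anchors that have at least one same-label positive.
    return loss_per_anchor[pos_mask.any(dim=1)].mean()


def total_loss(logits, embeddings, labels, class_weights, lam=0.5):
    """Class-weighted CE plus a weighted contrastive term (lam is assumed)."""
    ce = F.cross_entropy(logits, labels, weight=class_weights)
    con = supervised_contrastive_loss(embeddings, labels)
    return ce + lam * con


if __name__ == "__main__":
    # Toy usage: 8 clips, 4 hypothetical individuals, 128-d embeddings.
    torch.manual_seed(0)
    labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    logits = torch.randn(8, 4)
    embeddings = torch.randn(8, 128)
    counts = torch.bincount(labels, minlength=4).float()
    class_weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights
    print(total_loss(logits, embeddings, labels, class_weights))
```

The same inverse-frequency weights could also drive a `torch.utils.data.WeightedRandomSampler` so rare individuals appear more often per epoch; how the paper balances sampling versus loss weighting is not specified here.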
Submission Number: 9