Can Masked Autoencoders Also Listen to Birds?

Lukas Rauch; René Heinrich; Ilyass Moummad; Alexis Joly; Bernhard Sick; Christoph Scholz

Published: 27 Aug 2025, Last Modified: 27 Aug 2025. Accepted by TMLR. License: CC BY 4.0

Abstract: Masked Autoencoders (MAEs) learn rich representations for audio classification through an efficient self-supervised reconstruction task. Yet general-purpose models struggle in fine-grained audio domains such as bird sound classification, which demands distinguishing subtle inter-species differences under high intra-species variability. We show that bridging this domain gap requires full-pipeline adaptation beyond domain-specific pretraining data. Using BirdSet, a large-scale bioacoustic benchmark, we systematically adapt pretraining, fine-tuning, and frozen feature utilization. Our Bird-MAE sets new state-of-the-art results on BirdSet’s multi-label classification benchmark. Additionally, we introduce prototypical probing, a parameter-efficient method that boosts the utility of frozen MAE features, gaining up to 37 mAP points over linear probes and narrowing the gap to fine-tuning in low-resource settings. Bird-MAE also exhibits strong few-shot generalization with prototypical probes on our newly established few-shot benchmark on BirdSet, underscoring the importance of tailored self-supervised learning pipelines for fine-grained audio domains.

Submission Length: Regular submission (no more than 12 pages of main content)

Code: https://github.com/DBD-research-group/Bird-MAE

Assigned Action Editor: ~Chuan-Sheng_Foo1

Submission Number: 5030
