Model Merging Improves Zero-Shot Generalization in Bioacoustic Foundation Models

Davide Marincione; Donato Crisostomi; Roberto Dessi; Emanuele Rodolà; Emanuele Rossi

Model Merging Improves Zero-Shot Generalization in Bioacoustic Foundation Models

Davide Marincione, Donato Crisostomi, Roberto Dessi, Emanuele Rodolà, Emanuele Rossi

Published: 02 Oct 2025, Last Modified: 02 Dec 2025NeurIPS 2025 AiForAnimalComms WorkshopEveryoneRevisionsBibTeXCC BY 4.0

Keywords: LLMs, Bioacustics, Model Merging, Zero-Shot Generalization

TL;DR: We recover the instruction-following capabilities of NatureLM, a finetuned bioacustic foundation model, through model merging, improving its zero-shot generalization capabilities.

Abstract: Foundation models capable of generalizing across species and tasks represent a promising new frontier in bioacoustics, with NatureLM being one of the most prominent examples. While its domain-specific fine-tuning yields strong performance on bioacoustic benchmarks, we observe that it also introduces trade-offs in instruction-following flexibility. For instance, NatureLM achieves high accuracy when prompted for either the common or scientific name individually, but its accuracy drops significantly when both are requested in a single prompt. We address this by applying a simple model merging strategy that interpolates NatureLM with its base language model, recovering instruction-following capabilities with minimal loss of domain expertise. Finally, we show that the merged model exhibits markedly stronger zero-shot generalization, achieving over a 200% relative improvement and setting a new state-of-the-art in closed-set zero-shot classification of unseen species.

Submission Number: 1

Loading