WhaleLM: Finding Structure and Information in Sperm Whale Vocalizations and Behavior with Machine Learning

Published: 24 Sept 2025, Last Modified: 15 Oct 2025, Venue: NeurIPS2025-AI4Science Poster, License: CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: Animal Communication, Hypothesis Testing, Scientific Discovery
Abstract: Sperm whales (Physeter macrocephalus) communicate using patterned click sequences called codas. Whether systematic patterns govern the structure of coda sequences, and how coda production influences group behavior, remain open questions. To address these questions, we train neural sequence models (“sperm whale language models”) on vocalization and behavior data from a population of sperm whales in the eastern Caribbean. By systematically manipulating models' training data and measuring changes in predictive power, we find that vocalizations exhibit order dependence and long-range dependencies on up to eight previous codas in an exchange. We additionally find that this structure encodes information about behavior: whales' current behavioral context and future actions are predictable with high accuracy from coda sequences. The methods developed for relating vocalization to behavior are general and offer a flexible framework for using language models to investigate the structure and information content of unknown communication systems.
Submission Number: 486
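
The ablation idea in the abstract (manipulate what the model sees, then measure the change in predictive power) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the data are synthetic symbol sequences rather than real codas, a smoothed count-based n-gram model stands in for the paper's neural sequence models, and the names `make_sequences`, `context`, `train`, `cross_entropy`, and `run_ablation` are hypothetical. The sketch compares held-out cross-entropy (bits per symbol) across context lengths and against a variant whose context order is shuffled.

```python
import math
import random
from collections import Counter, defaultdict


def make_sequences(n_seqs, length, vocab, rng):
    """Synthetic 'exchanges': each symbol copies the one two steps back (p=0.9)."""
    seqs = []
    for _ in range(n_seqs):
        seq = [rng.choice(vocab), rng.choice(vocab)]
        while len(seq) < length:
            seq.append(seq[-2] if rng.random() < 0.9 else rng.choice(vocab))
        seqs.append(seq)
    return seqs


def context(seq, i, k, shuffle_rng=None):
    """Return the k symbols before position i, optionally in shuffled order."""
    ctx = seq[max(0, i - k):i]  # slicing yields a fresh list
    if shuffle_rng is not None:
        shuffle_rng.shuffle(ctx)  # order ablation: destroy ordering, keep contents
    return tuple(ctx)


def train(seqs, k, shuffle_rng=None):
    """Count next-symbol frequencies for every observed context."""
    counts = defaultdict(Counter)
    for seq in seqs:
        for i in range(len(seq)):
            counts[context(seq, i, k, shuffle_rng)][seq[i]] += 1
    return counts


def cross_entropy(counts, seqs, k, vocab, shuffle_rng=None):
    """Mean held-out cross-entropy in bits per symbol, with add-one smoothing."""
    total, n = 0.0, 0
    for seq in seqs:
        for i in range(len(seq)):
            c = counts.get(context(seq, i, k, shuffle_rng), Counter())
            p = (c[seq[i]] + 1) / (sum(c.values()) + len(vocab))
            total -= math.log2(p)
            n += 1
    return total / n


def run_ablation(seed=0):
    """Train and evaluate under each context manipulation; return bits per symbol."""
    rng = random.Random(seed)
    vocab = ["A", "B", "C", "D"]  # placeholder symbols, not real coda types
    train_seqs = make_sequences(200, 20, vocab, rng)
    test_seqs = make_sequences(100, 20, vocab, rng)
    results = {}
    for k in (0, 1, 2):
        model = train(train_seqs, k)
        results[f"k={k}"] = cross_entropy(model, test_seqs, k, vocab)
    shuf = random.Random(seed + 1)
    model = train(train_seqs, 2, shuffle_rng=shuf)
    results["k=2 shuffled"] = cross_entropy(model, test_seqs, 2, vocab, shuffle_rng=shuf)
    return results


if __name__ == "__main__":
    for name, bits in run_ablation().items():
        print(f"{name}: {bits:.3f} bits/symbol")
```

In this toy setup, a drop in cross-entropy when the context grows from one to two symbols indicates a dependency reaching two steps back, and a rise when the same context is shuffled indicates order dependence. The same comparison logic would apply with a neural model and longer contexts, though the paper's actual models and manipulations may differ.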