Keywords: sound source localization, animal behavior, computational ethology
Abstract: Understanding the behavioral and neural dynamics of social interactions is a goal of contemporary neuroscience. Many machine learning methods have emerged in recent years to make sense of complex video and neurophysiological data that result from these experiments. Less focus has been placed on understanding how animals process acoustic information, including social vocalizations. A critical step to bridge this gap is determining the senders and receivers of acoustic information in social interactions. While sound source localization (SSL) is a classic problem in signal processing, existing approaches are limited in their ability to localize animal-generated sounds in standard laboratory environments. Advances in deep learning methods for SSL are likely to help address these limitations; however, there are currently no publicly available models, datasets, or benchmarks to systematically evaluate SSL algorithms in the domain of bioacoustics. Here, we present the VCL Benchmark: the first large-scale dataset for benchmarking SSL algorithms in rodents. We acquired synchronized video and multi-channel audio recordings of 767,295 sounds with annotated ground-truth sources across 9 conditions. The dataset provides benchmarks that evaluate SSL performance on real data, simulated acoustic data, and a mixture of real and simulated data. We intend for this benchmark to facilitate knowledge transfer between the neuroscience and acoustic machine learning communities, which have had limited overlap.
Submission Number: 2055
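For readers unfamiliar with the classical signal-processing approach to SSL that the abstract contrasts with deep learning methods, the sketch below shows a minimal two-microphone baseline: a GCC-PHAT time-difference-of-arrival estimate converted to a bearing angle. This is not the paper's method and is not part of the VCL Benchmark; the function names, sampling rate, and microphone spacing are illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate speed of sound in air at room temperature


def gcc_phat(sig: np.ndarray, ref: np.ndarray, fs: float, interp: int = 16) -> float:
    """Estimate the inter-channel time delay (seconds) with GCC-PHAT.

    The phase transform whitens the cross-spectrum, sharpening the
    correlation peak and improving robustness to reverberation.
    """
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-15          # keep phase information only
    cc = np.fft.irfft(cross, n=interp * n)  # band-limited interpolated correlation
    max_shift = interp * n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)


def bearing_from_delay(tau: float, mic_spacing: float) -> float:
    """Convert a time delay to a bearing angle (radians) for a two-mic array."""
    # Clip to the physically valid range before taking the arcsine.
    s = np.clip(SPEED_OF_SOUND * tau / mic_spacing, -1.0, 1.0)
    return float(np.arcsin(s))


if __name__ == "__main__":
    # Toy example: broadband noise delayed by 10 samples on the second mic.
    # A negative estimated delay means the sound reached the first mic earlier.
    fs = 192_000  # Hz; hypothetical rate, chosen high enough for ultrasonic vocalizations
    rng = np.random.default_rng(0)
    src = rng.standard_normal(4096)
    mic1 = src
    mic2 = np.roll(src, 10)
    tau = gcc_phat(mic1, mic2, fs)
    print(f"estimated delay: {tau * 1e6:.1f} us, "
          f"bearing: {np.degrees(bearing_from_delay(tau, 0.30)):.1f} deg")
```

Baselines like this assume free-field propagation and a single dominant source, which is part of why they struggle with reverberant laboratory arenas and animal-generated sounds, motivating the learned approaches the benchmark is designed to evaluate.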