Keywords: Bioacoustics, sound event detection, machine learning
TL;DR: We propose a method for accurately detecting bioacoustic sound events that is robust to overlapping events, a common issue in domains such as ethology, ecology and conservation.
Abstract: We propose a method for accurately detecting bioacoustic sound events that is robust to overlapping events, a common issue in domains such as ethology, ecology and conservation. While standard methods employ a frame-based, multi-label approach, we introduce an onset-based detection method which we name Voxaboxen. For each time window, Voxaboxen predicts whether it contains the start of a vocalization and, if so, how long the vocalization is. It also does the same in reverse, predicting whether each window contains the end of a vocalization and how long ago that vocalization started, and then fuses the two sets of bounding boxes with a graph-matching algorithm. We also release a new dataset of temporally strong labels of zebra finch vocalizations, designed to have high overlap. Experiments on eight datasets, including our new dataset, show that Voxaboxen outperforms natural baselines and existing methods, and is robust to vocalization overlap.
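As an illustration only, the sketch below shows how forward (onset plus duration) and backward (offset plus time since onset) per-window predictions could be decoded into boxes and then fused. It assumes the model has already produced per-window probabilities and durations in seconds, uses a fixed threshold instead of any peak picking, and substitutes a simple greedy IoU matching for the paper's graph-matching step. The names `decode_forward`, `decode_backward`, `fuse`, `threshold`, and `min_iou` are hypothetical and are not Voxaboxen's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A detected event: start/end times in seconds plus a confidence score."""
    start: float
    end: float
    score: float


def decode_forward(onset_probs, durations, hop, threshold=0.5):
    """Forward pass: a window whose onset probability passes the (hypothetical)
    threshold opens a box that extends forward by the predicted duration."""
    return [
        Box(i * hop, i * hop + d, p)
        for i, (p, d) in enumerate(zip(onset_probs, durations))
        if p >= threshold
    ]


def decode_backward(offset_probs, durations, hop, threshold=0.5):
    """Backward pass: a window whose offset probability passes the threshold
    closes a box that started `d` seconds earlier (clipped at 0 s)."""
    return [
        Box(max(0.0, i * hop - d), i * hop, p)
        for i, (p, d) in enumerate(zip(offset_probs, durations))
        if p >= threshold
    ]


def iou(a, b):
    """Temporal intersection-over-union of two boxes."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def fuse(fwd, bwd, min_iou=0.5):
    """Greedy IoU matching between forward and backward boxes, used here as a
    stand-in for a proper graph-matching step. Matched pairs are averaged;
    unmatched boxes from either pass are kept unchanged (an assumption)."""
    pairs = sorted(
        ((iou(f, b), i, j) for i, f in enumerate(fwd) for j, b in enumerate(bwd)),
        reverse=True,
    )
    used_f, used_b, fused = set(), set(), []
    for overlap, i, j in pairs:
        if overlap < min_iou:
            break  # pairs are sorted, so all remaining ones overlap even less
        if i in used_f or j in used_b:
            continue  # each box may be matched at most once
        used_f.add(i)
        used_b.add(j)
        f, b = fwd[i], bwd[j]
        fused.append(Box((f.start + b.start) / 2, (f.end + b.end) / 2,
                         (f.score + b.score) / 2))
    fused += [f for i, f in enumerate(fwd) if i not in used_f]
    fused += [b for j, b in enumerate(bwd) if j not in used_b]
    return sorted(fused, key=lambda box: box.start)


# Toy example: 8 windows with a 0.1 s hop and two overlapping calls,
# one spanning 0.1-0.6 s and one spanning 0.4-0.7 s.
hop = 0.1
fwd = decode_forward([0.1, 0.9, 0.2, 0.1, 0.8, 0.1, 0.0, 0.0],
                     [0.0, 0.5, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0], hop)
bwd = decode_backward([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.6],
                      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.3], hop)
fused = fuse(fwd, bwd)
for box in fused:
    print(f"{box.start:.2f}-{box.end:.2f} s (score {box.score:.2f})")
```

In this toy input the two calls overlap in time, which would be hard to separate with a single frame-level activity label, yet each one still gets its own onset (forward) and offset (backward) prediction, so matching recovers both as separate boxes.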
Submission Number: 24