HumBugDB: a large-scale acoustic mosquito dataset

07 Jun 2021 (modified: 22 Oct 2023). Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1). Readers: Everyone
Keywords: Acoustic machine learning, audio event detection, audio classification, mosquito detection, Bayesian deep learning
TL;DR: Large-scale multi-species dataset of acoustic recordings of mosquitoes, with Bayesian convolutional neural network classification models.
Abstract: This paper presents the first large-scale multi-species dataset of acoustic recordings of mosquitoes tracked continuously in free flight. Mosquitoes are well-known carriers of diseases such as malaria, dengue and yellow fever. The motivation for collecting such a large dataset comes from the need to gather information, help predict outbreaks, and inform data-driven policy. Detecting mosquitoes from their wingbeats is challenging because recordings from realistic scenarios are difficult to collect. To address this, as part of the HumBug project, we have conducted global experiments to record mosquitoes ranging from those bred indoors in culture cages to mosquitoes captured in the wild. As a result, the audio recordings vary widely in signal-to-noise ratio and contain a broad range of indoor and outdoor background environments from Tanzania, Thailand, Kenya, the USA and the UK. The audio recordings have been labelled by domain experts, aided by Bayesian neural networks. In total, we present 20 hours of mosquito audio recordings expertly labelled with tags precise in time, of which 18 hours are annotated from 36 different species. We provide our data from a regularly maintained database, which captures important metadata such as the capture method, age, feeding status and gender of the mosquitoes. Additionally, we provide code to extract features and train Bayesian convolutional neural networks that can distinguish mosquito sounds from their corresponding background. Our contribution is to provide a dataset that is both challenging to machine learning researchers focusing on acoustic identification, and critical to entomologists, geo-spatial modellers and other domain experts to understand mosquito behaviour, model their distribution, and manage the threat they pose to humans.
Supplementary Material: zip
URL: Dataset: https://doi.org/10.5281/zenodo.4904800, Code and metadata: https://github.com/HumBug-Mosquito/HumBugDB