HumBugDB: A Large-scale Acoustic Mosquito Dataset

Published: 11 Oct 2021, Last Modified: 14 Jul 2024
Venue: NeurIPS 2021 Datasets and Benchmarks Track (Round 2)
Readers: Everyone
Keywords: Acoustic machine learning, audio event detection, audio classification, mosquito detection, Bayesian deep learning
TL;DR: Large-scale multi-species dataset of acoustic recordings of mosquitoes, with Bayesian convolutional neural network detection and classification models.
Abstract: This paper presents the first large-scale multi-species dataset of acoustic recordings of mosquitoes tracked continuously in free flight. We present 20 hours of audio recordings that we have expertly labelled and tagged precisely in time. Significantly, 18 hours of recordings contain annotations from 36 different species. Mosquitoes are well-known carriers of diseases such as malaria, dengue and yellow fever. Collecting this dataset is motivated by the need to assist applications which utilise mosquito acoustics to conduct surveys to help predict outbreaks and inform intervention policy. The task of detecting mosquitoes from the sound of their wingbeats is challenging due to the difficulty in collecting recordings from realistic scenarios. To address this, as part of the HumBug project, we conducted global experiments to record mosquitoes ranging from those bred in culture cages to mosquitoes captured in the wild. Consequently, the audio recordings vary in signal-to-noise ratio and contain a broad range of indoor and outdoor background environments from Tanzania, Thailand, Kenya, the USA and the UK. In this paper we describe in detail how we collected, labelled and curated the data. The data is provided from a PostgreSQL database, which captures important metadata such as the capture method, age, feeding status and gender of the mosquitoes. Additionally, we provide code to extract features and train Bayesian convolutional neural networks for two key tasks: the identification of mosquitoes from their corresponding background environments, and the classification of detected mosquitoes into species. Our extensive dataset is both challenging to machine learning researchers focusing on acoustic identification, and critical to entomologists, geo-spatial modellers and other domain experts to understand mosquito behaviour, model their distribution, and manage the threat they pose to humans.
URL: Dataset:, Code:
Supplementary Material: pdf
Contribution Process Agreement: Yes
License: Dataset: CC-BY-4.0 license. Code: MIT license.
Author Statement: Yes