WolBanking77: Wolof Banking Speech Intent Classification Dataset

Published: 18 Sept 2025, Last Modified: 30 Oct 2025, Venue: NeurIPS 2025 Datasets and Benchmarks Track (poster), License: CC BY 4.0
Keywords: intent classification, automatic speech recognition, low-resource languages, multilingual speech recognition, african languages, wolof, natural language processing
TL;DR: We present an intent classification dataset for the Wolof language. We conduct experiments on various baselines, including state-of-the-art text and voice models.
Abstract: Intent classification models have made significant progress in recent years. However, previous studies primarily focus on high-resource language datasets, which leaves a gap for low-resource languages and for regions with high rates of illiteracy, where languages are more often spoken than read or written. This is the case in Senegal, for example, where Wolof is spoken by around 90\% of the population, while the national illiteracy rate remains at 42\%. Wolof is spoken by more than 10 million people in the West African region. To address these limitations, we introduce the Wolof Banking Speech Intent Classification Dataset (WolBanking77) for academic research in intent classification. WolBanking77 currently contains 9,791 text sentences in the banking domain and more than 4 hours of spoken sentences. We conduct experiments on various baselines, including state-of-the-art text and voice models. Results on the current dataset are promising. In addition, this paper presents an in-depth examination of the dataset’s contents. We report baseline F1-scores for NLP models and word error rates for ASR models trained on the WolBanking77 dataset, along with comparisons between models. The dataset and code are available at: [wolbanking77](https://github.com/abdoukarim/wolbanking77).
Croissant File: json
Dataset URL: https://www.kaggle.com/datasets/abdoukarimkandji/wolbanking77
Code URL: https://github.com/abdoukarim/wolbanking77
Primary Area: Other (please use sparingly, only use the keyword field for more details)
Submission Number: 1259
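
The abstract reports F1-scores for the text models and word error rates (WER) for the ASR models. As a reminder of what these two metrics measure, here is a minimal sketch in plain Python. It is not taken from the authors' code: the function names `word_error_rate` and `macro_f1` are hypothetical, the intent labels are placeholders, and macro averaging is an assumption, since the abstract does not say which F1 averaging is used.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1
            fn[true_label] += 1
    labels = set(y_true) | set(y_pred)
    # F1 = 2TP / (2TP + FP + FN); the denominator is non-zero for every
    # label that appears in y_true or y_pred.
    scores = [
        2 * tp[label] / (2 * tp[label] + fp[label] + fn[label])
        for label in labels
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Placeholder strings and labels, not drawn from WolBanking77.
    print(word_error_rate("a b c d", "a x c"))
    print(macro_f1(["intent_a", "intent_a", "intent_b", "intent_b"],
                   ["intent_a", "intent_b", "intent_b", "intent_b"]))
```

WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error rate rather than a bounded accuracy.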