Green Topics, Deep Roots: Energy-Aware Topic Modelling of Multilingual Nigerian Lyrics

Published: 27 Sept 2025, Last Modified: 09 Nov 2025
Venue: NeurIPS Creative AI Track 2025
License: CC BY 4.0
Track: Paper
Keywords: Green AI; Energy-aware NLP; Topic Modelling; Nigerian Lyrics; Low-Resource Settings.
Abstract: We investigate how to model themes in Nigerian lyrics while respecting the energy limits faced in low-resource settings. Our multilingual corpus spans English, Yoruba, and Nigerian Pidgin, including everyday code-switches and devotional terms, to preserve cultural nuance. We benchmark seven topic models (NMF, LDA, LSI, HDP, BERTopic, Top2Vec, GSDMM). Our methodology combines standard semantic metrics—coherence (Cv, UMass), topic diversity, and Jaccard overlap—with direct energy measurements (kWh). Results show a pronounced quality–energy trade-off: NMF achieved the highest coherence among classical models (Cv = 0.6045) at ~2×10⁻⁶ kWh, while LSI was similarly frugal with competitive quality. By contrast, BERTopic delivered maximal diversity (1.000) with fully disjoint topics (Jaccard = 0.000) but at markedly higher energy cost (0.000450 kWh). Top2Vec underperformed on coherence (Cv = 0.2698) while consuming more energy than most classical baselines (0.000113 kWh); GSDMM drew the most energy (0.000509 kWh) and yielded undefined coherence on this short, sparse corpus. Interpreting these findings, we argue that in contexts where electricity and computing resources are scarce, classical models—particularly NMF—offer a culturally faithful, carbon-conscious starting point, while neural or embedding-based methods may be reserved for cases that demand maximal topical separation. Our study offers practical guidance for teams seeking sustainable, human-centred text mining of indigenous cultural materials.
Video Preview For Artwork: mp4
Submission Number: 104