Open-Emotion: A Reproducible EMO-Superb For Speech Emotion Recognition Systems

Published: 01 Jan 2024, Last Modified: 09 Apr 2025 | SLT 2024 | CC BY-SA 4.0
Abstract: Speech emotion recognition (SER) is an essential technology for human-computer interaction systems. However, a previous study reveals that 80.77% of SER papers yield results that cannot be reproduced on the well-known IEMOCAP dataset. The main reason for this reproducibility problem is that the database does not provide standard data splits (e.g., train, development, and test sets). Prior papers could define their own partitions, but they did not provide details of those partitions or the source code used to create them. Therefore, this work aims to make SER open and reproducible for everyone. We develop EMO-SUPERB, short for EMOtion Speech Universal PERformance Benchmark, which includes a user-friendly codebase that leverages 16 state-of-the-art (SOTA) speech self-supervised learning models, plus one SOTA SER model, for exhaustive evaluation across 6 open-source SER datasets in English and Chinese. We make all resources open-source to facilitate future developments in SER. Researchers can easily upload their systems or datasets to EMO-SUPERB, and we name the project “Open-Emotion”.