A Curious Case of the Missing Measure: Better Scores and Worse Generation

Published: 23 Jan 2025, Last Modified: 26 Feb 2025 | ICLR 2025 Blogpost Track | CC BY 4.0
Blogpost Url: https://d2jud02ci9yv69.cloudfront.net/2025-04-28-better-scores-worse-generation-178/blog/better-scores-worse-generation/
Abstract: Our field has a secret: nobody fully trusts audio evaluation measures. As neural audio generation nears perceptual fidelity, these measures fail to detect subtle differences that human listeners readily identify, often contradicting each other when comparing state-of-the-art models. The gap between human perception and automatic measures means we have increasingly sophisticated models while losing our ability to understand their flaws.
Conflict Of Interest: The authors have positive COIs with the following cited authors: Bhiksha Raj, Eduardo Fonseca, Pranay Manocha, Shinji Watanabe, Zeyu Jin
Submission Number: 92