PRiSM: Benchmarking Phone Realization in Speech Models


ACL ARR 2026 January Submission 5760 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: speech technologies, phonology, benchmarking, evaluation

Abstract: Phone recognition (PR) serves as the atomic interface for language-agnostic modeling in cross-lingual speech processing and phonetic analysis. Despite prolonged efforts in developing PR systems, current evaluations measure only surface-level transcription accuracy. We introduce PRiSM, the first open-source benchmark designed to expose blind spots in phonetic perception through intrinsic and extrinsic evaluation of PR systems. PRiSM standardizes transcription-based evaluation and uses transcription and representation probes to assess downstream utility in clinical, educational, and multilingual settings. We find that diverse language exposure during training is key to PR performance, that encoder-CTC models are the most stable, and that specialized PR systems still outperform large audio language models (LALMs). PRiSM releases code, recipes, and datasets to move the field toward multilingual speech models with robust phonetic ability.

Paper Type: Long

Research Area: Speech Processing and Spoken Language Understanding

Research Area Keywords: speech technologies, phonology, benchmarking, evaluation

Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models

Languages Studied: afr, amh, ara, aze, bak, bel, ben, bgc, bos, bul, cat, ceb, ces, cmn, cym, dan, deu, ell, eng, est, eus, fin, fra, ful, gle, glg, hau, hin, hrv, hun, ina, ind, isl, ita, jav, jpn, kat, kaz, kin, kir, kmr, kor, lao, lit, mal, mar, mkd, mlt, mon, mri, msa, mya, nld, nob, nya, ori, orm, pan, pol, por, ron, rus, sin, skr, slk, slv, sna, snd, som, spa, srp, swa, swe, tam, tat, tel, tgk, tha, tur, uig, ukr, urd, uzb, vie, xho, yor, yue, zul

Submission Number: 5760
