75-Speaker Annot-16: A benchmark dataset for speech articulatory rt-MRI annotation with articulator contours and phonetic alignment

Xuan Shi, Yubin Zhang, Yijing Lu, Marcus Ma, Tiantian Feng, Asterios Toutios, Haley Hsu, Louis Goldstein, Shrikanth Narayanan

Published: 2025, Last Modified: 06 Feb 2026. Venue: INTERSPEECH 2025. License: CC BY-SA 4.0

Abstract: High-quality speech articulatory databases are essential for advancing speech science and technology research. However, the lack of standardized annotations limits their full potential use and broad accessibility. In this context, we introduce 75-Speaker Annot-16, a comprehensive annotation dataset derived from the 75-Speaker vocal tract MRI database. Annot-16 provides phonetic alignments, articulator contour annotations, and handmade ground-truth articulator contours. Our annotation process integrates automated algorithms with expert verification to ensure accuracy and efficiency. To demonstrate its utility, we establish three benchmark tasks: speech phoneme recognition, articulatory contour segmentation, and articulatory phoneme recognition. Annot-16 can serve as a valuable resource for speech modeling, computer vision, and cross-modal learning, bridging engineering applications, speech science, and linguistic research.

External IDs: dblp:conf/interspeech/ShiZLMFTHGN25