WEB-derived pronunciations

Published: 01 Jan 2009, Last Modified: 25 Feb 2024 | ICASSP 2009 | Readers: Everyone
Abstract: Pronunciation information is available in large quantities on the Web, in the form of IPA and ad-hoc transcriptions. We describe techniques for extracting candidate pronunciations from Web pages and associating them with orthographic words, filtering out poorly extracted pronunciations, normalizing IPA pronunciations to better conform to a common transcription standard, and generating phonemic transcriptions from ad-hoc transcriptions. We show improvements on a letter-to-phoneme task when using Web-derived rather than Pronlex pronunciations.
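
As a rough illustration of the extraction and normalization steps named in the abstract, the following minimal sketch pairs an orthographic word with a slash- or bracket-delimited transcription that follows it, then maps a few ASCII stand-ins to IPA characters. The regular expression, the `NORMALIZE` table, and the function names are assumptions made for this example only; the abstract does not specify the paper's actual extraction rules or transcription standard.

```python
import re
import unicodedata

# Hypothetical pattern: a word immediately followed by /.../ or [...].
# Real Web pages would need far more robust handling than this.
CANDIDATE = re.compile(
    r"\b([A-Za-z][A-Za-z'-]*)\s*(?:/([^/\s]+)/|\[([^\]\s]+)\])"
)

# Illustrative normalization table (an assumption, not the paper's standard):
# script g -> plain g, ASCII colon -> length mark, apostrophe -> primary stress.
NORMALIZE = {"\u0261": "g", ":": "\u02d0", "'": "\u02c8"}


def extract_candidates(text):
    """Return (word, transcription) pairs found in running text."""
    pairs = []
    for match in CANDIDATE.finditer(text):
        word = match.group(1).lower()
        # Exactly one of the two delimiter alternatives matched.
        pron = match.group(2) or match.group(3)
        pairs.append((word, pron))
    return pairs


def normalize_ipa(pron):
    """Apply Unicode NFC, then the character-level substitutions above."""
    pron = unicodedata.normalize("NFC", pron)
    return "".join(NORMALIZE.get(ch, ch) for ch in pron)
```

A caller might run `extract_candidates` over the text of a page and pass each transcription through `normalize_ipa` before filtering. Filtering out poorly extracted pairs and converting ad-hoc transcriptions to phonemic ones are separate steps that this sketch does not attempt.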