AdapITN: A Fast, Reliable, and Dynamic Adaptive Inverse Text Normalization

Published: 01 Jan 2023, Last Modified: 13 Nov 2024ICASSP 2023EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Inverse text normalization (ITN) is the task that transforms text in spoken-form into written-form. While automatic speech recognition (ASR) produces text in spoken-form, human and natural language understanding systems prefer to consume text in written-form. ITN generally deals with semiotic phrases (e.g., numbers, date, time). However, lack of studies to deal with phonetization phrases, which is ASR’s output when it handles unseen data (e.g., foreign-named entities, domain names), although these exist in the same form in the spoken-form text. The reason is that phonetization phrases are infinite patterns and language-dependent. In this study, we introduce a novel end2end model that can handle both semiotic phrases (SEP) and phonetization phrases (PHP), named AdapITN. We call it "Adap" because it allows for handling unseen PHP. The model performs only when necessary by providing a mechanism to narrow normalized regions and external query knowledge, reducing the runtime significantly.
Loading