Abstract: Neural inverse text normalization (ITN) has recently emerged as an approach to post-processing automatic speech recognition output for readability. In particular, neural network models have achieved remarkable results without relying on the accuracy of manually crafted rules. However, ITN is a highly language-dependent task that is especially tricky for ambiguous languages. In this study, we focus on improving the performance of ITN by combining neural network models with rule-based systems. Specifically, we first use a seq2seq model to detect numerical segments (e.g., cardinals, ordinals, and dates) in input sentences. The detected segments are then converted into written form by rule-based systems. The key difference in our method is that neural network models are used only to detect numerical segments, which allows the approach to cope with low-resource and ambiguous scenarios in target languages. We evaluate the proposed method on several languages in order to demonstrate its advantages. Empirical evaluations show promising results compared with state-of-the-art models in this research field, especially in low-resource scenarios.
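The two-stage pipeline described above (neural segment detection followed by rule-based conversion) can be sketched minimally as follows. This is an illustrative sketch, not the paper's implementation: the seq2seq detector is replaced by a hypothetical stub `detect_segments` that marks runs of number words, and the rules cover only a small subset of English cardinals.

```python
# Hedged sketch of a hybrid ITN pipeline: a detector proposes numerical
# spans, then rules rewrite each span into written (digit) form.
# In the paper the detector would be a trained seq2seq model; here it is a stub.

UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def detect_segments(tokens):
    """Stub for the neural detector: returns (start, end, label) spans.
    Here we simply mark maximal runs of known number words as CARDINAL."""
    spans, i = [], 0
    while i < len(tokens):
        if tokens[i] in UNITS or tokens[i] in TENS:
            j = i
            while j < len(tokens) and (tokens[j] in UNITS or tokens[j] in TENS):
                j += 1
            spans.append((i, j, "CARDINAL"))
            i = j
        else:
            i += 1
    return spans

def cardinal_to_digits(tokens):
    """Rule-based conversion for a small subset of English cardinals
    (tens plus units, e.g. 'twenty five' -> '25')."""
    total = 0
    for t in tokens:
        total += TENS.get(t, 0) + UNITS.get(t, 0)
    return str(total)

def inverse_normalize(sentence):
    """Detect numerical segments, then rewrite each one with the rules."""
    tokens = sentence.split()
    out = tokens[:]
    # Rewrite right-to-left so earlier span indices stay valid.
    for start, end, label in reversed(detect_segments(tokens)):
        if label == "CARDINAL":
            out[start:end] = [cardinal_to_digits(tokens[start:end])]
    return " ".join(out)

print(inverse_normalize("she bought twenty five apples"))  # -> she bought 25 apples
```

The design point the abstract makes is visible here: only `detect_segments` would need training data, while the conversion rules are deterministic, so a weak detector in a low-resource language still yields exact written forms for the spans it does find.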