Spoken term detection from bilingual spontaneous speech using code-switched lattice-based structures for words and subword units

Abstract: This paper presents the first publicly known work on spoken term detection from bilingual spontaneous speech using code-switched lattice-based structures for word and subword units. The corpus consists of lectures, with Chinese as the host language and English as the guest language, recorded for a real course offered at National Taiwan University. The techniques reported here have been successfully implemented and tested in a real lecture system now available online. We also present approaches for using word fragments as the subword units for English, and analyse the difficulties that arise when code-switched lattice-based structures for subword units are used for tasks involving languages of quite different natures.