Abstract: Cross-language speech retrieval systems face a cascade of errors due to transcription and translation ambiguity. Using 1-best speech recognition and 1-best translation in such a scenario could adversely affect recall if those 1-best system guesses are not correct. Accurately representing transcription and translation probabilities could therefore improve recall, although possibly at some cost in precision. The difficulty of the task is exacerbated when working with languages for which limited resources are available, since both recognition and translation probabilities may be less accurate in such cases. This paper explores the combination of expected term counts from recognition with expected term counts from translation to perform cross-language speech retrieval in which the queries are in English and the spoken content to be retrieved is in Tagalog or Swahili. Experiments were conducted using two query types, one focused on term presence and the other focused on topical retrieval. Overall, the results show that significant improvements in ranking quality result from modeling transcription and recognition ambiguity, even in lower-resource settings, and that adapting the ranking model to specific query types can yield further improvements.
0 Replies
Loading