Abstract: In this paper, we address the task of identifying speakers by name in audio content. Identifying speakers by name improves the readability of the transcript and provides additional metadata that can help in finding audio content of interest. We present a conditional maximum entropy (maxent) framework for this problem, which yields superior performance and lends itself well to incorporating different types of information. We take advantage of this property of maxent to explore new features for this task. We show that supplementing standard lexical triggers with information such as speaker gender and the position of speaker name mentions affords large gains in performance. At 95% precision, we increase recall to 67% from the trigger baseline of 38%.
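
For reference, the standard textbook form of a conditional maxent model is sketched below; it shows why adding new information sources amounts to adding feature functions. The notation is an assumption, since the abstract does not define any: x stands for the context of a candidate name mention, y for its label (for example, whether the name belongs to the current speaker), f_i for binary feature functions such as a lexical trigger, a gender match, or a mention position, and \lambda_i for the learned weights.

```latex
% Generic conditional maximum entropy model (assumed notation, not taken from the paper)
p_{\lambda}(y \mid x) = \frac{\exp\left( \sum_{i} \lambda_i f_i(x, y) \right)}{Z_{\lambda}(x)},
\qquad
Z_{\lambda}(x) = \sum_{y'} \exp\left( \sum_{i} \lambda_i f_i(x, y') \right)
```

Under this form, each additional cue contributes one more term \lambda_i f_i(x, y) to the exponent, and the rest of the model is unchanged.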