Cue-Phrase-Based Text Segmentation and Optimal Segment Concatenation for the NTCIR-14 QA Lab-PoliInfo Task
Abstract: The segmentation subtask of the NTCIR-14 QA Lab-PoliInfo task consists of finding a segment of text in assembly minutes that corresponds to a given summary sentence. We divided this subtask into two steps: segmentation and search. Cue phrases proved effective for detecting segment boundaries. We compared five methods for detecting segment boundaries: a rule-based method, three supervised learning methods, and a novel semi-supervised learning method. The supervised models were trained on Japanese assembly minutes that we had segmented ourselves. In the search step, contiguous segments were concatenated to form larger candidate segments, and the candidate that maximized a scoring formula was selected as the answer. We compared the proposed formula with the conventional BM25 formula. Despite its simplicity, our method achieved the highest F-measure in the NTCIR-14 formal run.
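The abstract names neither the cue phrases nor the proposed scoring formula, so the following is only a minimal Python sketch of the two-step pipeline under stated assumptions. The rule-based segmentation step starts a new segment before any sentence that begins with a cue phrase from a hypothetical English list (the actual system works on Japanese minutes, which would need a morphological analyzer rather than the regex tokenizer used here). The search step enumerates every concatenation of up to `max_len` contiguous segments and scores it against the summary with standard Okapi BM25, the baseline the abstract mentions, standing in for the proposed formula. The names `CUE_PHRASES`, `segment`, `bm25_scorer`, `best_span`, and `max_len` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical cue phrases; the paper's actual (Japanese) cue phrases are not given here.
CUE_PHRASES = ("first", "next", "finally")


def segment(sentences, cues=CUE_PHRASES):
    """Rule-based segmentation: open a new segment before a sentence starting with a cue phrase."""
    segments, current = [], []
    for sent in sentences:
        if current and sent.lower().startswith(cues):
            segments.append(current)
            current = []
        current.append(sent)
    if current:
        segments.append(current)
    return segments


def tokenize(text):
    # Assumption: whitespace-like word tokenization, adequate only for English examples.
    return re.findall(r"\w+", text.lower())


def bm25_scorer(docs, k1=1.2, b=0.75):
    """Build an Okapi BM25 scoring function; IDF and average length come from the base segments."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(t for d in docs for t in set(d))

    def score(query, doc):
        tf = Counter(doc)
        total = 0.0
        for q in query:
            if tf[q] == 0:
                continue
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1)
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            total += idf * tf[q] * (k1 + 1) / (tf[q] + norm)
        return total

    return score


def best_span(segments, summary, max_len=5):
    """Concatenate contiguous segments i..j-1 and return the (i, j) span with the highest score."""
    docs = [tokenize(" ".join(seg)) for seg in segments]
    score = bm25_scorer(docs)
    query = tokenize(summary)
    best, best_score = None, float("-inf")
    for i in range(len(docs)):
        for j in range(i + 1, min(i + max_len, len(docs)) + 1):
            span_tokens = [t for d in docs[i:j] for t in d]
            s = score(query, span_tokens)
            if s > best_score:  # strict ">" keeps the earliest, shortest span on ties
                best, best_score = (i, j), s
    return best, best_score


if __name__ == "__main__":
    sentences = [
        "First, the budget for schools was discussed.",
        "The school budget will increase next year.",
        "Next, we turn to road repairs.",
        "Road repairs are delayed.",
        "Finally, the session was adjourned.",
    ]
    segs = segment(sentences)
    print(len(segs), best_span(segs, "school budget will increase"))
```

With this toy input the cue phrases yield three segments, and the first segment alone scores highest: appending further segments adds no matching terms but increases document length, which BM25's length normalization penalizes. A formula tailored to this task could weigh that trade-off differently, which is presumably what the comparison against BM25 in the abstract measures.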