Abstract: In many natural language tasks, such as information extraction and semantic lexicon building, individual entities and relations of interest may be found in multiple contexts within the corpus. In deciding which putative entities and relations should be extracted, a key problem is how to combine evidence across the multiple occurrences of these entities and relations. We present a novel statistical approach to address this issue, and evaluate it in the context of extracting protein names and protein-protein interactions from MEDLINE abstracts. We experimentally compare our method against a number of intuitive and simpler baselines. Our experimental results suggest that the issue of combining evidence is indeed important in these tasks. Furthermore, we show that our proposed method outperforms the baselines considered in a variety of settings.