Abstract: Hypothesis generation, which tries to identify implicit associations between two concepts, has attracted much attention due to its ability to link key concepts scattered across different articles and to suggest plausible new hypotheses. Among existing approaches to hypothesis generation, matrix factorization based methods have achieved state-of-the-art performance. However, these methods suffer from two limitations: 1) bridge concepts are determined only through post-hoc analysis of the matrix factorization results; 2) the concept embeddings produced by matrix factorization cannot be explained, so it is hard to tell whether the concepts are linked in a semantically meaningful way. To overcome these limitations, we propose an interpretable and accurate hypothesis generation model (InterHG), which improves both accuracy and interpretability compared with existing methods. First, we explicitly model the relationship between bridge concepts and given concept pairs, and conduct tensor factorization to identify link concepts. This reduces information loss and improves accuracy compared with post-hoc approaches. Second, we incorporate category descriptions into the tensor factorization, so that each concept embedding is output as a weighted combination of known categories. With this meaningful embedding representation, medical researchers can check the correctness of the suggested link concepts for a given concept pair. We conduct experiments on MeSH terms (a controlled vocabulary of biomedical concepts) extracted from the MEDLINE corpus and on category information obtained from UMLS (a comprehensive biomedical concept database). Results demonstrate that the proposed InterHG is highly accurate and produces meaningful embeddings for explanations.