Abstract: Representation learning has gained prominence over the last few years, showing that, regardless of the underlying learning algorithm, results improve substantially when the input feature representation is enriched to capture domain knowledge. Word embedding approaches such as Word2Vec and sub-word techniques such as fastText have been shown to improve multiple NLP tasks in the biomedical domain. These techniques mostly capture indirect relationships but often fail to capture deeper contextual relationships, because they model only the short-range context defined by a co-occurrence window. In this paper we propose a novel contextual embedding for a “wide sentential context”. We then generate a composite word embedding, yielding a multi-scale word representation. We further show that the composite embedding outperforms the current individual state-of-the-art techniques on both intrinsic and extrinsic evaluations.
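As a rough illustration of the composite, multi-scale idea sketched in the abstract (not the paper's actual method), the snippet below concatenates window-based Word2Vec and fastText vectors with a wide-sentential-context vector. The gensim-based setup and the `wide_context_vec` helper are assumptions introduced purely for illustration.

```python
# A minimal sketch of a composite, multi-scale word representation built by
# concatenating embeddings trained at different context scales.
# Assumptions (not from the paper): gensim's Word2Vec and fastText stand in
# for the short-range models, and wide_context_vec is a hypothetical
# placeholder for the paper's wide-sentential-context embedding.
import numpy as np
from gensim.models import Word2Vec, FastText

sentences = [
    ["the", "patient", "received", "insulin", "therapy"],
    ["insulin", "regulates", "blood", "glucose", "levels"],
]

# Short-range co-occurrence models (context limited by a window).
w2v = Word2Vec(sentences, vector_size=50, window=5, min_count=1, seed=1)
ft = FastText(sentences, vector_size=50, window=5, min_count=1, seed=1)

def wide_context_vec(word):
    """Hypothetical stand-in for a wide sentential-context embedding:
    the mean Word2Vec vector of every word that co-occurs with `word`
    anywhere in a sentence, not just inside the co-occurrence window."""
    neighbors = [w for s in sentences if word in s for w in s if w != word]
    return np.mean([w2v.wv[w] for w in neighbors], axis=0)

def composite(word):
    # Concatenate the scales into a single multi-scale representation.
    return np.concatenate([w2v.wv[word], ft.wv[word], wide_context_vec(word)])

print(composite("insulin").shape)  # (150,)
```

The concatenation leaves each scale's information intact and lets the downstream task weight the scales itself; other composition choices (averaging, weighted sums) would trade dimensionality for that flexibility.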