Ontological Approach for Knowledge Extraction from Clinical Documents

Raxit Goswami, Vatsal Shah, Nehal Shah, Chetan Moradiya

Published: 2019, Last Modified: 18 Mar 2026, BIBM 2019, CC BY-SA 4.0

Abstract: In clinical natural language processing (NLP), knowledge extraction is an essential step in building a highly accurate information retrieval system. Approaches used to build such systems include rule-based methods, statistical methods, shortest-path algorithms, and hybrids of these. Accuracy and coverage are the most important parameters when comparing approaches; some methodologies achieve good accuracy but low coverage, and vice versa. In this paper, we focus on extracting domain relationships from clinical documents, for example the relationship between ‘Disease’ and ‘Procedure’ or between ‘Symptom’ and ‘Disease’, using three approaches: i) Statistical, ii) Shortest Path, and iii) Shortest Path Using Body System. All three use our existing in-house NLP system to extract entities from unstructured documents. The Statistical approach applies a probabilistic algorithm to clinical documents, whereas the Shortest Path approach uses an ontological knowledge base for the hierarchical relationships between entities. This ontological knowledge base is built upon the curated Unified Medical Language System (UMLS). The Shortest Path Using Body System approach uses domain relationships as well as hierarchical relationships. The output of each approach is validated by a domain expert, and the validated relationships are used to enrich our ontological knowledge base. We present each approach in turn, together with comparative results, then analyze the results and conclude with directions for further work.
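
As a rough illustration of the shortest-path idea mentioned in the abstract, the sketch below measures how many hierarchy edges separate two entities in a small is-a graph. It is not the authors' implementation: the abstract does not specify the algorithm, so plain breadth-first search is assumed here, and the hierarchy `IS_A`, the helper names `build_undirected` and `shortest_path_length`, and all concept names are hypothetical and not taken from UMLS.

```python
from collections import deque

# Hypothetical toy hierarchy (child -> parents). Illustrative only;
# these entries are not taken from UMLS or from the paper.
IS_A = {
    "Myocardial Infarction": ["Heart Disease"],
    "Heart Disease": ["Cardiovascular System"],
    "Coronary Angioplasty": ["Cardiac Procedure"],
    "Cardiac Procedure": ["Cardiovascular System"],
    "Appendectomy": ["Abdominal Procedure"],
    "Abdominal Procedure": ["Digestive System"],
}


def build_undirected(edges):
    """Turn child -> parents edges into an undirected adjacency map."""
    graph = {}
    for child, parents in edges.items():
        for parent in parents:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, source, target):
    """Return the number of edges on a shortest path, or None if unreachable.

    Assumption: unweighted breadth-first search; the paper may use a
    different distance or weighting scheme.
    """
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


GRAPH = build_undirected(IS_A)
```

In this toy graph, a disease and a procedure that share a body-system ancestor are connected by a short path, while entities under different body systems are not connected at all. A short path would only suggest a candidate relationship; per the abstract, candidates are still validated by a domain expert.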

External IDs: dblp:conf/bibm/GoswamiSSM19