Abstract
Introduction: Constructing an accurate and comprehensive knowledge graph of
specific diseases is critical for practical clinical disease diagnosis and treatment,
reasoning and decision support, rehabilitation, and health management. For
knowledge graph construction tasks (such as named entity recognition, relation
extraction), classical BERT-based methods require a large amount of training
data to ensure model performance. However, real-world medical annotation
data, especially disease-specific annotation samples, are very limited. In addition,
existing models do not perform well in recognizing out-of-distribution entities
and relations that are not seen in the training phase.
Method: In this study, we present a novel and practical pipeline for constructing
a heart failure knowledge graph using large language models and medical expert
refinement. We apply prompt engineering to the three phases of knowledge
graph construction: schema design, information extraction, and knowledge completion. The best
performance is achieved by designing task-specific prompt templates combined
with the TwoStepChat approach.
Results: Experiments on two datasets show that the TwoStepChat method
outperforms the Vanilla prompt and the fine-tuned BERT-based
baselines. Moreover, our method saves 65% of the time compared to manual
annotation and is better suited to extract the out-of-distribution information in
the real world.