Prompt Enhanced Generative MRC Framework for Pancreatic Cancer NER

Published: 01 Jan 2022, Last Modified: 13 Nov 2024, Venue: BIBM 2022, License: CC BY-SA 4.0
Abstract: Medical Named Entity Recognition (NER) is a fundamental but challenging task because specialized entity datasets are scarce, particularly for tumor entities, which are often overlapped and discontinuous. In this paper, we propose a novel Prompt Enhanced Generative Machine Reading Comprehension Framework (PGMRC) to improve NER performance on overlapped and discontinuous entities. Specifically, we formulate NER as a Machine Reading Comprehension (MRC) task and employ a pre-trained encoder-decoder module to generate entity span sequences according to the corresponding entity query. In this way, the query guides the model to focus on the answer entities in the context, which naturally handles entity overlap and alleviates the exposure bias of the generative model. We then introduce continuous prompts into the self-attention mechanism of the Transformer to reduce the dependence on manually constructed queries. In addition, we annotate 875 pathological documents of pancreatic cancer and construct a Chinese pathological NER dataset (PAN) containing overlapped and discontinuous entities. Finally, we conduct experiments on three widely used benchmarks (GENIA, ACE04, ACE05) and on our PAN dataset. The results demonstrate the effectiveness of PGMRC and show that it outperforms state-of-the-art methods.
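The query-driven, span-generating formulation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the output encoding, so the flat sequence of (start, end) index pairs, the `SEP` marker, the function names, and the English example sentence are all assumptions made for illustration. The sketch only shows why generating index sequences per query can represent overlapped and discontinuous entities, something a single tag per token cannot do.

```python
from typing import List, Tuple

# Hypothetical marker that closes one entity in the generated sequence.
# Token indices are non-negative, so -1 cannot collide with a real index.
SEP = -1

Fragment = Tuple[int, int]   # inclusive (start, end) token indices
Entity = List[Fragment]      # one entity = one or more fragments


def build_input(query: str, context_tokens: List[str]) -> List[str]:
    """Concatenate an entity query and the context, MRC style (assumed layout)."""
    return query.split() + ["[SEP]"] + context_tokens


def decode_entities(generated: List[int], n_tokens: int) -> List[Entity]:
    """Parse a generated index sequence into entities.

    Assumed encoding: start, end, start, end, ..., SEP, start, end, ..., SEP
    Fragments outside the context or with start > end are discarded.
    """
    entities: List[Entity] = []
    current: Entity = []
    i = 0
    while i < len(generated):
        if generated[i] == SEP:
            if current:
                entities.append(current)
                current = []
            i += 1
            continue
        # A fragment needs both a start and an end index.
        if i + 1 >= len(generated) or generated[i + 1] == SEP:
            i += 1  # drop a dangling start index
            continue
        start, end = generated[i], generated[i + 1]
        if 0 <= start <= end < n_tokens:
            current.append((start, end))
        i += 2
    if current:
        entities.append(current)
    return entities


def entity_text(entity: Entity, context_tokens: List[str]) -> str:
    """Join the tokens of all fragments of one entity."""
    words: List[str] = []
    for start, end in entity:
        words.extend(context_tokens[start:end + 1])
    return " ".join(words)


# Illustrative context (not from the PAN dataset).
tokens = ["tumor", "invades", "the", "head", "and", "body", "of", "pancreas"]
model_input = build_input("Find organ entities", tokens)

# A sequence the decoder might emit for the query above (made up by hand).
generated = [3, 3, 6, 7, SEP, 5, 7, SEP]
entities = decode_entities(generated, len(tokens))
texts = [entity_text(e, tokens) for e in entities]
```

Here the first entity is discontinuous (tokens 3 and 6 to 7), and both entities share the tokens "of pancreas", so they overlap. Running one query per entity type keeps each generated sequence short and tied to a single type.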