Towards Semantic Consistency Data Augmentation for Bio-Relation Extraction via Biomedical Notion Infusion
Abstract: Biomedical Relation Extraction (Bio-RE) aims to recognize and classify the potential relations between various molecules and biomolecules. The main obstacle in Bio-RE is the scarcity of annotations, especially for low-resource relation labels, which prevents models from fully capturing relations such as chemical-disease connections or drug-drug interactions. Existing work usually adopts data augmentation to generate pseudo-annotated instances that alleviate this scarcity. However, the generated sentences largely ignore the semantic consistency of the biomedical domain and the logical coherence between biomolecules and diseases, so they can introduce counterfactual information when models learn the interactions between drugs or diseases. To this end, this paper proposes a bio-notion-dedicated data augmentation approach that measures intersections between biomedical relation notions and the tokens of each instance to generate semantically consistent augmented data. Experimental results demonstrate that the proposed method achieves a 5.61% F1 improvement over SoTA baseline methods on three benchmark Bio-RE datasets in terms of BLURB.
Paper Type: long
Research Area: Information Extraction
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English