Exploring semantic information in disease: Simple Data Augmentation Techniques for Chinese Disease Normalization

Wenqian Cui; Xiangling Fu; Shaohui Liu; Xien Liu; Ji Wu

Published: 01 Feb 2023, Last Modified: 04 Aug 2025, Submitted to ICLR 2023, Readers: Everyone

Keywords: Data Augmentation, Medicine, Disease, Disease Normalization, Deep Learning, Natural Language Processing, Representation Learning

TL;DR: A novel data augmentation method in NLP to address the problem of Chinese Disease Normalization.

Abstract: Disease is a core concept in the medical field, and the task of normalizing disease names is the basis of all disease-related tasks. However, due to the multi-axis and multi-grain nature of disease names, general text data augmentation techniques often inject incorrect information and harm performance. To address this problem, we propose a set of data augmentation techniques that work together as an augmented training task for disease normalization, called Disease Data Augmentation (DDA). Our data augmentation methods are based on both the clinical disease corpus and the standard disease corpus derived from ICD-10 coding. Extensive experiments are conducted to show the effectiveness of our proposed methods. The results demonstrate that our methods can yield up to a 3% performance gain over non-augmented counterparts, and that they work even better on smaller datasets.


Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/exploring-semantic-information-in-disease/code)
