Classification of Medical Text in Small and Imbalanced Datasets in a Non-English Language

Vincent Beliveau; Helene Kaas; Martin Prener; Claes Ladefoged; Desmond Elliott; Gitte M. Knudsen; Lars H. Pinborg; Melanie Ganz

Published: 27 Apr 2024, Last Modified: 28 May 2024, MIDL 2024 Short Papers, CC BY 4.0

Keywords: NLP, radiology reports, classification

Abstract: Natural language processing (NLP) in the medical domain can underperform in real-world applications involving small datasets in a non-English language with few labeled samples and imbalanced classes. We evaluated a range of state-of-the-art NLP models on datasets representing this situation and found that current approaches are not sufficiently accurate to allow for fully automated classification, but can potentially be used to filter reports and reduce the amount of manual labeling.

Submission Number: 70
