Leveraging deep active learning to annotate the first public dataset for identification of mobility functioning information in clinical text
Confirmation: I have read and agree with the IEEE BHI 2025 conference submission's policy on behalf of myself and my co-authors.
Keywords: functional status information, mobility, clinical notes, n2c2 research datasets, natural language processing
Abstract: Function is increasingly recognized as an important indicator of whole-person health, although it receives little attention in clinical natural language processing research.
We introduce the first public annotated dataset dedicated to the Mobility domain of the International Classification of Functioning, Disability and Health (ICF), aiming to facilitate automatic extraction and analysis of functioning information from free-text clinical notes.
We utilize the National NLP Clinical Challenges (n2c2) research dataset to construct a pool of candidate sentences using keyword expansion.
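The abstract does not spell out the keyword-expansion procedure; the sketch below illustrates one common approach under assumed details: hypothetical seed terms are expanded with embedding nearest neighbors (via gensim word vectors) and the expanded list is used to filter candidate sentences from the note corpus.

```python
# Sketch of keyword expansion for building a candidate sentence pool.
# The seed terms, embedding file, and sentence iterator are illustrative
# placeholders, not details taken from the paper.
from gensim.models import KeyedVectors

SEED_TERMS = ["walk", "ambulate", "wheelchair", "gait", "transfer"]  # hypothetical seeds

def expand_keywords(seed_terms, embedding_path, topn=10):
    """Expand seed keywords with their nearest neighbors in an embedding space."""
    kv = KeyedVectors.load_word2vec_format(embedding_path, binary=True)
    expanded = set(seed_terms)
    for term in seed_terms:
        if term in kv:
            expanded.update(word for word, _ in kv.most_similar(term, topn=topn))
    return expanded

def build_candidate_pool(sentences, keywords):
    """Keep sentences that mention at least one (expanded) mobility keyword."""
    keywords = {k.lower() for k in keywords}
    return [s for s in sentences if any(k in s.lower() for k in keywords)]
```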
Our active learning approach, using query-by-committee sampling weighted by density representativeness, selects informative sentences for human annotation.
We train BERT and CRF models and use their predictions to guide the selection of new sentences for subsequent annotation iterations.
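The exact acquisition function is not given in the abstract; the following is a minimal sketch of query-by-committee sampling weighted by density representativeness, assuming each committee member (e.g., a BERT model and a CRF model) outputs per-sentence label probabilities and that mean cosine similarity over sentence embeddings serves as the density term. The batch size k and exponent beta are illustrative parameters.

```python
# Minimal sketch of density-weighted query-by-committee (QBC) selection.
# The committee predictions, sentence embeddings, and beta exponent are
# assumptions for illustration, not the authors' exact formulation.
import numpy as np

def vote_entropy(committee_probs):
    """Disagreement per sentence: entropy of the committee's averaged label distribution.

    committee_probs: array of shape (n_members, n_sentences, n_labels).
    """
    mean_probs = committee_probs.mean(axis=0)  # (n_sentences, n_labels)
    return -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)

def density_weights(embeddings):
    """Representativeness per sentence: mean cosine similarity to the rest of the pool."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return (normed @ normed.T).mean(axis=1)

def select_batch(committee_probs, embeddings, k=50, beta=1.0):
    """Pick the k sentences with the highest density-weighted disagreement."""
    scores = vote_entropy(committee_probs) * density_weights(embeddings) ** beta
    return np.argsort(scores)[::-1][:k]
```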
Our final dataset consists of 4,265 sentences with a total of 11,784 entities.
The inter-annotator agreement (IAA), averaged over all entity types, is 0.72 for exact matching and 0.91 for partial matching.
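One simple way to operationalize the exact versus partial matching criteria (a sketch under assumed conventions, not necessarily the authors' protocol) is to treat two annotated spans as agreeing when their character offsets and labels are identical (exact) or when the labels match and the offsets merely overlap (partial), then score agreement between two annotators with a pairwise F1.

```python
# Sketch of exact vs. partial span matching between two annotators.
# Spans are assumed to be (start, end, label) tuples with half-open offsets;
# the matching rules and F1-style agreement are illustrative assumptions.
def spans_match(a, b, mode="exact"):
    """Return True if spans a and b agree under the given matching mode."""
    same_label = a[2] == b[2]
    if mode == "exact":
        return same_label and a[0] == b[0] and a[1] == b[1]
    # Partial matching: same label and overlapping character offsets.
    return same_label and a[0] < b[1] and b[0] < a[1]

def pairwise_f1(spans_a, spans_b, mode="exact"):
    """F1-style agreement between two annotators' span sets."""
    hits_a = sum(any(spans_match(a, b, mode) for b in spans_b) for a in spans_a)
    hits_b = sum(any(spans_match(b, a, mode) for a in spans_a) for b in spans_b)
    precision = hits_a / len(spans_a) if spans_a else 0.0
    recall = hits_b / len(spans_b) if spans_b else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```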
We train and evaluate common BERT models and state-of-the-art Nested NER models.
The best F1 scores are 0.83 for Action, 0.69 for Mobility, 0.60 for Assistance, and 0.67 for Quantification.
Empirical results demonstrate the promising potential of NER models to accurately extract mobility functioning information from clinical text.
The public availability of our annotated dataset will facilitate further research to comprehensively capture functioning information in electronic health records (EHRs).
Track: 4. Clinical Informatics
Registration Id: DDN4DTB5CBP
Submission Number: 81