Highlights
• Health records contain valuable information written in a spontaneous register.
• Matching spontaneous terms against standard ICD terminology is challenging: only 7.71% matched directly.
• This work presents a system to help human ICD coders find standard diagnostic terms.
• Terminology normalization is tackled with weighted finite-state transducers.
• After normalization, 94.91% accuracy was achieved.

Abstract

Background: Electronic Health Records (EHRs) are written in spontaneous natural language. Terms often do not match standard terminology such as that provided by the International Classification of Diseases (ICD).

Objective: Information retrieval and exchange can be improved by using standard terminology. Our aim is to render diagnostic terms (DTs) written in spontaneous language in EHRs into the standard framework provided by the ICD.

Methods: We tackle diagnostic term normalization with Weighted Finite-State Transducers (WFSTs). Given a set of samples, these machines learn to translate sequences; in our case, spontaneous representations into standard representations. They are highly flexible and easily adapted to the terminological singularities of each hospital and practitioner. In addition, we implemented a similarity metric to enhance spontaneous-standard term matching.

Results: Of the 2850 randomly selected spontaneous DTs, only 7.71% were written in their standard form, matching the ICD. The WFST-based system matched spontaneous DTs to ICD codes with a Mean Reciprocal Rank of 0.68, which means that, on average, the right ICD code is found between the first and second position among the normalized set of candidates. This guarantees efficient document exchange and, furthermore, information retrieval.

Conclusion: Medical term normalization was achieved with high performance. Direct matching of spontaneous terms against standard lexicons leads to unsatisfactory results, whereas normalized hypothesis generation by means of WFSTs helped to bridge the gap between spontaneous and standard language.
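To make the Mean Reciprocal Rank figure concrete, the following is a minimal sketch of how the metric is usually computed. It is not the authors' evaluation code: the function names, the toy candidate lists, and the placeholder codes ("C1", "C2", ...) are assumptions for illustration only. Since the reciprocal of 0.68 is about 1.47, an MRR of 0.68 corresponds to a (harmonic) mean rank between the first and second position.

```python
from typing import Sequence


def reciprocal_rank(candidates: Sequence[str], gold: str) -> float:
    """Return 1/rank of the gold code in a ranked candidate list, or 0.0 if absent."""
    for rank, code in enumerate(candidates, start=1):
        if code == gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    ranked_lists: Sequence[Sequence[str]], gold_codes: Sequence[str]
) -> float:
    """Average the reciprocal ranks over all evaluated terms."""
    if len(ranked_lists) != len(gold_codes):
        raise ValueError("each ranked list needs exactly one gold code")
    if not gold_codes:
        raise ValueError("at least one evaluated term is required")
    total = sum(reciprocal_rank(c, g) for c, g in zip(ranked_lists, gold_codes))
    return total / len(gold_codes)


# Hypothetical toy data: placeholder codes, not real ICD codes or paper results.
ranked = [
    ["C1", "C2", "C3"],  # gold code ranked first  -> reciprocal rank 1.0
    ["C4", "C5", "C6"],  # gold code ranked second -> reciprocal rank 0.5
    ["C7", "C8", "C9"],  # gold code not proposed  -> reciprocal rank 0.0
]
gold = ["C1", "C5", "C0"]

mrr = mean_reciprocal_rank(ranked, gold)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

Terms whose correct code never appears among the candidates contribute zero in this sketch, so the metric penalizes both low ranking and missing hypotheses.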