Clinical text classification under the open and closed topic assumptions

Yutaka Sasaki, Brian Rea, Sophia Ananiadou

Published: 01 Jan 2009, Last Modified: 08 Feb 2026. International Journal of Data Mining and Bioinformatics. License: CC BY-SA 4.0

Abstract: This paper investigates the multi-topic aspects of automatic classification of clinical free text, in comparison with general text. We adopt two different views of multi-topic assignment: the Closed Topic Assumption (CTA) and the Open Topic Assumption (OTA). Experimental results show that the multi-topic assignments in the Computational Medicine Centre (CMC) Medical NLP Challenge data are strongly OTA-oriented, whereas those in the general-text corpus Reuters-21578 lie in the middle of the spectrum between OTA and CTA. Copyright © 2009 Inderscience Enterprises Ltd.
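
The abstract does not spell out how the two assumptions change the learning problem. The sketch below illustrates one plausible reading, which is our assumption rather than a definition taken from the paper: under CTA, each distinct combination of topics is treated as a single closed class (a label-powerset style transformation), while under OTA, each topic is an independent binary decision (a binary-relevance style transformation). The documents, topic names, and the helpers `cta_targets` and `ota_targets` are hypothetical and for illustration only.

```python
from typing import Dict, FrozenSet, List, Tuple

# A document is (text, set of topic labels). Toy data, hypothetical.
Doc = Tuple[str, FrozenSet[str]]


def cta_targets(docs: List[Doc]) -> List[Tuple[str, str]]:
    """Assumed CTA reading: the full label combination is one class."""
    # Sort labels so the same combination always maps to the same class name.
    return [(text, "+".join(sorted(labels))) for text, labels in docs]


def ota_targets(docs: List[Doc]) -> Dict[str, List[Tuple[str, bool]]]:
    """Assumed OTA reading: one independent yes/no task per topic."""
    topics = sorted({t for _, labels in docs for t in labels})
    return {t: [(text, t in labels) for text, labels in docs] for t in topics}


docs: List[Doc] = [
    ("cough and fever", frozenset({"cough", "fever"})),
    ("persistent cough", frozenset({"cough"})),
    ("high fever", frozenset({"fever"})),
]

# Two topics yield three distinct CTA classes but only two OTA binary tasks.
print(len({c for _, c in cta_targets(docs)}))  # 3
print(len(ota_targets(docs)))  # 2
```

Under this reading, CTA lets a classifier learn co-occurrence patterns as classes of their own but fragments the training data across combinations, whereas OTA pools every document mentioning a topic into that topic's binary task.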

External IDs: doi:10.1504/ijdmb.2009.026703