SimSUM: simulated benchmark with structured and unstructured medical records

Paloma Rabaey, Stefan Heytens, Thomas Demeester

Published: 2025, Last Modified: 08 Jan 2026, J. Biomed. Semant. 2025, CC BY-SA 4.0

Abstract: Clinical information extraction, which involves structuring clinical concepts from unstructured medical text, remains a challenging problem that could benefit from the inclusion of tabular background information available in electronic health records. Existing open-source datasets lack explicit links between structured features and clinical concepts in the text, motivating the need for a new research dataset.

External IDs: dblp:journals/biomedsem/RabaeyHD25