Keywords: Clinical NLP, Corpus Linguistics, LLM-as-a-judge, German, Data Augmentation, Synthetic Data
Abstract: Text corpora in non-English clinical contexts are sparse, making synthetic data generation with Large Language Models (LLMs) a promising strategy to overcome this data gap. To test the quality of LLM-generated synthetic data, we applied a cohort of models to our novel German Medical Interview Questions Corpus (GerMedIQ), consisting of 4,524 unique question-response pairs in German, and augmented the corpus by asking each model to produce suitable responses to the same questions. Structural and semantic evaluations of the synthetic responses revealed that, although the augmented responses may meet grammatical requirements, most models were unable to produce responses semantically comparable to those of humans. In addition, an LLM-as-a-judge experiment showed that human responses were consistently rated as more appropriate than synthetic ones. We conclude that data augmentation with LLMs in non-English clinical contexts must be performed carefully.
Archival Status: Archival
Paper Length: Long Paper (up to 8 pages of content)
Submission Number: 322