A Comparative Study for Contextualized Spoken Answer Classification in German Medical Questionnaires
Abstract: This paper presents a study aimed at enhancing the classification accuracy of patients’ spoken answer selections to German medical Patient-Reported Outcome Measures (PROM) questionnaires within a multimodal dialog system. We collected 1,737 speech data samples for training and evaluation through a lab experiment, employing textual priming as opposed to the visual priming utilized in prior research. For classification, we compare results from utilizing sentence embeddings against results from prompting various Large Language Models. We conduct a comparative analysis of the approaches in terms of prediction performance, efficiency, hardware constraints, budget, inference time, and data privacy. Further, we investigate whether adding the survey item text as context improves the classification. Results show the highest accuracy for GPT-4 prompting, and indicate that including the questionnaire item text alongside user utterances is beneficial for LLM prompting. Additionally, we find significant positive correlations between accuracy and certain prompt characteristics.