HealthConvos: A physician-annotated dataset of presenting complaints sourced from real, multi-turn patient conversations
Keywords: Multi-turn conversational dataset, dataset release, evaluations and benchmarks
TL;DR: We present HealthConvos, an anonymized dataset of multi-turn physician consultations based on real patient conversations from Counsel Health's asynchronous telehealth platform that utilizes LLM-driven pre-visit intake.
Track: Findings
Abstract: We present HealthConvos, an anonymized dataset of multi-turn physician consultations based on real patient conversations from Counsel Health's asynchronous telehealth platform, which uses LLM-driven pre-visit intake. To satisfy HIPAA Safe Harbor requirements, protected health identifiers were mapped to equivalent surrogates (e.g., patient names replaced by realistic pseudonyms), and thread messages were paraphrased using a few-shot prompted large language model. Each HealthConvos thread has been physician-labeled with clinical outcomes documentation (e.g., SOAP notes) that represents the disposition and management of ongoing patient concerns. Physicians also adjudicated the escalatory risk of each thread to flag cases where patients should seek more immediate, in-person care. With this release, researchers can evaluate history-taking chatbots against (1) the most relevant questions physicians asked given the existing information within a thread, (2) the SOAP notes generated at particular turns within a conversation, and (3) the escalatory risk of a particular thread.
General Area: Applications and Practice
Specific Subject Areas: Dataset Release & Characterization
Data And Code Availability: Yes
Ethics Board Approval: No
Entered Conflicts: I confirm the above
Anonymity: I confirm the above
Submission Number: 132