A Framework for Fine-Grained Complexity Control in Health Answer Generation

Published: 22 Jun 2025, Last Modified: 17 Jul 2025, ACL-SRW 2025 Poster, CC BY 4.0
Keywords: Health literacy, Medical text simplification, Readability assessment, Text complexity, Controlled text generation
TL;DR: A framework for generating health answers at adjustable complexity levels using a specialized medical text formula and a model fine-tuned on 21 readability levels to match diverse health literacy needs.
Abstract: Health literacy plays a critical role in ensuring people can access, understand, and act on medical information. However, much of the health content available today is too complex for many people, and simplifying these texts manually is time-consuming and difficult to do at scale. To overcome this, we developed a new framework for automatically generating health answers at multiple, precisely controlled complexity levels. We began with a thorough analysis of 166 linguistic features, which we then refined into 13 key metrics that reliably differentiate between simple and complex medical texts. From these metrics, we derived a robust complexity scoring formula, combining them with weights learned from a logistic regression model. This formula allowed us to create a large, multi-level dataset of health question-answer pairs covering 21 distinct complexity levels, ranging from elementary patient-friendly explanations to highly technical summaries. Finally, we fine-tuned a Llama-3.1-8B-Instruct model using "control codes" on this dataset, giving users precise control over the complexity of the generated text and empowering them to select the level of detail and technicality they need.
Archival Status: Archival
ACL Copyright Transfer: pdf
Paper Length: Long Paper (up to 8 pages of content)
Submission Number: 326
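
Illustrative sketch: The abstract describes a complexity score built from 13 metrics weighted by a logistic regression model, a discretization into 21 levels, and control codes that steer the fine-tuned model. The Python sketch below shows one way these pieces could fit together. It is not the authors' implementation: the feature names, weights, bias, the use of a sigmoid, the equal-width binning, and the `<complexity_N>` control-code format are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

# Hypothetical metrics and weights. The paper uses 13 metrics with weights
# learned by logistic regression; neither is given in the abstract.
WEIGHTS = {
    "avg_sentence_length": 0.8,
    "medical_term_ratio": 1.5,
    "avg_syllables_per_word": 0.6,
}
BIAS = -0.2       # hypothetical intercept
NUM_LEVELS = 21   # number of complexity levels stated in the abstract


def complexity_score(features):
    """Weighted sum of (assumed standardized) features, squashed to (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def complexity_level(score, num_levels=NUM_LEVELS):
    """Map a score in [0, 1] to an integer level in 0..num_levels-1.

    Equal-width bins are an assumption; the paper may bin differently.
    """
    return min(int(score * num_levels), num_levels - 1)


def build_prompt(question, level):
    """Prefix the question with a hypothetical control code for the level."""
    return f"<complexity_{level}> {question}"


simple_text = {"avg_sentence_length": -1.0, "medical_term_ratio": -1.0,
               "avg_syllables_per_word": -1.0}
technical_text = {"avg_sentence_length": 1.0, "medical_term_ratio": 1.0,
                  "avg_syllables_per_word": 1.0}

for name, feats in [("simple", simple_text), ("technical", technical_text)]:
    level = complexity_level(complexity_score(feats))
    print(name, level, build_prompt("What is asthma?", level))
```

At training time, each question-answer pair in such a setup would carry the control code matching the scored level of its answer; at inference time, the user picks the level and the same prefix is prepended to the question.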