Keywords: Generative AI, Trustworthy AI
Abstract: Generative AI and large language models (LLMs) have attracted significant attention from both industry and academia for their ability to generate high-quality content across diverse tasks. However, these capabilities raise growing concerns about misuse in sensitive domains such as news, education, software engineering, and medicine. In particular, recent cases suggest that LLMs are being exploited to fabricate radiology reports, potentially enabling insurance fraud. While prior research on detecting machine-generated text has explored domains such as news and scientific writing, little of it specializes in radiology, leaving a critical gap in reliably identifying AI-generated medical reports. To address this gap, we introduce text-to-text and image-to-text datasets specifically designed for radiology report generation, constructed using multiple LLMs. In addition, we establish a benchmark detection methodology based on disentangling style from content, enabling more effective differentiation between authentic radiology reports and AI-generated fabrications.
Supplementary Material: pdf
Submission Number: 105