Exploring the Trade-Offs: Unified Large Language Models vs Local Fine-Tuned Models for Highly-Specific Radiology NLI Task

Published: 2025, Last Modified: 06 Jan 2026 · IEEE Trans. Big Data 2025 · CC BY-SA 4.0
Abstract: ChatGPT and GPT-4 have recently emerged and gained immense global attention due to their unparalleled performance in language processing. Despite demonstrating impressive capabilities in various open-domain tasks, their adequacy in highly specific fields such as radiology remains untested. Radiology presents unique linguistic phenomena, distinct from open-domain data, owing to its specificity and complexity. Assessing the performance of large language models (LLMs) in such specific domains is crucial not only for a thorough evaluation of their overall capability but also for providing valuable insights into future model design directions: whether model design should be generic or domain-specific. To this end, in this study we evaluate the performance of ChatGPT/GPT-4 on a radiology natural language inference (NLI) task and compare it to that of other models fine-tuned specifically on task-related data samples. We also conduct a comprehensive investigation of ChatGPT/GPT-4’s reasoning ability by introducing varying levels of inference difficulty. Our results show that 1) ChatGPT and GPT-4 outperform other LLMs on the radiology NLI task, and 2) specifically fine-tuned BERT-based models require significant amounts of data samples to achieve performance comparable to ChatGPT/GPT-4. These findings not only demonstrate the feasibility and promise of constructing a generic model capable of addressing various tasks across different domains, but also highlight several key factors crucial for developing a unified model, particularly in a medical context, paving the way for future artificial general intelligence (AGI) systems. We release our code and data to the research community.
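To make the NLI setup concrete, the sketch below shows how a premise/hypothesis pair from a radiology report might be classified with a generic MNLI-fine-tuned encoder. This is only an illustration of the task format: the checkpoint (roberta-large-mnli) and the example sentences are assumptions standing in for the paper's task-specific BERT-based baselines and data.

```python
# Minimal sketch of an NLI classification step, assuming a generic
# MNLI checkpoint as a stand-in for the paper's fine-tuned baselines.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # illustrative checkpoint, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Invented radiology-style premise/hypothesis pair for illustration only.
premise = "Chest X-ray shows no focal consolidation, effusion, or pneumothorax."
hypothesis = "The patient has pneumonia."

# Encode the sentence pair and predict entailment / neutral / contradiction.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # e.g. CONTRADICTION
```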