Keywords: Incidental Findings Detection, Abdominal CT, Vision-Language Models, Planner-Executor Framework, Clinical Guidelines
Abstract: Incidental findings in CT scans, though often benign, can have significant clinical implications and should be reported according to established guidelines. Traditional manual inspection by radiologists is time-consuming and subject to variability.
This paper proposes a novel framework that leverages large language models (LLMs) and foundational vision–language models (VLMs) within a plan-and-execute agentic architecture to improve the efficiency and precision of incidental-findings detection, classification, and reporting in abdominal CT scans. Given medical guidelines for abdominal organs, the management process is automated through a planner–executor framework. The planner, based on an LLM, generates Python scripts from predefined base functions, while the executor runs these scripts to perform the required detections and evaluations using VLMs, segmentation models, and image-processing subroutines.
We demonstrate the effectiveness of our approach through experiments on an abdominal CT benchmark covering three organs, in a fully automatic end-to-end setup. Our results show that the proposed framework outperforms existing purely VLM-based approaches in both accuracy and efficiency. Implementation details and code are available at: https://anonymous.4open.science/r/InformCT_public-8A77/README.md.
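The planner-executor split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the base functions (`segment_organ`, `largest_lesion_mm`, `report`), the scan dictionary layout, and the size threshold are all hypothetical placeholders, and the `plan` function returns a fixed script template where the real system would call an LLM. The threshold value is illustrative only and carries no clinical meaning.

```python
from typing import Callable, Dict

# Hypothetical base functions. In a real system these would wrap
# segmentation models, VLM queries, and image-processing routines.
def segment_organ(scan: dict, organ: str) -> dict:
    """Stub: look up a precomputed region for the organ."""
    return scan["organs"][organ]

def largest_lesion_mm(region: dict) -> float:
    """Stub: return the largest lesion diameter in millimetres (0.0 if none)."""
    return max(region.get("lesions_mm", []), default=0.0)

def report(organ: str, finding: str) -> dict:
    """Stub: package a finding as a structured report entry."""
    return {"organ": organ, "finding": finding}

BASE_FUNCTIONS: Dict[str, Callable] = {
    "segment_organ": segment_organ,
    "largest_lesion_mm": largest_lesion_mm,
    "report": report,
}

def plan(organ: str, threshold_mm: float) -> str:
    """Stand-in for the LLM planner: emit a script built from base functions.

    The threshold is an assumed, illustrative guideline parameter.
    """
    return (
        f"region = segment_organ(scan, {organ!r})\n"
        f"size = largest_lesion_mm(region)\n"
        f"label = 'follow-up' if size >= {threshold_mm!r} else 'no action'\n"
        f"result = report({organ!r}, label)\n"
    )

def execute(script: str, scan: dict) -> dict:
    """Executor: run the generated script with only the base functions in scope."""
    namespace = {"__builtins__": {}, "scan": scan, **BASE_FUNCTIONS}
    exec(script, namespace)  # not a security sandbox; illustration only
    return namespace["result"]

if __name__ == "__main__":
    toy_scan = {"organs": {"kidney": {"lesions_mm": [4.0, 12.5]}}}
    print(execute(plan("kidney", 10.0), toy_scan))
```

Restricting the script's namespace to the base functions keeps the planner's output constrained to a known vocabulary, which is one plausible reason to generate scripts from predefined functions rather than free-form code. Emptying `__builtins__` does not make `exec` safe for untrusted input.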
Primary Subject Area: Foundation Models
Secondary Subject Area: Application: Radiology
Registration Requirement: Yes
Reproducibility: https://anonymous.4open.science/r/InformCT_public-8A77/README.md
Visa & Travel: No
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 209