Using Ultrasound Images and a Multi-Task, Explainable Approach for Thyroid Cancer Detection

Published: 01 Jan 2024, Last Modified: 15 Mar 2025, BIBE 2024, CC BY-SA 4.0
Abstract: The current clinical practice for thyroid nodule malignancy detection and diagnosis consists of ultrasound (US, or sonogram) imaging, followed by guided fine needle aspiration (FNA) biopsy if deemed necessary. FNA is performed based on the expert knowledge of trained clinicians, who assess the malignancy risk of the thyroid nodule(s). Thyroid malignancy risk assessment based on ultrasound images relies on experience and heuristics, which cannot be directly and reliably converted into rule-based algorithms. Thus, deep learning-based automated methods for nodule segmentation and risk assessment are designed as aids that enable radiologists to provide timely and successful cancer treatment to patients. Existing AI methods for analyzing thyroid ultrasound imagery are designed either to perform nodule segmentation or to output an objective belief of malignancy that drives the decision to perform an FNA. AI solutions that simply output a probability of malignancy suffer from a lack of reliability and explainability, and are usually not trusted by clinicians. Radiologists rely on the American College of Radiology TI-RADS system for malignancy risk assessment. TI-RADS requires visual analysis of features such as nodule margin, composition, echogenicity, shape, and echogenic foci. AI models need to incorporate this domain knowledge to achieve explainability and earn the trust of clinicians. We aim to take a step in this direction by designing a multi-task deep-learning model that performs three important tasks: thyroid nodule segmentation, sonogram feature detection, and explanation of malignancy risk. We evaluate our technique quantitatively using the Thyroid Ultrasound Cine-Clip Dataset from Stanford for training and testing. On the Stanford dataset, our framework achieves an mIoU of 71% for segmentation and 65% accuracy for high/low risk classification. Additionally, we test generalization to a dataset different from the training set, the Thyroid Digital Image Database (TDID), achieving a segmentation mIoU of 65% and a risk classification accuracy of 64%.
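The multi-task design described above can be pictured as a shared encoder feeding three task-specific heads. The PyTorch sketch below is purely illustrative (the paper does not publish this code): the module names, layer sizes, and the five-way TI-RADS feature head are all assumptions made for the sake of the example.

```python
# Minimal sketch of a shared-encoder, three-head multi-task model for
# thyroid ultrasound: segmentation, TI-RADS feature detection, and
# high/low malignancy risk. All names and dimensions are assumed.
import torch
import torch.nn as nn

class MultiTaskThyroidNet(nn.Module):
    def __init__(self, num_features: int = 5, num_risk_classes: int = 2):
        super().__init__()
        # Shared convolutional encoder over grayscale ultrasound frames.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Head 1: nodule segmentation mask, upsampled to input resolution.
        self.seg_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 2, stride=2),
        )
        # Heads 2 and 3 operate on globally pooled encoder features.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Head 2: multi-label TI-RADS feature detection (e.g., margin,
        # composition, echogenicity, shape, echogenic foci).
        self.feat_head = nn.Linear(64, num_features)
        # Head 3: high/low malignancy risk classification.
        self.risk_head = nn.Linear(64, num_risk_classes)

    def forward(self, x):
        z = self.encoder(x)
        pooled = self.pool(z).flatten(1)
        return {
            "segmentation": self.seg_head(z),    # per-pixel logits
            "features": self.feat_head(pooled),  # per-feature logits
            "risk": self.risk_head(pooled),      # risk-class logits
        }

# Usage: one forward pass over a batch of 256x256 ultrasound frames.
model = MultiTaskThyroidNet()
out = model(torch.randn(2, 1, 256, 256))
print({k: v.shape for k, v in out.items()})
```

Tying the risk head to explicit TI-RADS feature predictions, rather than emitting a bare malignancy probability, is what gives such a design its claim to explainability.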