Abstract: Hallucination in deep learning (DL) classification, where DL models yield confidently erroneous predictions remains a pressing concern. This study investigates whether binary classifiers are truly learning disease-specific features when distinguishing overlapping radiological presentations among pneumonia subtypes on chest X-ray (CXR) images. Specifically, we evaluate if uncertainty measure is a valuable tool in classifying signs of different pathogen-specific subtypes of pneumonia. We evaluated two binary classifiers to classify bacterial pneumonia and viral pneumonia, respectively, from normal CXRs. A third classifier explored the ability to distinguish bacterial from viral pneumonia presentation to highlight our concern regarding the observed hallucinations in the former cases. Our comprehensive analysis computes the Matthews Correlation Coefficient and prediction entropy metrics on a pediatric CXR dataset and reveals that the normal/bacterial and normal/viral classifiers consistently and confidently misclassify the unseen pneumonia subtype to their respective disease class. These findings expose a critical limitation concerning the tendency of binary classifiers to hallucinate by relying on general pneumonia indicators rather than pathogenspecific patterns, thereby challenging their utility in clinical workflows.
External IDs:dblp:conf/cbms/RajaramanLMXA25
Loading