Normalized Conditional Mutual Information Surrogate Loss for Deep Learning Classifiers

Published: 02 Mar 2026 · Last Modified: 14 Mar 2026 · Sci4DL 2026 · CC BY 4.0
Keywords: conditional mutual information, classification
Abstract: In this paper, we propose a novel information-theoretic surrogate loss, the normalized conditional mutual information (NCMI), as a drop-in alternative to the de facto cross-entropy (CE) loss for training deep neural network (DNN) classifiers. We first observe that a model's NCMI is inversely related to its accuracy. Building on this insight, we advocate using NCMI as the surrogate loss for DNN classifiers and propose an alternating algorithm that minimizes it efficiently. Across natural image recognition and whole-slide imaging (WSI) subtyping benchmarks, NCMI-trained models surpass those trained with state-of-the-art losses by substantial margins, at a computational cost comparable to that of CE. Notably, on ImageNet, NCMI yields a 2.77\% top-1 accuracy improvement over CE with ResNet-50; on CAMELYON17, replacing CE with NCMI improves macro-F1 by 8.6\% over the strongest baseline. Gains are consistent across architectures and batch sizes, suggesting that NCMI is a practical and competitive alternative to CE.
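The abstract leaves the exact form of the NCMI objective to the paper body. As a rough illustration only, below is a minimal PyTorch sketch of a plug-in mini-batch NCMI estimate, assuming NCMI is the ratio of the conditional mutual information I(X; Ŷ | Y) to the mutual information I(Y; Ŷ), with both quantities estimated from the model's softmax outputs over a batch. The function name `ncmi_loss` and this formulation are assumptions for illustration, not the authors' definition, and the paper's alternating minimization algorithm is not reproduced here.

```python
import torch
import torch.nn.functional as F

def ncmi_loss(logits: torch.Tensor, labels: torch.Tensor,
              num_classes: int, eps: float = 1e-8) -> torch.Tensor:
    """Plug-in mini-batch estimate of an assumed NCMI = I(X; Yhat | Y) / I(Y; Yhat).

    Assumed formulation (the abstract does not state the exact definition):
    the class-conditional output distribution P(Yhat | Y=c) is taken to be
    the mean softmax over the batch samples of class c.
    """
    p = F.softmax(logits, dim=1)                       # P(Yhat | X=x), shape (B, C)
    onehot = F.one_hot(labels, num_classes).float()    # (B, C)
    class_counts = onehot.sum(dim=0)                   # samples per class in the batch
    present = class_counts > 0                         # classes observed in this batch

    # Class-conditional centroids P(Yhat | Y=c): mean softmax within each class.
    centroids = (onehot.t() @ p) / class_counts.clamp(min=1).unsqueeze(1)  # (C, C)

    # CMI term I(X; Yhat | Y): mean KL(P(Yhat | X=x) || P(Yhat | Y=y_x)).
    per_sample_centroid = centroids[labels]            # (B, C)
    cmi = (p * (torch.log(p + eps)
                - torch.log(per_sample_centroid + eps))).sum(dim=1).mean()

    # MI term I(Y; Yhat): prior-weighted KL(P(Yhat | Y=c) || P(Yhat)).
    marginal = p.mean(dim=0)                           # P(Yhat), shape (C,)
    prior = class_counts / class_counts.sum()
    kl_c = (centroids * (torch.log(centroids + eps)
                         - torch.log(marginal + eps))).sum(dim=1)
    mi = (prior[present] * kl_c[present]).sum()

    # Small CMI = tight intra-class concentration; large MI = inter-class
    # separation, so minimizing the ratio pushes on both at once.
    return cmi / (mi + eps)
```

Under these assumptions it would be used exactly where CE would be, e.g. `loss = ncmi_loss(model(images), labels, num_classes=1000)` followed by the usual backward pass, which is consistent with the abstract's "drop-in alternative" claim.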
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Style Files: I have used the style files.
Challenge: This submission is an entry to the science of DL improvement challenge.
Submission Number: 1