Abstract: Validating and debugging machine learning models relies on testing them on unseen data. During this process, analyzing model performance on various subsets of the test dataset is critical for fairness, trust, bias detection, and explainability. We describe a new approach to this task. Our solution, InfoMoD, applies recent work in information-theoretic data summarization to model diagnostics. To improve performance, we implemented InfoMoD in a distributed fashion using Apache Spark. Across four use cases spanning finance, computer vision, and hate speech detection, we show that InfoMoD concisely describes how a model performs on different subsets of the data and produces expected performance indicators for individual test instances.
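As context for the subset-level diagnostics the abstract refers to, the sketch below shows the general idea of slicing a test set by an attribute and reporting per-subset performance. It is not InfoMoD's algorithm or its Spark implementation; the column names and grouping attribute are hypothetical, and the metric (plain accuracy) is only an illustrative stand-in.

```python
# Minimal sketch of subset-level model diagnostics (not InfoMoD itself).
# All column names (e.g. "age_group", "label", "prediction") are hypothetical.
import pandas as pd


def subset_performance(df: pd.DataFrame, group_cols,
                       label_col: str = "label",
                       pred_col: str = "prediction") -> pd.DataFrame:
    """Report accuracy and support for each subset defined by group_cols."""
    df = df.assign(correct=(df[label_col] == df[pred_col]).astype(int))
    return (
        df.groupby(group_cols)["correct"]
          .agg(accuracy="mean", support="size")
          .reset_index()
    )


if __name__ == "__main__":
    # Toy test set with model predictions attached.
    data = pd.DataFrame({
        "age_group":  ["<30", "<30", "30-60", "30-60", ">60", ">60"],
        "label":      [1, 0, 1, 1, 0, 1],
        "prediction": [1, 1, 1, 0, 0, 1],
    })
    print(subset_performance(data, group_cols=["age_group"]))
```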