Abstract: Evaluation of machine learning (ML) models is critically important for their reliable use. Although evaluation is typically performed on unseen data, such validation datasets often need to be large and are hard to procure; additionally, multiple models may perform equally well on them. To address these challenges, we offer GeValdi: a data-efficient method to validate discriminative classifiers by creating samples where such classifiers maximally differ. We demonstrate how such ``maximally different samples'' can be constructed and leveraged to probe the failure modes of classifiers, and we offer a hierarchically-aware metric to further support fine-grained, comparative model evaluation.