Model validation using mutated training labels: An exploratory study

Jie M. Zhang, Mark Harman, Benjamin Guedj, Earl T. Barr, John Shawe-Taylor

Published: 2023, Last Modified: 25 Jan 2026. Neurocomputing 2023. CC BY-SA 4.0

Abstract: We present an exploratory study of Mutation Validation (MV), a model validation method for supervised learning that uses mutated training labels. MV mutates training data labels, retrains the model on the mutated data, and then uses the metamorphic relation that captures the consequent changes in training performance to assess model fit. It does not use a validation set or test set. The intuition underpinning MV is that overfitting models tend to fit noise in the training data. MV does not aim to replace out-of-sample validation. Instead, we provide the first exploratory study on the possibility of using MV as a complement to out-of-sample validation. We explore 8 different learning algorithms, 18 datasets, and 5 types of hyperparameter tuning tasks. Our results demonstrate that MV complements cross-validation and test accuracy well in model selection and hyperparameter tuning tasks. MV deserves more attention from developers when simplicity, sustainability, security (e.g., defending against training data attacks), and interpretability of the built models are required.
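The mutate, retrain, and compare loop described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic two-cluster dataset, the two toy classifiers (`OneNN`, `NearestCentroid`), the mutation rate `eta`, and the `accuracy_drop` quantity are all hypothetical choices made here, and the abstract does not give the exact MV score used in the paper.

```python
import random

def make_data(rng, n_per_class=50):
    """Two well-separated 2D Gaussian clusters (synthetic, illustration only)."""
    X, y = [], []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 0.5), rng.gauss(center, 0.5)))
            y.append(label)
    return X, y

def mutate_labels(y, eta, rng):
    """Flip the binary label of a fraction eta of the training samples."""
    flipped = set(rng.sample(range(len(y)), round(eta * len(y))))
    return [1 - t if i in flipped else t for i, t in enumerate(y)]

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

class OneNN:
    """1-nearest-neighbour: memorises the training set, so it can fit noise."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self
    def predict(self, X):
        return [self.y[min(range(len(self.X)),
                           key=lambda j: sq_dist(x, self.X[j]))] for x in X]

class NearestCentroid:
    """One mean per class: too simple to memorise individual flipped labels."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            pts = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = (sum(p[0] for p in pts) / len(pts),
                                     sum(p[1] for p in pts) / len(pts))
        return self
    def predict(self, X):
        return [min(self.centroids,
                    key=lambda c: sq_dist(x, self.centroids[c])) for x in X]

def train_accuracy(model_cls, X, y):
    """Fit on (X, y) and score on the same data: no validation or test set."""
    preds = model_cls().fit(X, y).predict(X)
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def accuracy_drop(model_cls, X, y, eta, rng):
    """Hypothetical summary: training accuracy before minus after mutation."""
    y_mut = mutate_labels(y, eta, rng)
    return train_accuracy(model_cls, X, y) - train_accuracy(model_cls, X, y_mut)

rng = random.Random(0)
X, y = make_data(rng)
eta = 0.2  # assumed mutation rate
drop_1nn = accuracy_drop(OneNN, X, y, eta, rng)
drop_centroid = accuracy_drop(NearestCentroid, X, y, eta, rng)
print(f"1-NN drop: {drop_1nn:.2f}, nearest-centroid drop: {drop_centroid:.2f}")
```

In this toy setup the memorising 1-NN model reproduces the flipped labels, so its training accuracy does not fall, whereas the nearest-centroid model cannot fit the flipped points and loses roughly the mutated fraction. That contrast is the intuition stated in the abstract: a model that readily fits label noise is showing a symptom of overfitting.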