Constructing Surrogate Models in Machine Learning Using Combinatorial Testing and Active Learning

Sunny Shree, Krishna Khadka, Yu Lei, Raghu N. Kacker, D. Richard Kuhn

Published: 01 Jan 2024, Last Modified: 06 Feb 2025ASE 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Machine learning (ML)-based models are often black box, making it challenging to understand and interpret their decision-making processes. Surrogate models are constructed to approximate the behavior of a target model and are an essential tool for analyzing black-box models. The construction of a surrogate model typically includes querying the target model with carefully selected data points and using the responses from the target model to infer information about its structure and parameters.In this paper, we propose an approach to surrogate model construction using combinatorial testing and active learning, aiming to efficiently capture the essential interactions between features that drive the target model's predictions. Our approach first leverages t-way testing to generate data points that capture all the t-way feature interactions. We then use an iterative process to isolate the essential feature interactions, i.e., those that can determine a model prediction. In the iterative process, we remove nonessential feature interactions, generate additional data points to contain the remaining interactions, and employ active learning techniques to select a subset of the data points to update the surrogate model. This process is continued until we construct a surrogate model that closely mirrors the target model's behavior. We evaluate our approach on 4 public datasets and 12 ML models and compare the results with the state-of-the-art (SOTA) approaches. Our experimental results show that our approach can perform in most cases better than the SOTA approaches in terms of accuracy and efficiency.