Keywords: Model performance degradation, autonomous adaptation, large language models
TL;DR: Self-healing machine learning is a framework which diagnoses the reason for model degradation and proposes diagnosis-based corrective actions
Abstract: Real-world machine learning systems often encounter model performance degradation due to distributional shifts in the underlying data generating process (DGP). Existing approaches to addressing shifts, such as concept drift adaptation, are limited by their *reason-agnostic* nature. By choosing from a pre-defined set of actions, such methods implicitly assume that the causes of model degradation are irrelevant to what actions should be taken, limiting their ability to select appropriate adaptations. In this paper, we propose an alternative paradigm to overcome these limitations, called *self-healing machine learning* (SHML). Contrary to previous approaches, SHML autonomously diagnoses the reason for degradation and proposes diagnosis-based corrective actions. We formalize SHML as an optimization problem over a space of adaptation actions to minimize the expected risk under the shifted DGP. We introduce a theoretical framework for self-healing systems and build an agentic self-healing solution *$\mathcal{H}$-LLM* which uses large language models to perform self-diagnosis by reasoning about the structure underlying the DGP, and self-adaptation by proposing and evaluating corrective actions. Empirically, we analyze different components of *$\mathcal{H}$-LLM* to understand *why* and *when* it works, demonstrating the potential of self-healing ML.
Supplementary Material: zip
Primary Area: Other (please use sparingly, only use the keyword field for more details)
Submission Number: 18106
Loading