FEAT-KD: Learning Concise Representations for Single and Multi-Target Regression via TabNet Knowledge Distillation

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: In this work, we propose a novel approach that combines the strengths of FEAT and TabNet through knowledge distillation (KD), which we term FEAT-KD. FEAT is an intrinsically interpretable machine learning (ML) algorithm that constructs a weighted linear combination of concisely represented features discovered via genetic programming, an optimization procedure that can be computationally inefficient. FEAT-KD instead leverages TabNet's deep-learning-based optimization and feature selection mechanisms: it finds a weighted linear combination of concisely represented symbolic features derived from piece-wise distillation of a trained TabNet model. We analyze FEAT-KD on regression tasks from two perspectives: (i) compared to TabNet, FEAT-KD significantly reduces model complexity while retaining competitive predictive performance, effectively converting a black-box deep learning model into a more interpretable white-box representation; (ii) compared to FEAT, our method consistently achieves higher prediction accuracy, produces more compact models, and reduces the complexity of the learned symbolic expressions. In addition, we demonstrate that FEAT-KD naturally supports multi-target regression, in which shared features contribute to the interpretability of the system. Our results suggest that FEAT-KD is a promising direction for interpretable ML, bridging the gap between deep learning's predictive power and the intrinsic transparency of symbolic models.
Lay Summary: Many practical problems, such as those in healthcare, benefit from models that balance accuracy with explainability. In this work, we focus on tabular data, where deep learning methods like TabNet often achieve strong performance but can be hard to interpret. Symbolic approaches like FEAT produce transparent formulas (e.g., a weighted sum of a few simple feature transformations) but can take a long time to find those formulas. FEAT-KD bridges this gap by first training a TabNet model on the dataset to learn which inputs matter most at each step. For each of TabNet's internal steps, FEAT-KD uses the selected raw features to search for a concise equation that mimics TabNet's learned transformation. Once all such distilled features are found, FEAT-KD fits a straightforward linear regression over them to predict the target. In many cases, this yields a model with accuracy close to TabNet's while being much easier to inspect. Because the same distilled features can be reused, FEAT-KD also handles multi-target regression naturally. Thus, FEAT-KD aims to combine much of TabNet's predictive ability with a fully transparent, symbolic representation.
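Below is a minimal sketch of this pipeline, not the authors' implementation: it assumes the pytorch-tabnet and gplearn packages as stand-ins for the paper's components, and for brevity each step's symbolic feature is fit to the teacher's overall prediction using that step's most-attended inputs, rather than to the step's internal transformation as the paper describes. The synthetic data, the top-3 feature cutoff, and all hyperparameters are illustrative assumptions.

```python
# Rough, illustrative sketch of a FEAT-KD-style pipeline (see caveats above).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from pytorch_tabnet.tab_model import TabNetRegressor  # black-box teacher
from gplearn.genetic import SymbolicRegressor         # symbolic-feature search

# Synthetic stand-in data (the paper evaluates on real regression benchmarks).
X, y = make_regression(n_samples=500, n_features=8, noise=0.1, random_state=0)
X = X.astype(np.float32)

# 1) Train the TabNet teacher (TabNetRegressor expects 2-D targets).
teacher = TabNetRegressor(n_steps=3, seed=0)
teacher.fit(X, y.reshape(-1, 1), max_epochs=50, patience=10, batch_size=256)
y_teacher = teacher.predict(X).ravel()

# 2) Recover per-step feature-selection masks from TabNet's attentive transformer.
_, step_masks = teacher.explain(X)  # dict: step index -> (n_samples, n_features)

# 3) For each step, distill a concise symbolic feature from its selected inputs.
distilled = []
for step, mask in step_masks.items():
    top = np.argsort(mask.mean(axis=0))[::-1][:3]  # most-attended raw features
    sr = SymbolicRegressor(population_size=500, generations=20,
                           parsimony_coefficient=0.01, random_state=0)
    sr.fit(X[:, top], y_teacher)                   # mimic the teacher's output
    print(f"step {step}: inputs {top.tolist()} -> {sr._program}")
    distilled.append((top, sr))

# 4) Fit a transparent linear model over the distilled symbolic features;
#    for multi-target regression, reuse the same features with a 2-D y.
Z = np.column_stack([sr.predict(X[:, top]) for top, sr in distilled])
student = LinearRegression().fit(Z, y)
print("student R^2 on training data:", student.score(Z, y))
```

In the paper itself, the symbolic search targets each step's learned transformation rather than the final prediction, and FEAT's representation replaces the generic gplearn search used here for illustration.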
Primary Area: General Machine Learning->Representation Learning
Keywords: FEAT, Knowledge Distillation, Symbolic Regression
Submission Number: 14161