CASE: Coupled Adaptive Feature–Target Smoothing with Density-Gated Mixture-of-Experts for Robust Imbalanced Tabular Regression
Keywords: Imbalanced Regression, Tabular Data, Mixture of Experts, Adaptive Smoothing, Representation Learning, Density-Aware Learning
TL;DR: We introduce CASE, a framework for imbalanced tabular regression that adaptively smooths sparse data representations and weights specialized experts to achieve state-of-the-art balanced performance across all densities.
Abstract: Although tabular data are central to many real-world applications, their target distributions are often imbalanced, as the majority of samples correspond to a narrow range of values. This imbalance severely degrades performance in sparse, few-shot regions. Prior work on imbalanced regression has typically relied on coarse binning of the target space, discarding fine-grained information, or on spatial-locality assumptions inherited from image domains. We introduce CASE (Coupled Adaptive Feature–Target Smoothing with Density-Gated Mixture-of-Experts), a framework tailored to deep imbalanced tabular regression. CASE combines two complementary mechanisms. (i) Coupled Adaptive Smoothing first identifies “true” neighbors by jointly considering similarities in both the feature and target spaces. Based on these neighbors, it then calibrates representations in sparse regions by scaling the smoothing strength according to each sample’s continuous density. (ii) A Density-Gated Mixture-of-Experts (MoE) weights the contributions of specialized experts via a gate that predicts a density range from the original features. During training, experts take the calibrated features as input; at inference, the learned experts operate on the original features, yielding both higher accuracy and faster inference. Across 40 tabular benchmarks, CASE establishes state-of-the-art performance on balanced test sets by attaining the top average rank. Notably, it demonstrates exceptional robustness by minimizing performance loss on the original imbalanced test sets, consistently delivering balanced predictions that improve few-shot accuracy without significantly sacrificing many-shot performance.
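The two mechanisms described in the abstract can be illustrated with a minimal NumPy sketch. All formulas, names, and hyperparameters below (the Gaussian density estimator, the coupled distance `feat_d + lam * targ_d`, the linear experts, and the density-based gate) are illustrative assumptions, not the paper’s actual implementation; a real gate would be a network trained to predict the density range from the original features.

```python
# Hypothetical sketch of CASE's two mechanisms: coupled adaptive smoothing
# and a density-gated mixture of experts. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = rng.exponential(scale=1.0, size=n)  # skewed target distribution

# Continuous target density via a Gaussian kernel (assumed estimator).
def target_density(y, bandwidth=0.5):
    diff = y[:, None] - y[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2).mean(axis=1)

dens = target_density(y)
dens_norm = (dens - dens.min()) / (dens.max() - dens.min() + 1e-12)

# (i) Coupled adaptive smoothing: neighbors chosen by a joint
# feature+target distance; smoothing strength grows as density falls.
def coupled_smooth(X, y, dens_norm, k=10, alpha_max=0.8, lam=1.0):
    Xs = X.copy()
    feat_d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    targ_d = np.abs(y[:, None] - y[None, :])
    joint = feat_d + lam * targ_d  # coupled feature-target distance
    for i in range(len(X)):
        nbrs = np.argsort(joint[i])[1:k + 1]       # exclude self
        alpha = alpha_max * (1.0 - dens_norm[i])   # stronger in sparse regions
        Xs[i] = (1 - alpha) * X[i] + alpha * X[nbrs].mean(axis=0)
    return Xs

X_cal = coupled_smooth(X, y, dens_norm)

# (ii) Density-gated mixture of two linear experts. The gate here soft-assigns
# samples to "dense" vs "sparse" regimes directly from dens_norm for brevity.
gate = np.stack([dens_norm, 1.0 - dens_norm], axis=1)  # (n, 2)
Xb_cal = np.hstack([X_cal, np.ones((n, 1))])
experts = []
for w in gate.T:  # fit each expert on the CALIBRATED features, gate-weighted
    W = np.diag(w)
    theta = np.linalg.lstsq(W @ Xb_cal, W @ y, rcond=None)[0]
    experts.append(theta)

# Inference runs the learned experts on the ORIGINAL features,
# matching the abstract's training/inference asymmetry.
Xb = np.hstack([X, np.ones((n, 1))])
preds = gate[:, 0] * (Xb @ experts[0]) + gate[:, 1] * (Xb @ experts[1])
print(preds.shape)
```

The gate weights sum to one per sample, so the prediction is a convex combination of the two experts; in sparse regions the "sparse" expert dominates, which is the intended density-aware behavior.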
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 11365