Random at First, Fast at Last: NTK-Guided Fourier Pre-Processing for Tabular DL

Published: 20 Mar 2025, Last Modified: 27 Jul 2025, AAAI 2025, Everyone, CC BY 4.0
Abstract: Probability calibration transforms the raw output of a classification model into empirically interpretable probabilities. When the model is intended to detect rare events and only a small, expensive data source has clean labels, obtaining an accurate probability calibration becomes extraordinarily challenging. An additional large, cheap data source can help considerably, but such data sources often suffer from biased labels. To this end, we introduce an approximate expectation-maximization (EM) algorithm to extract useful information from the large data sources. For a family of calibration methods based on the logistic likelihood, we derive closed-form updates and call the resulting iterative algorithm CalEM. We show that CalEM inherits convergence guarantees from the approximate EM algorithm. We test the proposed model in simulation and on real marketing datasets, where it shows significant performance increases.