DRIFT: DATA REDUCTION VIA INFORMATIVE FEATURE TRANSFORMATION – GENERALIZATION BEGINS BEFORE DEEP LEARNING STARTS

01 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Feature representation, Neural network, Dimensionality reduction, Generalization gap.
Abstract: Despite the remarkable optimization power of modern deep neural networks, robust generalization remains critically dependent on the quality of input representations. High-dimensional pixel data is plagued by noise, redundancy, and spurious correlations that hinder stable learning and widen the train-test generalization gap. We introduce DRIFT (Data Reduction via Informative Feature Transformation), a lightweight, physics-informed preprocessing method that reinterprets images as static displacement fields of a thin elastic plate under simply supported boundary conditions. By projecting each image onto the analytically derived orthogonal basis of vibrational mode shapes (low-frequency sinusoidal patterns governed by the biharmonic equation), DRIFT yields compact, interpretable, and intrinsically smooth features that emphasize energetically dominant spatial deformations while suppressing high-frequency noise. Extensive experiments on MNIST, CIFAR-100, and CelebA demonstrate that DRIFT enables classifiers to match or exceed the test accuracy obtained with raw pixels, PCA, DCT, and convolutional autoencoders, while using dramatically fewer features. DRIFT consistently exhibits smaller generalization gaps, smoother training trajectories, and markedly reduced sensitivity to noise perturbations. These gains arise from the physical prior of smoothness and boundary compatibility, which imposes an explicit inductive bias toward generalizable, low-energy image structure. To our knowledge, DRIFT is the first method to successfully leverage classical vibration mode analysis for machine learning feature extraction, opening a principled, data-efficient avenue for physics-informed representation learning.
Supplementary Material: pdf
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 543
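
The projection described in the abstract can be illustrated with a minimal sketch. It assumes the standard mode shapes of a simply supported rectangular plate, sin(mπx/a)·sin(nπy/b), sampled on interior grid points so that the discrete modes are exactly orthonormal. The function names, the grid discretization, the normalization, and the choice of keeping the lowest `n_modes_y × n_modes_x` modes are all assumptions made for illustration; the abstract does not specify these details, and the authors' implementation may differ.

```python
import numpy as np


def plate_mode_basis(size, n_modes):
    """Return an (n_modes, size) matrix of 1-D sine modes (hypothetical helper).

    Row m-1 holds sin(m * pi * x_i) at interior nodes x_i = i / (size + 1),
    i = 1..size. The modes vanish at the implied boundary nodes x = 0 and
    x = 1, and the scaling makes the rows orthonormal on this grid.
    """
    x = np.arange(1, size + 1) / (size + 1)
    m = np.arange(1, n_modes + 1)
    return np.sqrt(2.0 / (size + 1)) * np.sin(np.pi * np.outer(m, x))


def drift_like_features(image, n_modes_y, n_modes_x):
    """Project a 2-D image onto the lowest separable sine modes.

    Returns an (n_modes_y, n_modes_x) array of modal coefficients.
    """
    h, w = image.shape
    basis_y = plate_mode_basis(h, n_modes_y)
    basis_x = plate_mode_basis(w, n_modes_x)
    return basis_y @ image @ basis_x.T


def reconstruct(coeffs, shape):
    """Map modal coefficients back to a smooth image of the given shape."""
    h, w = shape
    basis_y = plate_mode_basis(h, coeffs.shape[0])
    basis_x = plate_mode_basis(w, coeffs.shape[1])
    return basis_y.T @ coeffs @ basis_x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((28, 28))             # stand-in for an MNIST-sized image
    features = drift_like_features(image, 8, 8)
    smooth = reconstruct(features, image.shape)
    print(features.shape, smooth.shape)
```

In this sketch a 28×28 image (784 pixels) is reduced to 64 coefficients, and reconstructing from those coefficients gives a low-pass version of the input. Truncating to low-order modes is what discards high-frequency content; how many modes to keep is a tunable choice not fixed by the abstract.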