Abstract: Training deep neural networks (DNNs) on large-scale datasets poses considerable challenges due to the computational complexity stemming from the vast number of samples and high-dimensional features. Existing dataset condensation approaches primarily focus on reducing the number of training samples. However, even with condensed datasets, the large number of features still demands significant computational resources for training. To bridge this gap, we introduce the problem of “joint condensation”, which reduces both the features and the samples of a large-scale dataset simultaneously. To achieve this, we propose TinyData, which aligns the gradients of DNN weights trained on the condensed data with those obtained from training on the original data. Extensive experiments on four computer vision benchmarks validate the effectiveness of TinyData. In particular, it reduces sample and feature sizes by 99.8% and 67.4%, respectively, while retaining a remarkable 96.0% of the performance achieved by training on the full MNIST dataset.
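The gradient alignment described in the abstract is commonly formalized as a gradient-matching objective; the formulation below is a sketch using our own notation rather than the paper's exact objective. Here $\mathcal{S}$ denotes the jointly condensed (sample- and feature-reduced) dataset, $\mathcal{T}$ the original dataset, $\theta_t$ the DNN weights along a training trajectory, $\mathcal{L}$ the training loss, and $D(\cdot,\cdot)$ a gradient distance (e.g., a layer-wise cosine distance):

$$\min_{\mathcal{S}} \; \mathbb{E}_{\theta_0}\left[ \sum_{t=0}^{T-1} D\big( \nabla_\theta \mathcal{L}(\mathcal{S}; \theta_t),\; \nabla_\theta \mathcal{L}(\mathcal{T}; \theta_t) \big) \right],$$

so that a network trained on the small condensed set follows weight updates similar to those induced by the original data.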