Size Doesn't Matter: Data Efficient Deep Learning Beyond the Big Data Paradigm

Mert Sehri; Efe Çakır; Govind Vashishtha; Sumika Chauhan; Patrick Dumond

18 Sept 2025 (modified: 20 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0

Keywords: Selective Embedding, Data-Efficient Deep Learning, Medical Imaging, Generalization under Data Scarcity, Cross-Modality Learning

Abstract: Since the emergence of deep learning, machine learning researchers have focused on improving algorithms to achieve higher classification accuracy. This work has largely consisted of connecting encoders and decoders through attention mechanisms or stacking multi-layer perceptrons. However, in many data analysis fields, collecting large datasets is not possible. We therefore challenge the "bigger is better" paradigm. We propose a simple new method, called selective embedding, that is based entirely on how data is loaded. We revisit existing experiments and conduct new ones in four areas: heavy machinery, railway, manufacturing, and medical imaging. A medical dataset containing 65,000 patient data points across various diseases serves as a baseline, and we reduce the number of patients to demonstrate that dataset size does not matter. In addition, for each area, we select a dataset on which a deep learning algorithm achieves high accuracy. Our method achieves 87% or higher accuracy on classification tasks in all areas while using smaller datasets, without overfitting.
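The abstract does not say how the number of patients is reduced. As one illustration only, assuming each record is tagged with a patient identifier, a patient-level subset can be drawn so that every record from a selected patient is kept together and no patient is split across the subset boundary. This is a hypothetical sketch, not the authors' selective embedding method; the function name `subsample_patients` and the `(patient_id, sample)` record format are assumptions.

```python
import random
from collections import defaultdict


def subsample_patients(records, n_patients, seed=0):
    """Keep only the records of a random subset of n_patients distinct patients.

    Hypothetical helper: records is assumed to be a list of (patient_id, sample)
    pairs. All records of a chosen patient are kept together.
    """
    # Group records by patient so a patient is either fully in or fully out.
    by_patient = defaultdict(list)
    for pid, sample in records:
        by_patient[pid].append((pid, sample))

    patient_ids = sorted(by_patient)
    if n_patients > len(patient_ids):
        raise ValueError("n_patients exceeds the number of distinct patients")

    # Seeded RNG so the same subset is drawn on every run.
    rng = random.Random(seed)
    chosen = rng.sample(patient_ids, n_patients)

    subset = []
    for pid in sorted(chosen):
        subset.extend(by_patient[pid])
    return subset


if __name__ == "__main__":
    records = [("p1", "a"), ("p1", "b"), ("p2", "c"), ("p3", "d"), ("p3", "e")]
    print(subsample_patients(records, 2, seed=1))
```

Sampling whole patients rather than individual records is a common precaution in medical imaging, since records from the same patient tend to be correlated; repeating the draw with smaller `n_patients` values gives a sequence of shrinking datasets.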

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 11955

Loading