Keywords: Selective Embedding, Data-Efficient Deep Learning, Medical Imaging, Generalization under Data Scarcity, Cross-Modality Learning
Abstract: Since the emergence of deep learning, machine learning scientists have focused
on improving algorithms to achieve higher classification accuracy, for example by
connecting encoders and decoders through attention mechanisms or by stacking
multi-layer perceptrons. However, in many fields of data analysis, collecting
large datasets is not possible. We therefore challenge the “bigger is
better” paradigm and propose a simple new method, selective embedding,
that is based entirely on how data is loaded. We summarize existing experiments
and conduct new ones in four areas: heavy machinery, railway,
manufacturing, and medical imaging. As a baseline, we use a medical dataset
containing 65,000 patient data points across various diseases and reduce the number
of patients to demonstrate that dataset size does not matter. In addition, for
each area, we select a dataset on which to demonstrate high accuracy with a deep
learning algorithm. Our method achieves 87% or higher accuracy in all four areas when
trained on smaller datasets for classification tasks, without overfitting.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 11955