Keywords: Learning-Hard, Micro-Learning, NP-complete, data imbalance, small-sample, deep learning
Abstract: Machine learning increasingly faces highly complex, nonlinear data whose noise, imbalance, or small sample size thwarts conventional models. We formalize this difficulty through the notion of Learning-Hard Problems (LH-Ps): tasks that (i) defeat the vast majority of models, yet (ii) admit at least one high-quality solution if the relevant label-aware structural knowledge is appropriately incorporated during training. To address such tasks, we introduce Micro-Learning (MiL), a principled framework that constructs traininglets (small, knowledge-fused subsets of the training data with demonstrably low complexity) and infers a deterministic local model for each; these local models collectively form a global predictor. We prove that the decision version of optimal traininglet selection is NP-complete, establishing a strong theoretical foundation for MiL. By eliminating irrelevant or noisy samples, MiL dramatically reduces the risk of overfitting while retaining interpretability and reproducibility through deterministic optimization in a reproducing kernel Hilbert space (RKHS). Experiments on benchmark domains, from music information retrieval to medical proteomics, show that MiL solves LH-Ps and outperforms deep learning and classical baselines, especially on imbalanced or small-sample datasets, with negligible overfitting. Moreover, our work provides (i) the first formal definition of LH-Ps, (ii) a Learning-Hard Index that quantifies task difficulty before training, and (iii) theoretical guarantees on traininglet optimality and complexity, enriching learning theory and ethical AI.
Supplementary Material: pdf
Primary Area: learning theory
Submission Number: 19284