A Reproducible Framework for Bias-Resistant Machine Learning on Small-Sample Neuroimaging Data

Jagan Mohan Reddy Dwarampudi, Jennifer L. Purks, Joshua Wong, Renjie Hu, Tania Banerjee

Published: 2026, Last Modified: 06 May 2026, Venue: CoRR 2026, License: CC BY-SA 4.0

Abstract: We introduce a reproducible, bias-resistant machine learning framework that integrates domain-informed feature engineering, nested cross-validation, and calibrated decision-threshold optimization for small-sample neuroimaging data. Conventional cross-validation frameworks that reuse the same folds for both model selection and performance estimation yield optimistically biased results, limiting reproducibility and generalization. Demonstrated on a high-dimensional structural MRI dataset of deep brain stimulation cognitive outcomes, the framework achieved a nested-CV balanced accuracy of 0.660 ± 0.068 using a compact, interpretable feature subset selected via importance-guided ranking. By combining interpretability with unbiased evaluation, this work provides a generalizable computational blueprint for reliable machine learning in data-limited biomedical domains.
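The separation between model selection and performance estimation that the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the centroid-difference feature ranking, the synthetic data, and all function names (`nested_cv`, `fit_predict`, `stratified_folds`) are assumptions chosen only to keep the example self-contained. The point it shows is that the number of retained features is tuned on inner folds drawn from the outer training rows, so the outer test fold never influences selection.

```python
import random
from statistics import mean

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return mean(recalls)

def stratified_folds(y, k, rng):
    """Split positions 0..len(y)-1 into k folds, preserving class proportions."""
    folds = [[] for _ in range(k)]
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def fit_predict(X, y, train, test, n_features):
    """Rank features on training rows only, then classify by nearest centroid.

    Stand-in for an importance-guided selector plus classifier (binary labels).
    """
    classes = sorted(set(y[i] for i in train))
    d = len(X[0])
    cent = {c: [mean([X[i][f] for i in train if y[i] == c]) for f in range(d)]
            for c in classes}
    # Hypothetical importance score: absolute gap between the two class centroids.
    gap = [abs(cent[classes[0]][f] - cent[classes[1]][f]) for f in range(d)]
    keep = sorted(range(d), key=lambda f: -gap[f])[:n_features]
    preds = []
    for i in test:
        dist = {c: sum((X[i][f] - cent[c][f]) ** 2 for f in keep) for c in classes}
        preds.append(min(dist, key=dist.get))
    return preds

def nested_cv(X, y, grid=(1, 2, 5), outer_k=5, inner_k=3, seed=0):
    """Return one balanced-accuracy score per outer fold."""
    rng = random.Random(seed)
    outer = stratified_folds(y, outer_k, rng)
    scores = []
    for held_out in outer:
        train = [i for f in outer if f is not held_out for i in f]
        # Inner folds are positions within `train`; the held-out fold is never seen.
        inner = stratified_folds([y[i] for i in train], inner_k, rng)

        def inner_score(n):
            fold_scores = []
            for val in inner:
                tr = [train[p] for f in inner if f is not val for p in f]
                va = [train[p] for p in val]
                preds = fit_predict(X, y, tr, va, n)
                fold_scores.append(balanced_accuracy([y[i] for i in va], preds))
            return mean(fold_scores)

        best_n = max(grid, key=inner_score)           # model selection (inner loop)
        preds = fit_predict(X, y, train, held_out, best_n)
        scores.append(balanced_accuracy([y[i] for i in held_out], preds))  # estimation
    return scores

# Synthetic small-sample data: 40 rows, 20 features, only the first 2 informative.
data_rng = random.Random(42)
y = [i % 2 for i in range(40)]
X = [[data_rng.gauss(1.5 * label if f < 2 else 0.0, 1.0) for f in range(20)]
     for label in y]
scores = nested_cv(X, y)
print("outer-fold balanced accuracy:", [round(s, 3) for s in scores])
```

Reporting the mean and spread of `scores` mirrors the "mean ± spread" form of the nested-CV figure quoted in the abstract. Calibrated threshold optimization, which the framework also includes, is omitted here because the stand-in classifier does not produce probabilities.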

External IDs: dblp:journals/corr/abs-2602-02920