From Promise to Practice: A Study of Common Pitfalls Behind the Generalization Gap in Machine Learning

TMLR Paper 3189 Authors

15 Aug 2024 (modified: 11 Nov 2024) · Under review for TMLR · CC BY 4.0
Abstract: Machine Learning (ML) offers great promise, but there is often a noticeable gap between the claims made in research papers and a model's practical performance in real-life applications. This gap can frequently be attributed to systematic errors and pitfalls introduced during the development of ML models. This study aims to identify these errors systematically. To do so, we break the ML process down into four main stages: data handling, model design, model evaluation, and reporting. Across these stages, we identify fourteen common pitfalls based on a comprehensive review of around 60 papers discussing either broad challenges or specific pitfalls within the ML pipeline. Using the Brain Tumor Segmentation (BraTS) dataset, we then perform three experiments to illustrate the impact of these pitfalls, providing examples of how they can skew results and affect outcomes. In addition, we conduct a literature review to study how frequently these pitfalls are reported unclearly in ML research, assessing whether authors adequately address them in their reports. For this, we review 126 randomly chosen papers on image segmentation from the ICCV (2013-2021) and MICCAI (2013-2022) conferences. The results show a notable oversight of these issues, with many papers lacking clarity on how the pitfalls are handled, highlighting an important gap in current reporting practices within the ML community. The code for the experiments will be published upon acceptance.
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Mathurin_Massias1
Submission Number: 3189