Beyond the Threshold: Time Is All You Need

Published: 12 Jul 2024, Last Modified: 09 Aug 2024
Venue: AutoML 2024 Workshop
License: CC BY 4.0
Keywords: Neural Architecture Search, Benchmarks, Statistical testing
TL;DR: This paper advocates for rigorous statistical practices in NAS benchmarking, emphasizing the need for large sample sizes and extended time budgets to ensure reliable and valid algorithm performance evaluations.
Abstract: This study examines common misconceptions and suboptimal use of tabular benchmarks for neural architecture search (NAS). We address statistical limitations in performance evaluation, emphasizing adequate sample sizes and proper statistical tests, such as the two-sample t-test, to ensure reliable results. We propose a new guideline of averaging at least 1000 runs for reliably benchmarking NAS algorithms. Additionally, we explore the impact of time constraints on algorithm performance, showing that final algorithm performance highly depends on the time budget. Our findings highlight the need for robust experimental designs and extended time budgets in NAS research.
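The abstract's central recommendation, averaging at least 1000 runs and comparing algorithms with a two-sample t-test, can be sketched as follows. This is a minimal illustration with synthetic data, not the paper's actual experimental code; the accuracy distributions and algorithm names are assumptions for demonstration.

```python
# Hedged sketch: comparing two NAS algorithms via a two-sample t-test,
# assuming each independent run yields one final validation accuracy.
# The accuracy means/spreads below are synthetic, chosen for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_runs = 1000  # the paper's guideline: average at least 1000 runs

# Synthetic per-run accuracies for two hypothetical NAS algorithms
algo_a = rng.normal(loc=0.942, scale=0.010, size=n_runs)
algo_b = rng.normal(loc=0.940, scale=0.010, size=n_runs)

# Two-sample t-test on the two sets of run results
t_stat, p_value = stats.ttest_ind(algo_a, algo_b)
print(f"mean A = {algo_a.mean():.4f}, mean B = {algo_b.mean():.4f}")
print(f"t = {t_stat:.3f}, p = {p_value:.4g}")
```

With a small mean gap and only a handful of runs, such a test would rarely reach significance; the large sample size is what makes the comparison reliable, which is the point the paper argues.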
Submission Checklist: Yes
Broader Impact Statement: Yes
Paper Availability And License: Yes
Code Of Conduct: Yes
Optional Meta-Data For Green-AutoML: All questions below on environmental impact are optional.
Submission Number: 9