A Boo(n) for Evaluating Architecture Performance

Anonymous

Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission
  • Abstract: We point out several important problems with the common practice of using the best single model performance to compare Deep Learning architectures, and we propose a method that corrects these flaws. Each time a model is trained, one gets a different result due to random factors in the training process, such as random parameter initialization and random data shuffling. Reporting the best single model performance does not appropriately account for this stochasticity. Furthermore, the expected best result increases with the number of experiments run, so the metric rewards running more experiments rather than building better architectures, among other problems. We propose the normalized expected best-out-of-n performance (Boo_n) as a way to correct these problems; a sketch of such an estimator appears after this list.
  • TL;DR: We point out several important problems with the common practice of using the best single model performance for comparing Deep Learning architectures, and we propose a method that corrects these flaws.
  • Keywords: evaluation, methodology
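Although the abstract does not spell out the estimator, the idea of an "expected best-out-of-n" performance can be made concrete with a minimal sketch. The Python below estimates the expected maximum of n runs from m observed results using the standard order-statistics estimator for draws without replacement; the function name boo_n and the omission of the paper's normalization step are assumptions made for illustration, not the authors' exact method.

```python
import math

def boo_n(results, n):
    """Estimate the expected best-of-n performance from m observed
    single-run scores (higher is better), using the unbiased
    order-statistics estimator for the expected maximum of n draws
    without replacement. Requires 1 <= n <= m.

    Note: this is an illustrative sketch; the paper's exact Boo_n
    definition (including its normalization) is not reproduced here.
    """
    m = len(results)
    if not 1 <= n <= m:
        raise ValueError("need 1 <= n <= number of observed results")
    r = sorted(results)  # r[0] <= ... <= r[m-1]
    # P(the i-th smallest of m values is the max of a random n-subset)
    # = C(i-1, n-1) / C(m, n), for i = n..m (1-indexed)
    total = sum(math.comb(i - 1, n - 1) * r[i - 1] for i in range(n, m + 1))
    return total / math.comb(m, n)

# Example: 10 training runs of the same architecture
scores = [0.71, 0.73, 0.69, 0.74, 0.72, 0.70, 0.75, 0.73, 0.71, 0.72]
print(boo_n(scores, 1))  # mean single-run performance
print(boo_n(scores, 5))  # expected best out of 5 runs
```

Note that boo_n(scores, 1) recovers the mean single-run performance, while increasing n approaches the best observed result, which illustrates why "best single model" comparisons favor whoever ran more experiments.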
