Abstract: When deep neural networks became state-of-the-art image classifiers, numerous max pooling operations were an important component of their architectures. However, modern computer vision networks typically have few, if any, max pooling operations. To understand whether this trend is justified, we develop a mathematical framework for analyzing ReLU-based approximations of max pooling, and prove a sense in which max pooling cannot be replicated. We formulate and analyze a novel class of optimal approximations, and find that the residual can be made exponentially small in the kernel size, but only with an exponentially wide approximation.
This work gives a theoretical basis for understanding the reduced use of max pooling in newer architectures. It also enables us to establish an empirical observation about natural images: since max pooling does not seem necessary, the inputs on which max pooling is distinct (those with a large difference between the maximum and the other values) are not prevalent.
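To make the connection between max pooling and ReLU concrete, here is a minimal numpy sketch (an illustration, not the paper's construction or its released code) of the standard exact identity max(a, b) = (a + b)/2 + |a - b|/2 with |x| = relu(x) + relu(-x), composed pairwise over a pooling window. This composition is exact but needs depth roughly log2 of the kernel size; the paper's impossibility and width results concern a restricted class of shallow approximations, where such exact replication is unavailable.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max2_relu(a, b):
    # Exact two-input identity: max(a, b) = (a + b)/2 + |a - b|/2,
    # where |x| = relu(x) + relu(-x), so one ReLU layer computes a 2-way max.
    return 0.5 * (a + b) + 0.5 * (relu(a - b) + relu(b - a))

def maxpool_relu(window):
    # Pool a window by composing pairwise maxes (depth ~ log2(len(window))).
    vals = list(window)
    while len(vals) > 1:
        nxt = [max2_relu(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

rng = np.random.default_rng(0)
window = rng.normal(size=9)
print(maxpool_relu(window), window.max())  # agree up to floating point

Collapsing this tree into a single shallow ReLU layer is exactly where the residual-versus-width trade-off studied in the paper arises.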
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Thanks for accepting the paper, and sorry it has taken so long to get the camera-ready version uploaded. It's done now.
Compared to the last version uploaded as part of the review process:
- We have added a discussion of the analytical form of the residual distribution, as raised in our conversation with Reviewer SozV.
- We made numerous small grammatical, spelling, and typesetting fixes that do not change the substance of the paper.
- We made the reference formatting more consistent and useful.
- We added author information, acknowledgements, and de-anonymized links to the code.
Code: https://github.com/idiap/benefits-of-max-pooling
Supplementary Material: zip
Assigned Action Editor: ~Yingnian_Wu1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1275