Abstract: Cross-entropy (CE) loss is a widely used optimization objective that has been applied successfully to diverse classification tasks. The discrepancy between the CE objective and the actual classification target has not been fully studied, because researchers usually regard it as the unavoidable price of a differentiable objective that can be optimized with gradient-based methods. In this paper, we study this discrepancy carefully and find that it has a side effect: the model output tends to keep growing even when the classification result is already correct. We call this side effect the "model output blow-up effect". It distracts the CE objective from effective updates and thereby harms model training. To mitigate it, we introduce a partial normalization layer that regularizes the model output and reduces this useless growth tendency. We further provide a theoretical analysis of our findings and our approach. Experimental results show that the proposed partial normalization layer improves model training and that it can be combined with other methods, such as weight decay, to achieve substantial additional performance gains.
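The growth tendency described in the abstract can be illustrated with a small sketch. This is not the paper's method or experiment: the logit values and the helper names `cross_entropy` and `predict` are assumptions chosen purely for illustration. The sketch scales the logits of an already correctly classified example and shows that the predicted class never changes while the CE loss keeps shrinking, so gradient descent on CE still has an incentive to enlarge the outputs.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, computed with log-sum-exp."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def predict(logits):
    """Index of the largest logit, i.e. the predicted class."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical logits for a 3-class problem; class 0 is the true label
# and is already predicted correctly at scale 1.
base_logits = [2.0, 1.0, 0.0]
target = 0
scales = [1.0, 2.0, 5.0, 10.0]

losses = [cross_entropy([s * z for z in base_logits], target) for s in scales]
predictions = [predict([s * z for z in base_logits]) for s in scales]

for s, loss, pred in zip(scales, losses, predictions):
    print(f"scale={s:5.1f}  predicted class={pred}  loss={loss:.6f}")
```

Because the loss is strictly positive for any finite logits, it can always be lowered further by increasing the margin, even though accuracy on this example cannot improve.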
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: General Machine Learning (i.e., none of the above)