Polyak Parameter Ensemble: Exponential Parameter Growth Leads to Better Generalization

18 Sept 2023 (modified: 11 Feb 2024) Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Polyak Averaging, Ensemble Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Building an ensemble model via prediction averaging often improves generalization performance over a single model across challenging tasks. Yet, prediction averaging comes with three well-known disadvantages: the computational overhead of training multiple models, and increased latency and memory requirements at test time. Here, we propose a remedy for these disadvantages. Our approach, the Polyak Parameter Ensemble (PPE), constructs a parameter ensemble model that improves generalization performance \emph{with virtually no additional computational cost}. During training, PPE maintains a running weighted average of the model parameters at each epoch interval; hence, PPE with uniform weights can be seen as applying Polyak averaging at each epoch interval. We show that the weight assigned to each epoch can either be determined dynamically via the validation loss or be fixed in advance to grow exponentially. We conducted extensive experiments on 11 benchmark datasets ranging from multi-hop reasoning to image classification. Overall, the results suggest that PPE consistently leads to more stable training and better generalization across models and datasets.
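The running weighted parameter average described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' reference implementation: the class name, the `growth_rate` hyperparameter, and the exact update rule are assumptions consistent with the description above (uniform weights recover plain Polyak averaging, while `growth_rate > 1` gives exponentially increasing per-epoch weights).

```python
import copy
import torch


class PolyakParameterEnsemble:
    """Running weighted average of model parameters, updated once per epoch.

    Minimal sketch: with growth_rate == 1.0 this reduces to uniform (Polyak)
    averaging over epochs; with growth_rate > 1.0 later epochs receive
    exponentially larger weights.
    """

    def __init__(self, model: torch.nn.Module, growth_rate: float = 1.1):
        self.avg_state = copy.deepcopy(model.state_dict())
        self.weight_sum = 1.0           # weight of the initial snapshot
        self.growth_rate = growth_rate  # assumed hyperparameter name
        self.epoch = 0

    @torch.no_grad()
    def update(self, model: torch.nn.Module) -> None:
        """Fold the current epoch's parameters into the running average."""
        self.epoch += 1
        w = self.growth_rate ** self.epoch          # this epoch's weight
        new_total = self.weight_sum + w
        for name, param in model.state_dict().items():
            avg = self.avg_state[name]
            if not torch.is_floating_point(avg):    # e.g. integer buffers
                avg.copy_(param)
                continue
            avg.mul_(self.weight_sum / new_total).add_(param, alpha=w / new_total)
        self.weight_sum = new_total

    def load_into(self, model: torch.nn.Module) -> None:
        """Load the ensembled parameters into a model for evaluation."""
        model.load_state_dict(self.avg_state)
```

Calling `update(model)` once at the end of every epoch and `load_into(eval_model)` before evaluation keeps only one extra copy of the parameters in memory, consistent with the abstract's claim of virtually no additional computational cost.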
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1272