Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
Abstract: Large language models often struggle with length generalization and solving complex problem instances beyond their training distribution. We present a self-improvement approach where models iteratively generate and learn from their own solutions, progressively tackling harder problems while maintaining a standard transformer architecture. Across diverse tasks including arithmetic, string manipulation, and maze solving, our method enables models to solve problems far beyond their initial training distribution—for instance, generalizing from 10-digit to 100-digit addition without apparent saturation. We observe that filtering for correct self-generated examples leads to exponential improvements in out-of-distribution performance across training rounds. Additionally, starting from pretrained models significantly accelerates this self-improvement process for several tasks. Our results demonstrate how controlled weak-to-strong curricula can systematically expand model capabilities while preserving architectural simplicity.
Lay Summary: Large language models excel at tasks they were trained on, but often struggle with harder or longer ones. We explore whether a model can improve by learning from its own outputs, and find that it can. Starting with easy problems, the model gradually solves harder and longer tasks by training on its own generated data, without any changes to its architecture. We demonstrate this on structured tasks such as arithmetic, string copying, and maze solving. For example, the model progresses from 10-digit to 100-digit addition through self-improvement. Simple filtering methods, such as length thresholds and majority voting, ensure data quality. We further show that pretrained models learn faster with this method.
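As a rough illustration (not the released implementation at the repository linked below), here is a minimal Python sketch of the self-improvement loop the summary describes: sample answers on slightly harder problems, keep only those that pass a majority vote and a length threshold, and fine-tune on the survivors. The helpers `sample_fn`, `train_fn`, and `make_problems` are hypothetical stand-ins for model sampling, fine-tuning, and problem generation.

```python
from collections import Counter

def majority_vote_filter(model, problem, sample_fn, n_samples=5):
    """Keep a self-generated answer only if it wins a strict majority
    across several sampled completions (one of the filters mentioned
    in the lay summary). `sample_fn(model, problem)` is a hypothetical
    helper returning one sampled answer string."""
    answers = [sample_fn(model, problem) for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > n_samples // 2 else None

def self_improve(model, make_problems, sample_fn, train_fn,
                 start_difficulty=10, end_difficulty=100):
    """Iteratively self-train while raising difficulty (e.g. the number
    of digits in an addition problem). `make_problems(d)` yields
    problems at difficulty d; `train_fn(model, data)` fine-tunes and
    returns the updated model. Both are hypothetical stand-ins."""
    for difficulty in range(start_difficulty + 1, end_difficulty + 1):
        dataset = []
        for problem in make_problems(difficulty):
            answer = majority_vote_filter(model, problem, sample_fn)
            # Length-threshold filter: discard answers longer than is
            # plausible at this difficulty (e.g. digits + 1 for sums).
            if answer is not None and len(answer) <= difficulty + 1:
                dataset.append((problem, answer))
        # Each round trains on the filtered self-generated data, so the
        # model it produces seeds generation at the next difficulty.
        model = train_fn(model, dataset)
    return model
```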
Link To Code: https://github.com/JackCai1206/arithmetic-self-improve
Primary Area: General Machine Learning
Keywords: self-improvement, length generalization, self-training
Submission Number: 12828