Keywords: hyperparameter optimization, reinforcement learning, learned controllers, GRPO, PPO, schedule-free optimizers, transformer pretraining, meta-optimization
TL;DR: A small GRPO-trained controller adjusts learning rate and weight decay throughout training, beating schedule-free baselines and approaching hand-tuned performance with amortized cost.
Abstract: Hyperparameter optimization remains a persistent bottleneck in deep learning, requiring expensive sweeps for each new model or dataset. We propose JEL (Just Enough Learning), a lightweight learned controller that adjusts optimizer hyperparameters throughout training. JEL treats the training process as an episodic reinforcement learning problem: at fixed decision intervals, a compact policy network observes training progress and outputs multiplicative corrections to learning rate and weight decay applied on top of a strong base optimizer. We train the controller using a modified group-relative policy optimization (GRPO) objective that removes length and per-group variance normalizations to avoid biasing the learning signal. On transformer pretraining tasks, JEL improves validation performance by 2.5% over schedule-free optimizers at equivalent computational cost, requiring controller training equivalent to only 5.6 training runs, a one-time cost amortized across deployments. JEL achieves performance within 8% of an upper bound from extensive manual experimentation, while already costing less than traditional 6-8 run hyperparameter sweeps, with savings compounding on each subsequent task. Our results demonstrate that a simple learned controller can effectively replace costly hyperparameter searches while maintaining competitive performance.
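The abstract's two key mechanisms, multiplicative corrections on top of a base optimizer and a GRPO-style advantage with the per-group variance normalization removed, can be sketched as follows. This is a minimal illustration, not the paper's implementation; all function names and the log-space parameterization are assumptions.

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: subtract the group-mean reward, but do
    NOT divide by the group std (the variance normalization the abstract
    says is removed to avoid biasing the learning signal)."""
    rewards = np.asarray(rewards, dtype=float)
    return rewards - rewards.mean()

def apply_corrections(base_lr, base_wd, log_mults):
    """Apply multiplicative corrections to the base optimizer's learning
    rate and weight decay. Log-space parameterization (an assumption)
    keeps corrections symmetric around a multiplier of 1.0."""
    lr_mult, wd_mult = np.exp(np.asarray(log_mults, dtype=float))
    return base_lr * lr_mult, base_wd * wd_mult

# A group of 4 rollouts with illustrative rewards:
adv = grpo_advantages([1.0, 2.0, 3.0, 2.0])  # mean 2.0 -> [-1, 0, 1, 0]

# One decision step: the policy's outputs scale the base settings.
lr, wd = apply_corrections(base_lr=3e-4, base_wd=0.1, log_mults=(0.1, -0.2))
```

Mean-centering alone keeps the advantages zero-sum within a group while preserving their scale, which is the property lost when each group is also divided by its own standard deviation.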
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 132