Abstract: Training deep neural networks to convergence is expensive and time-consuming, especially when exploring new architectures or hardware setups. Prior work has focused on estimating per-iteration cost or total training time under a fixed step count, but has largely ignored the critical challenge of predicting how many steps a model will take to converge. We introduce CAPE (Convergence-Aware Prediction Engine), a lightweight, probing-based system that accurately predicts the number of training steps required for convergence without executing full training runs. CAPE probes models at initialization using a small batch of data to extract both structural and dynamical features, including parameter count, gradient norm, NTK trace, dataset size, and learning rate. Using these features, we build a meta-dataset spanning a wide range of model types and train a meta-model to forecast convergence steps. CAPE attains mean absolute errors of 3–9 optimization steps across MLP, CNN, RNN, and Transformer models, consistently surpassing strong baselines, and this performance remains stable across a fourfold range of typical convergence horizons (15–60 steps). CAPE thus offers a practical and generalizable solution for convergence forecasting, supporting faster model selection, efficient scheduling, and resource-aware training.
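The probe-then-predict workflow described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature list (parameter count, gradient norm, NTK trace, dataset size, learning rate) comes from the abstract, but the log-scaling of size-like features, the synthetic meta-dataset, and the choice of a linear least-squares meta-model are all assumptions for illustration.

```python
import numpy as np

def probe_features(n_params, grad_norm, ntk_trace, dataset_size, lr):
    """Assemble a probe feature vector from the quantities named in the
    abstract. Log-scaling of parameter count and dataset size is an
    illustrative assumption, not taken from the paper."""
    return np.array([np.log10(n_params), grad_norm, ntk_trace,
                     np.log10(dataset_size), lr])

# Hypothetical meta-dataset: each row is a probed model; the target is the
# observed number of steps to convergence. All values here are synthetic.
rng = np.random.default_rng(0)
X = rng.uniform([3.0, 0.1, 1.0, 3.0, 1e-4],
                [8.0, 10.0, 100.0, 6.0, 1e-1], size=(200, 5))
y = 5.0 + 4.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.5, 200)

# Fit a simple linear meta-model by least squares (with a bias column).
# CAPE's actual meta-model is not specified in the abstract.
A = np.hstack([X, np.ones((X.shape[0], 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_steps(features):
    """Forecast convergence steps for a new model from its probe features."""
    return float(np.append(features, 1.0) @ w)

feats = probe_features(n_params=1_000_000, grad_norm=2.3, ntk_trace=40.0,
                       dataset_size=50_000, lr=1e-3)
print(f"predicted convergence steps: {predict_steps(feats):.1f}")
```

In practice the meta-dataset would be populated with real probed runs across MLPs, CNNs, RNNs, and Transformers, and a more expressive regressor could replace the linear fit.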
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ruoyu_Sun1
Submission Number: 5848