stable-pretraining: Foundation Model Research Made Simple

Randall Balestriero; Hugues Van Assel; Sami BuGhanem; Lucas Maes

Published: 23 Sept 2025; Last Modified: 20 Nov 2025; UniReps 2025; CC BY 4.0

Track: Extended Abstract Track

Keywords: foundation models, representation learning, library

TL;DR: stable-pretraining is a modular library built on PyTorch/Lightning that unifies probes, metrics, and evaluation with full logging, reducing engineering overhead and enabling scalable, reproducible foundation model research.

Abstract: Foundation models and self-supervised learning (SSL) have become central to modern AI, yet research in this area remains hindered by complex codebases, redundant re-implementations, and the heavy engineering burden of scaling experiments. We present stable-pretraining, a modular, extensible, and performance-optimized library built on top of PyTorch, Lightning, Hugging Face, and TorchMetrics. Unlike prior toolkits focused narrowly on reproducing state-of-the-art results, stable-pretraining is designed for flexibility and iteration speed: it unifies essential SSL utilities—including probes, collapse detection metrics, augmentation pipelines, and extensible evaluation routines—within a coherent and reliable framework. A central design principle is logging everything, enabling fine-grained visibility into training dynamics that makes debugging, monitoring, and reproducibility seamless. We validate the library by demonstrating its ability to generate new research insights with minimal overhead, including depth-wise representation probing and the analysis of CLIP degradation under synthetic data finetuning. By lowering barriers to entry while remaining scalable to large experiments, stable-pretraining aims to accelerate discovery and expand the possibilities of foundation model research.

Submission Number: 100
