A Practitioner's Guide to Continual Multimodal Pretraining

Published: 10 Oct 2024, Last Modified: 19 Nov 2024
Venue: Continual FoMo (Oral)
License: CC BY 4.0
Keywords: Continual Pretraining, Multimodal, Lifelong Learning
TL;DR: We introduce an extensive new benchmark for studying continual multimodal pretraining and provide experimental insights into best practices and pipeline design choices.
Abstract: Multimodal foundation models, despite being extensively pretrained, become outdated over time. Research into continual pretraining mainly explores (1) infrequent, indiscriminate updates on large-scale new data, or (2) frequent, sample-level updates. However, practical model deployment often operates in the gap between these limit cases, as real-world applications require continual adaptation to specific subdomains or tasks. In this work, we complement current approaches through a new, continual multimodal pretraining test bed with realistic compute constraints and practical deployment requirements (\texttt{FoMo-in-Flux}), and provide \textit{comprehensive practical guidance} for effective continual model updates---investigating different method choices, pipeline design and data-centric deployment scenarios.
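To make the deployment setting in the abstract concrete, the following is a minimal sketch (not the paper's implementation) of chunk-wise continual pretraining: a model is repeatedly updated on subdomain- or task-specific data chunks under a per-update compute budget, then checked for both adaptation and retention. All names (`update_on_chunk`, `evaluate`, `compute_budget_per_update`) are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class UpdateResult:
    adaptation_score: float  # performance on the newly seen subdomain/task
    retention_score: float   # performance on previously acquired capabilities


def continual_pretraining_loop(
    model,
    data_stream: Iterable,            # sequence of subdomain/task data chunks
    update_on_chunk: Callable,        # e.g. full finetuning or a parameter-efficient update
    evaluate: Callable,               # returns (adaptation, retention) scores
    compute_budget_per_update: int,   # e.g. max training steps per chunk
) -> List[UpdateResult]:
    """Illustrative loop for the regime between the two limit cases:
    neither one infrequent large-scale update nor per-sample streaming,
    but repeated chunk-wise adaptation under a compute constraint."""
    results = []
    for chunk in data_stream:
        # Adapt to the incoming subdomain within the per-update budget.
        model = update_on_chunk(model, chunk, budget=compute_budget_per_update)
        # Track plasticity (new data) and stability (old capabilities).
        adaptation, retention = evaluate(model, chunk)
        results.append(UpdateResult(adaptation, retention))
    return results
```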
Submission Number: 18
