Keywords: Text-to-Video, Concept Erasure, Concept Inversion, Erasure Robustness
Abstract: Text-to-video (T2V) diffusion models have achieved remarkable progress in generating temporally coherent, high-quality videos. However, their ability to generate sensitive or undesired concepts has raised concerns, motivating the development of concept erasure techniques that aim to suppress specific semantics while preserving general generation quality. Despite rapid progress in text-to-image (T2I) concept erasure, the effectiveness of these methods in T2V settings remains largely unquantified. In this work, we introduce PROBE, a systematic framework to measure the residual capacity of erased T2V models to represent and regenerate a target concept. PROBE learns a compact token embedding by jointly optimizing across all frames and timesteps, augmented with a latent alignment loss that anchors the recovered representation to the spatiotemporal structure of the original concept. The resulting embedding serves as a reusable probe that enables reproducible, large-scale robustness evaluation across different erasure methods and models. Experiments on multiple T2V architectures demonstrate that PROBE reveals substantial residual concept capacity even after erasure, providing new insights into the limitations of existing techniques and establishing a principled benchmark for future research on safe video generation.
Primary Area: generative models
Submission Number: 11572