Keywords: Video Object Removal
Abstract: Recent advances in video diffusion models have driven rapid progress in video editing techniques. However, video object removal, a critical subtask of video editing, remains challenging due to issues such as hallucinated objects and visual artifacts. Furthermore, existing methods often rely on computationally expensive sampling procedures and classifier-free guidance (CFG), resulting in slow inference. To address these limitations, we propose **MiniMax-Remover**, a novel two-stage video object removal approach. Motivated by the observation that text conditioning is not well suited to this task, we simplify the pretrained video generation model by removing the textual input and cross-attention layers, yielding a more lightweight and efficient architecture in the first stage.
In the second stage, we propose a minimax optimization strategy to further distill the remover on successful videos produced by the stage-1 model. Specifically, the inner maximization identifies adversarial input noise ("bad noise") that leads to failed removals, while the outer minimization trains the model to produce high-quality removal results even under such challenging conditions. As a result, our method achieves state-of-the-art video object removal results with as few as 6 sampling steps and no CFG. Extensive experiments demonstrate the effectiveness and superiority of MiniMax-Remover compared to existing methods. Code and videos are available at: **https://minimax-remover.github.io**.
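The alternating inner-maximization / outer-minimization pattern described above can be sketched on a toy problem. This is a minimal illustrative sketch only: the linear `remover`, the squared `loss`, the step sizes, and the noise-ball `radius` are all assumptions for demonstration, not the paper's actual video model, losses, or training procedure.

```python
import numpy as np

def remover(theta, x, eps):
    """Toy stand-in for the remover: a linear map of the noise-perturbed input."""
    return theta * (x + eps)

def loss(theta, x, y, eps):
    """Squared removal error for one (input, target) pair."""
    return (remover(theta, x, eps) - y) ** 2

def grad_eps(theta, x, y, eps):
    # d/d_eps of (theta*(x+eps) - y)^2
    return 2.0 * (remover(theta, x, eps) - y) * theta

def grad_theta(theta, x, y, eps):
    # d/d_theta of (theta*(x+eps) - y)^2
    return 2.0 * (remover(theta, x, eps) - y) * (x + eps)

theta = 0.1          # model parameter (outer minimization variable)
x, y = 1.0, 2.0      # toy "video" and its ground-truth removal result
radius = 0.5         # bound on the adversarial "bad noise" search

for _ in range(200):
    # Inner maximization: gradient ascent for noise that maximizes the loss,
    # projected back onto the bounded noise ball after each step.
    eps = 0.0
    for _ in range(10):
        eps += 0.1 * grad_eps(theta, x, y, eps)
        eps = float(np.clip(eps, -radius, radius))
    # Outer minimization: train the model under that worst-case noise.
    theta -= 0.05 * grad_theta(theta, x, y, eps)

# After training, even the worst noise in the ball should give a small loss.
worst = max(loss(theta, x, y, e) for e in np.linspace(-radius, radius, 101))
print(round(float(worst), 4))
```

The key design choice the sketch mirrors is that the model is never updated on easy noise: each outer step sees only the noise the inner loop found hardest, which is what pushes the distilled remover to stay reliable under few-step sampling.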
Supplementary Material: zip
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 4051