Abstract: Although recent deep learning-based inpainting techniques achieve excellent restoration quality, their stringent requirements for computational resources such as GPUs and VRAM render them difficult to use in settings where high-end hardware is unavailable. We address this gap between research and real-world applications in our submission to the DREAMING challenge, which explores such a use case, diminished reality, where runtime and GPU hardware are limited. Specifically, this paper proposes a method that optimally divides a video into subvideos of variable length for the downstream inpainting model. By maximizing the number of frames, i.e., the spatio-temporal information, in each subvideo, our method often allows inference with a state-of-the-art model to proceed without significant degradation in output quality. In addition, to expedite inference, we examine techniques that intelligently reduce the number of input pixels, e.g., downsampling and cropping, while maintaining acceptable inpainting quality.