Transframer: Arbitrary Frame Prediction with Generative Models

Charlie Nash; Joao Carreira; Jacob C Walker; Iain Barr; Andrew Jaegle; Mateusz Malinowski; Peter Battaglia

Published: 05 Apr 2023, Last Modified: 17 Sept 2024, Accepted by TMLR

Abstract: We present a general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction. Our approach unifies a broad range of tasks, from image segmentation to novel view synthesis and video interpolation. We pair this framework with an architecture we term Transframer, which uses U-Net and Transformer components to condition on annotated context frames, and outputs sequences of sparse, compressed image features. Transframer is state-of-the-art on a variety of video generation benchmarks, is competitive with the strongest models on few-shot view synthesis, and can generate coherent 30-second videos from a single image without any explicit geometric information. A single generalist Transframer simultaneously produces promising results on 8 tasks, including semantic segmentation, image classification and optical flow prediction, with no task-specific architectural components, demonstrating that multi-task computer vision can be tackled using probabilistic image models. Our approach can in principle be applied to a wide range of applications that require learning the conditional structure of annotated image-formatted data.

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: * Added explanation of what z_i is. * Added explanation of datasets in Fig. 3. * Clarified contributions. * Clarified our statement on uncertainty.

Video: https://sites.google.com/view/transframer

Assigned Action Editor: ~Yingnian_Wu1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 454
