TL;DR: Caching intermediate layer outputs across neural network inferences for real-time graphics rendering.
Abstract: Graphics rendering applications increasingly leverage neural networks in tasks such as denoising, supersampling, and frame extrapolation to improve image quality while maintaining frame rates.
The temporal coherence inherent in these tasks presents an opportunity to reuse intermediate results from previous frames and avoid redundant computations.
Recent work has shown that caching intermediate features to be reused in subsequent inferences is an effective method to reduce latency in diffusion models.
We extend this idea to real-time rendering and present ReFrame, which explores different caching policies to optimize trade-offs between quality and performance in rendering workloads.
ReFrame can be applied to a variety of encoder-decoder style networks commonly found in rendering pipelines.
Experimental results show that we achieve 1.4$\times$ speedup on average with negligible quality loss in three real-time rendering tasks.
Code available: https://ubc-aamodt-group.github.io/reframe-layer-caching/
Lay Summary: Realistic visuals make video games and virtual reality feel more immersive and exciting, but creating these images can be slow and power-intensive. In fact, the better the image, the longer it takes. While animated movies can spend hours producing each frame, interactive experiences need to respond instantly to user input to feel smooth and believable.
Our research aims to make rendering faster, so we can save power and improve image quality without slowing down the system. We observe that frames displayed back-to-back often look very similar, which inspired a mechanism that only partially updates the neural networks involved in creating each frame. We strategically save intermediate outputs of the neural network and reuse them for as long as possible, before the reuse starts to noticeably degrade the image.
Our technique accelerates the neural networks behind the visuals, cutting energy use and making it easier for less powerful hardware to keep up without sacrificing quality.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/ubc-aamodt-group/reframe-layer-caching
Primary Area: Applications->Computer Vision
Keywords: Computer Graphics, Neural Network Inference, Training-Free Optimization, Feature Caching, Inference Acceleration
Submission Number: 9591