# ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion


## 🫖 Introduction 
In this work, we present **ERTACache**, a principled and efficient caching framework for accelerating diffusion model inference. We decompose cache-induced degradation into two error sources, feature shift and step amplification, and correct them with a dual-path strategy that combines offline-calibrated reuse scheduling, trajectory-aware timestep adjustment, and closed-form residual rectification. Unlike prior heuristic-based methods, **ERTACache** is theoretically grounded yet lightweight: it removes redundant computation while keeping outputs close to those of the uncached model. Empirical results across multiple benchmarks validate its effectiveness and generality as a practical approach to efficient generative sampling.
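
To make the idea of cache reuse with an offline-calibrated correction concrete, here is a minimal toy sketch. It is not the ERTACache implementation: `toy_model`, `sample`, the alternating reuse schedule, and the additive `fix` terms are hypothetical stand-ins chosen for illustration, and timestep adjustment is omitted entirely.

```python
import numpy as np

def toy_model(x, t):
    """Hypothetical stand-in for an expensive denoiser forward pass."""
    return x * (1.0 + 0.1 * t)

def sample(x0, timesteps, reuse=None, fix=None):
    """Euler-style sampling loop with optional reuse of a cached model output.

    reuse[i] is True when step i should skip the model call and use the cache.
    fix[i] is an additive correction (assumed to be calibrated offline) that is
    applied to the cached output on reuse steps.
    """
    x = np.asarray(x0, dtype=float).copy()
    cached, n_calls, outputs = None, 0, []
    for i in range(len(timesteps) - 1):
        t, t_next = timesteps[i], timesteps[i + 1]
        if reuse is not None and reuse[i] and cached is not None:
            out = cached if fix is None else cached + fix[i]
        else:
            out = toy_model(x, t)  # full computation, refreshes the cache
            cached = out
            n_calls += 1
        outputs.append(out)
        x = x + (t_next - t) * out
    return x, n_calls, outputs

timesteps = np.linspace(1.0, 0.0, 11)    # 10 sampling steps
x0 = np.array([1.0, -2.0, 0.5])
reuse = [i % 2 == 1 for i in range(10)]  # toy schedule: reuse on odd steps

# Reference run without caching; its outputs serve as calibration data.
x_full, calls_full, ref = sample(x0, timesteps)

# Naive reuse: half the model calls, but the trajectory drifts.
x_naive, calls_naive, _ = sample(x0, timesteps, reuse)

# Offline calibration of an additive correction for each reuse step.
fix = [ref[i] - ref[i - 1] if reuse[i] else np.zeros_like(x0) for i in range(10)]
x_fixed, calls_fixed, _ = sample(x0, timesteps, reuse, fix)

print("model calls:", calls_full, calls_naive, calls_fixed)
print("naive error:", np.abs(x_naive - x_full).max())
print("fixed error:", np.abs(x_fixed - x_full).max())
```

In this deterministic toy the correction is calibrated on the same input it is evaluated on, so it recovers the reference trajectory almost exactly. A real model would calibrate on a separate set of prompts and would only reduce the error, not eliminate it.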

![visualization](./visualize/exp.png)

## 🎉 Supported Models (WIP)

**Text to Video**
- **ERTACache4Wan2.1**
- **ERTACache4CogVideoX-2B**
- **ERTACache4OpenSora1.2**

**Text to Image**
- **ERTACache4FLUX**



## 📈 Inference Comparisons on a Single A800
<table>
  <tr>
    <th>Model</th>
    <th>Method</th>
    <th>LPIPS ↓</th>
    <th>SSIM ↑</th>
    <th>PSNR (dB) ↑</th>
    <th>Latency (s) ↓</th>
  </tr>
  <tr>
    <td rowspan="2">OpenSora 1.2</td>
    <td>TeaCache</td>
    <td>0.2511</td>
    <td>0.7477</td>
    <td>19.10</td>
    <td>19.84</td>
  </tr>
  <tr>
    <td>ERTACache</td>
    <td>0.1659</td>
    <td>0.8170</td>
    <td>22.34</td>
    <td>18.04</td>
  </tr>
  <tr>
    <td rowspan="2">CogVideoX-2B</td>
    <td>TeaCache</td>
    <td>0.2057</td>
    <td>0.7614</td>
    <td>20.97</td>
    <td>26.88</td>
  </tr>
  <tr>
    <td>ERTACache</td>
    <td>0.1012</td>
    <td>0.8702</td>
    <td>26.44</td>
    <td>26.78</td>
  </tr>
  <tr>
    <td rowspan="2">Wan2.1-1.3B</td>
    <td>TeaCache</td>
    <td>0.2913</td>
    <td>0.5685</td>
    <td>16.17</td>
    <td>99.5</td>
  </tr>
  <tr>
    <td>ERTACache</td>
    <td>0.1095</td>
    <td>0.8200</td>
    <td>23.77</td>
    <td>91.7</td>
  </tr>
  <tr>
    <td rowspan="2">FLUX-dev 1.0</td>
    <td>TeaCache</td>
    <td>0.4427</td>
    <td>0.7445</td>
    <td>16.47</td>
    <td>14.21</td>
  </tr>
  <tr>
    <td>ERTACache</td>
    <td>0.3029</td>
    <td>0.8962</td>
    <td>20.51</td>
    <td>14.01</td>
  </tr>
</table>
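
The latency columns are easier to compare as relative reductions. The short sketch below computes them from the numbers in the table above; the helper name `latency_reduction` is ours and is not part of this repository.

```python
# Latency in seconds, copied from the comparison table: (TeaCache, ERTACache).
latency = {
    "OpenSora 1.2": (19.84, 18.04),
    "CogVideoX-2B": (26.88, 26.78),
    "Wan2.1-1.3B": (99.5, 91.7),
    "FLUX-dev 1.0": (14.21, 14.01),
}

def latency_reduction(baseline, ours):
    """Percentage by which `ours` is faster than `baseline`."""
    return 100.0 * (baseline - ours) / baseline

for model, (teacache, ertacache) in latency.items():
    pct = latency_reduction(teacache, ertacache)
    print(f"{model}: {pct:.1f}% lower latency than TeaCache")
```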




## 💐 Acknowledgement 

This repository builds on [VideoSys](https://github.com/NUS-HPC-AI-Lab/VideoSys), [Diffusers](https://github.com/huggingface/diffusers), [Open-Sora](https://github.com/hpcaitech/Open-Sora), [CogVideoX](https://github.com/THUDM/CogVideo), [FLUX](https://github.com/black-forest-labs/flux), and [Wan2.1](https://github.com/Wan-Video/Wan2.1). Thanks for their contributions!



