DiWTBR: Dilated Wavelet Transformer for Efficient Megapixel Bokeh Rendering

Xiaoshi Qiu, Shiyue Yan, Qingmin Liao, Shaojun Liu

Published: 01 Jan 2025, Last Modified: 26 Jan 2026, IEEE Transactions on Image Processing, CC BY-SA 4.0

Abstract: Bokeh is widely used in photography and is traditionally achieved with large-aperture cameras. Rendering bokeh from pictures taken with small-aperture cameras has attracted much attention because it requires only a simple imaging system. Most existing methods employ Convolutional Neural Networks and often mistakenly blur the foreground because of their limited receptive field. In contrast, Transformers can easily capture long-range dependencies and are therefore better suited to this problem. However, Transformers suffer from a high computational burden, especially for high-resolution images. In this paper, we propose a Dilated Wavelet Transformer model for Bokeh Rendering (DiWTBR) from a single small-aperture megapixel image. It employs both window attention and dilated attention schemes, introducing both local and global spatial interactions at a low computational cost. Moreover, to further improve efficiency, we employ the wavelet transform in the attention block. Experimental results demonstrate that DiWTBR outperforms state-of-the-art methods by up to 0.7 dB in PSNR. Finally, our model can be readily run on mainstream personal computers and laptops, consuming only 4 GB of GPU memory. The code will be available on GitHub upon acceptance.
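
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind combining window and dilated attention, not the authors' code. It shows how a grid of token positions might be split into contiguous windows (local interactions) or into strided, image-spanning groups (global interactions), and how restricting attention to such groups reduces the number of query-key pairs. The function names, the group size `s`, the assumption that the grid dimensions are divisible by `s`, and the use of a one-level Haar low-pass band as the wavelet step are all illustrative assumptions; the abstract does not state which wavelet is used or where it is applied.

```python
# Illustrative sketch only; all names and parameters here are hypothetical.

def window_groups(h, w, s):
    """Split an h x w token grid into contiguous s x s windows (local)."""
    groups = {}
    for y in range(h):
        for x in range(w):
            groups.setdefault((y // s, x // s), []).append((y, x))
    return list(groups.values())

def dilated_groups(h, w, s):
    """Split the grid into groups of s x s tokens sampled at a fixed
    stride, so that every group spans the whole image (global)."""
    dy, dx = h // s, w // s  # dilation rates along each axis
    groups = {}
    for y in range(h):
        for x in range(w):
            groups.setdefault((y % dy, x % dx), []).append((y, x))
    return list(groups.values())

def attention_pairs(groups):
    """Count query-key pairs when attention is restricted to each group."""
    return sum(len(g) ** 2 for g in groups)

def haar_ll(img):
    """Low-frequency band of a one-level orthonormal 2D Haar transform.
    Halves each spatial dimension, so four times fewer tokens remain.
    Assumed wavelet; the abstract does not specify one."""
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 2.0
         for x in range(0, len(img[0]), 2)]
        for y in range(0, len(img), 2)
    ]

if __name__ == "__main__":
    h = w = 8
    s = 4
    full = (h * w) ** 2
    print("full attention pairs:   ", full)
    print("window attention pairs: ", attention_pairs(window_groups(h, w, s)))
    print("dilated attention pairs:", attention_pairs(dilated_groups(h, w, s)))
    print("rows spanned by one dilated group:",
          sorted({y for y, _ in dilated_groups(h, w, s)[0]}))
```

In this toy setting each scheme yields four groups of 16 tokens, so grouped attention evaluates a quarter of the pairs that full attention would, while the dilated groups still reach across the entire grid.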

External IDs: doi:10.1109/tip.2025.3632227