Abstract: The demand for full-screen devices in consumer electronics has propelled the development of under-display image processing. Time-of-flight (ToF) cameras are typically placed under a transparent OLED (TOLED) display. Because of the pixels and display patterns in the TOLED panel, depth maps from an under-display time-of-flight (UD-ToF) camera are noisy, blurry, and inaccurate. To address these issues, we propose a non-local method based on the Vision Transformer for UD-ToF depth restoration. Specifically, we design a novel feature attention block to incorporate non-local depth features. Additionally, we take ToF raw measurements rather than the depth map as input, allowing the network to extract informative features in the raw domain. We conduct comprehensive experiments on the real RUD-TOF and synthetic SUD-TOF benchmark datasets, and the results show that the proposed transformer-based method outperforms state-of-the-art algorithms.
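To make the two key ideas concrete, the sketch below shows one plausible reading of them in PyTorch: a self-attention block that lets every spatial position of a depth feature map attend to all others (non-local feature attention), applied to features extracted from four-phase ToF raw correlation measurements rather than from a decoded depth map. The class name, head count, channel width, and the shallow raw-domain convolution are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class NonLocalFeatureAttention(nn.Module):
    """Self-attention over all spatial positions of a feature map,
    so each depth feature can aggregate non-local context.
    A minimal sketch; not the paper's exact block."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from a preceding encoder stage
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)       # (B, H*W, C) token sequence
        tokens = self.norm(tokens)
        out, _ = self.attn(tokens, tokens, tokens)  # global (non-local) attention
        out = out.transpose(1, 2).view(b, c, h, w)
        return x + out                              # residual connection

# Usage: raw-domain input with 4 correlation phases; the small spatial size
# is illustrative, since full-frame global attention is memory-intensive and
# would in practice be applied to downsampled features.
raw = torch.randn(1, 4, 30, 40)             # (B, 4 raw phases, H, W)
feat = nn.Conv2d(4, 64, 3, padding=1)(raw)  # shallow raw-domain feature extractor
y = NonLocalFeatureAttention(64)(feat)
print(y.shape)  # torch.Size([1, 64, 30, 40])
```

Feeding raw correlation measurements instead of the decoded depth map preserves information (e.g., per-phase noise statistics) that the depth decoding step would otherwise discard, which is the stated motivation for the raw-domain input.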