Improve Deep Learning Autofocus with Depth Information Supervision and Current Focal Distance Cues

Published: 01 Jan 2024, Last Modified: 14 Apr 2025 | SMC 2024 | Readers: Everyone | License: CC BY-SA 4.0
Abstract: Traditional autofocus methods search for the optimal focal distance (FD) by evaluating image quality across focal stacks, which makes focusing time-consuming. Recently, deep learning has been adopted for single-shot autofocus methods, which predict the optimal FD directly from a single input image. However, these methods often suffer from low prediction accuracy due to the lack of global features and structured global supervisory information, as they rely solely on the image's region of interest (ROI) as input and on a single value for supervision. We propose a deep learning network named MPFS (Multi-Head Network with Per-Pixel Focal Distance Supervision), which takes a full-frame photograph as input and uses the optimal focal distance per pixel for supervision. Together, these two changes address the missing global features and the insufficient supervisory information. Additionally, the network integrates the camera's current focal distance to mitigate the scale ambiguity caused by the lack of absolute scale information. To validate the effectiveness of the proposed method, we designed an experiment using a dataset annotated with the optimal FD per pixel. Experimental results on this dataset indicate that our approach achieves a 0.22 decrease in the Mean Absolute Error (MAE) metric compared to state-of-the-art models, with improvements of 0.02 and 0.004 in the $\boldsymbol{d}_{\mathbf{1}}$ and $\boldsymbol{d}_{\mathbf{2}}$ metrics, respectively.
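As an illustration of the MAE metric applied to per-pixel FD annotations, the following minimal sketch averages the absolute error over every pixel of a predicted FD map and a ground-truth FD map. It is not the authors' implementation: the function name `per_pixel_mae`, the nested-list representation of the maps, and the sample values are all assumptions made for this example, and the abstract does not specify the FD units or how the paper aggregates errors.

```python
def per_pixel_mae(pred, target):
    """Mean absolute error over all pixels of two equally sized 2D maps.

    `pred` and `target` are lists of rows (hypothetical representation);
    each entry is the focal distance assigned to one pixel.
    """
    if len(pred) != len(target):
        raise ValueError("maps must have the same number of rows")
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("rows must have the same length")
        for p, t in zip(pred_row, target_row):
            total += abs(p - t)  # absolute error at this pixel
            count += 1
    if count == 0:
        raise ValueError("maps must not be empty")
    return total / count


# Made-up 2x2 maps, purely illustrative (not data from the paper).
pred_fd = [[10.0, 12.0], [20.0, 22.0]]
true_fd = [[11.0, 12.0], [18.0, 25.0]]
mae = per_pixel_mae(pred_fd, true_fd)  # (1 + 0 + 2 + 3) / 4 = 1.5
```

In contrast, supervision with a single value per image would compare only one predicted FD against one target, which is the limitation the abstract attributes to earlier single-shot methods.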