D2FNet: Degradation-aware Diffusion Feature Guided Network against Physical-World Attack for Monocular Depth Estimation

Published: 01 Jan 2026, Last Modified: 14 May 2026, Venue: IEEE Transactions on Consumer Electronics, License: CC BY-SA 4.0
Abstract: In consumer healthcare and intelligent consumer electronics, robust monocular depth estimation is essential for applications such as portable endoscopic devices, AR/VR-assisted diagnosis, home-use health monitoring systems, and smart wearable imaging platforms. However, low-cost sensors and compact optical systems often introduce motion blur, uneven illumination, compression artifacts, and sensor noise. Beyond these degradations, AI-enabled physical-world attacks—such as lighting perturbations, surface reflections, or optical interference—can mislead perception systems, posing serious reliability and safety risks in endoscopic scenarios. These factors jointly degrade performance and hinder the deployment of AI-enabled perception in real-world devices. To address this challenge, we propose D2FNet, a degradation-aware diffusion feature guided network explicitly designed to enhance robustness against both imaging degradation and physical-world attacks. D2FNet integrates a Diffusion Feature Generation Module (DFGM) to synthesize degradation-invariant features, a Style-free Content Reconstruction Block (SCRB) to suppress style-related distortions, and a Frequency Refined Adaptive Fusion Block (FRAF) to adaptively fuse spatial and frequency information. Extensive experiments on multiple endoscopic datasets demonstrate that D2FNet achieves superior robustness, generalization, and attack resilience compared to existing approaches. This work provides a scalable and secure AI perception framework for consumer healthcare electronics, enabling reliable monocular depth estimation under real-world adversarial imaging conditions.