Abstract: Light field salient object detection (LF SOD) methods have made significant progress recently. Most of them exploit the abundant multi-modal information in the all-focus image and the focal stack across focal planes to enrich scene details and depth perception. However, in light-field images the spatial and depth information varies only slightly across different slices, introducing redundancy within the focal stack. Moreover, noise can appear repeatedly in multiple slices of the focal stack, causing interference. To address these issues, we propose VMKNet, an effective approach that leverages an innovative variance-maximized key slice selection and interacts with the all-focus image to improve LF SOD. Specifically, we measure the consistency difference between the all-focus image and each focal slice within the salient region as a saliency score. We then randomly assemble subsets of these scores, each score corresponding to a particular slice, and single out the subset exhibiting the highest variance; its slices are taken as the key focal slices, since they best reveal the diversity of salient objects. Next, a bidirectional guidance module (BGM) is presented to learn attentive features of the all-focus image and the selected key slices in a mutual guidance manner, producing enhanced and holistic features. With hierarchical BGMs, our model progressively aggregates common salient semantics and meaningful contextual details, generating more discriminative representations. Moreover, we introduce an edge enhancement module in conjunction with the BGM to improve the sharpness of saliency maps. Extensive experiments on common light field datasets demonstrate that VMKNet outperforms recent state-of-the-art LF, RGB-D, and RGB methods.
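To make the variance-maximized key slice selection described above concrete, the sketch below illustrates one plausible reading of it in Python. This is not the authors' implementation: the function names (`saliency_scores`, `select_key_slices`), the use of a mean absolute difference as the consistency measure, the salient-region mask, and the subset size and trial count are all assumptions introduced for illustration.

```python
# Minimal sketch (assumptions, not the paper's code) of variance-maximized
# key slice selection: score each focal slice by its consistency difference
# with the all-focus image inside the salient region, then pick the random
# subset of scores with the highest variance as the key slices.
import numpy as np


def saliency_scores(all_focus, focal_stack, salient_mask):
    """Consistency difference between the all-focus image and each focal slice,
    averaged over an (assumed given) salient-region mask."""
    scores = []
    for focal_slice in focal_stack:
        diff = np.abs(all_focus - focal_slice)           # per-pixel difference
        scores.append(float(diff[salient_mask].mean()))  # mean inside salient region
    return np.asarray(scores)


def select_key_slices(scores, subset_size=4, num_trials=100, seed=0):
    """Randomly assemble subsets of saliency scores and return the slice indices
    of the subset with the highest variance (the hypothesized key slices)."""
    rng = np.random.default_rng(seed)
    best_idx, best_var = None, -np.inf
    for _ in range(num_trials):
        idx = rng.choice(len(scores), size=subset_size, replace=False)
        var = scores[idx].var()
        if var > best_var:
            best_var, best_idx = var, idx
    return np.sort(best_idx)


if __name__ == "__main__":
    # Toy usage: random arrays stand in for a 12-slice focal stack.
    h, w, n = 64, 64, 12
    all_focus = np.random.rand(h, w)
    focal_stack = [np.random.rand(h, w) for _ in range(n)]
    salient_mask = np.zeros((h, w), dtype=bool)
    salient_mask[16:48, 16:48] = True
    scores = saliency_scores(all_focus, focal_stack, salient_mask)
    print("key slices:", select_key_slices(scores))
```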
DOI: 10.1109/TMM.2025.3586131