Abstract: Salient object detection (SOD) in light field data presents unique challenges due to dynamic semantic inconsistencies across focal slices and representation heterogeneity between focal slices and the all-focus image. Existing methods often treat focal slices uniformly or rely on simple fusion strategies, which fail to address focus-induced semantic drift and cross-modal feature misalignment. To tackle these issues, we propose LFMamba, a unified network that jointly models dynamic semantic consistency and adaptive cross-modal fusion. We design the Focal-aware State Space Module (FSSM), which generates focal-aware semantic prompts through low-rank decomposition and adaptively routes them according to focal plane indices, thereby enabling bidirectional semantic propagation across slices via non-causal state transitions. Furthermore, we introduce the Focal-guided Cross-modal Fusion Module (FCFM), which mitigates cross-modal heterogeneity through a two-stage hierarchical strategy that combines structure-aware low-level alignment with gated high-level semantic fusion. Extensive experiments on four public light field SOD benchmarks demonstrate that LFMamba outperforms state-of-the-art methods, with improved robustness and consistency under complex focal variations.
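To make the fusion idea concrete, the following is a minimal, hypothetical PyTorch-style sketch of low-rank prompt generation combined with gated cross-modal fusion, in the spirit of the FSSM/FCFM description above. All class names, variable names, tensor shapes, and the rank value are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed design, not the paper's code): a focal-slice feature is
# compressed into a low-rank "prompt" and injected into the all-focus feature
# through a learned channel-wise gate.
import torch
import torch.nn as nn


class GatedCrossModalFusion(nn.Module):
    def __init__(self, channels: int, rank: int = 16):
        super().__init__()
        # Low-rank decomposition producing a focal-aware prompt (hypothetical rank)
        self.down = nn.Linear(channels, rank)   # project to rank-r subspace
        self.up = nn.Linear(rank, channels)     # expand back to feature space
        # Gate deciding how much slice information to inject per channel
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, focal_feat: torch.Tensor, allfocus_feat: torch.Tensor) -> torch.Tensor:
        # focal_feat, allfocus_feat: (B, N, C) token sequences
        prompt = self.up(self.down(focal_feat))                      # low-rank focal prompt
        g = self.gate(torch.cat([prompt, allfocus_feat], dim=-1))    # (B, N, C) gate
        return allfocus_feat + g * prompt                            # gated residual fusion


if __name__ == "__main__":
    fuse = GatedCrossModalFusion(channels=64)
    focal = torch.randn(2, 196, 64)      # features from one focal slice
    allfocus = torch.randn(2, 196, 64)   # features from the all-focus image
    print(fuse(focal, allfocus).shape)   # torch.Size([2, 196, 64])
```

This sketch only illustrates the gated, low-rank injection pattern; the paper's modules additionally route prompts by focal plane index and propagate semantics bidirectionally across slices via non-causal state transitions.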
DOI: 10.1145/3746027.3755654