Abstract: Light field super-resolution (LFSR) must balance computational efficiency against high-quality reconstruction while maintaining spatial and angular consistency. Existing methods face an architectural trade-off between the effectiveness of Transformers and the efficiency of CNNs, and their insufficient frequency modelling leads to blurred textures. We propose LFMix, a hybrid CNN-Transformer architecture that processes three complementary light field representations: sub-aperture images (SAIs), macro-pixel images (MacPIs), and epipolar plane images (EPIs). Our core innovation is the MixBlock design, which combines spatial convolution, spectral self-attention, and dedicated high-frequency enhancement through learnable filtering. The architecture achieves computational efficiency through strategic downsampling in spectral processing while preserving full-resolution spatial detail. A dual-domain loss function combining pixel and frequency constraints further improves the preservation of high-frequency detail. Extensive experiments on five benchmarks show that our method achieves the best performance among models with comparable computational cost, reaching 32.99 dB PSNR with only 0.98M parameters and 19.48 GFLOPs.
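The three representations named in the abstract are different arrangements of the same 4D light field. The following is a minimal sketch of how they relate, not the LFMix implementation: the index order `L[u][v][h][w]` (angular `u, v`; spatial `h, w`), the helper names, and the choice of a horizontal EPI are all assumptions made for illustration.

```python
# Toy 4D light field lf[u][v][h][w]: U x V views, each H x W pixels.
# Index convention and function names are illustrative assumptions,
# not taken from the paper.

def make_light_field(U, V, H, W):
    """Fill each sample with a tuple encoding its own coordinates."""
    return [[[[(u, v, h, w) for w in range(W)] for h in range(H)]
             for v in range(V)] for u in range(U)]

def sai(lf, u, v):
    """Sub-aperture image: the H x W view at angular position (u, v)."""
    return lf[u][v]

def macpi(lf):
    """Macro-pixel image of size (H*U) x (W*V).

    Each U x V tile gathers all angular samples of one spatial pixel,
    so row r = h*U + u and column c = w*V + v.
    """
    U, V = len(lf), len(lf[0])
    H, W = len(lf[0][0]), len(lf[0][0][0])
    return [[lf[r % U][c % V][r // U][c // V] for c in range(W * V)]
            for r in range(H * U)]

def horizontal_epi(lf, u, h):
    """Horizontal EPI: fix angular row u and spatial row h (V x W slice)."""
    return [lf[u][v][h] for v in range(len(lf[u]))]

lf = make_light_field(2, 3, 4, 5)   # U=2, V=3, H=4, W=5
m = macpi(lf)                        # 8 rows x 15 columns
e = horizontal_epi(lf, 1, 3)         # 3 rows x 5 columns
```

Because each toy sample stores its own coordinates, it is easy to check that the same sample `(u, v, h, w)` appears at `sai(lf, u, v)[h][w]`, at `m[h*U + u][w*V + v]`, and at `horizontal_epi(lf, u, h)[v][w]`.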
External IDs: dblp:conf/cvpr/YuWH25