Abstract: This study introduces a hybrid deep learning framework for turbulence mitigation (HATM) in videos, integrating a transformer-based attention module followed by CNN-based attention modules. To address the computational demands of transformers, we propose a simple technique within the transformer module that improves efficiency. Additionally, to better exploit spatial and channel information, we introduce a CNN-attention module that captures global and local inter- and intra-frame dependencies. The overall model follows a U-Net architecture, with the skip connections replaced by our attention blocks to further exploit local spatial and temporal dependencies. Our model is trained on a simulated turbulence dataset and evaluated on both simulated and real-world datasets to assess its generalization performance. The effectiveness of each component of our model is also evaluated through ablation studies. Experimental results show that our model improves PSNR and SSIM scores and notably enhances the reconstruction of text images, making the restored text cleaner and more readable. Overall, our HATM framework advances the mitigation of turbulence distortion in video sequences, delivering both qualitative and quantitative improvements and offering a promising solution for applications that require restoration of turbulence-degraded video.
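To make the architectural idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a U-Net-style restoration network in which the plain skip connection is replaced by an attention block, as the abstract describes. All module names (SkipAttention, TinyUNet), layer sizes, and the choice of a channel-attention gate are illustrative assumptions only.

```python
# Hypothetical sketch of a U-Net with attention-gated skip connections.
# Names and sizes are assumptions for illustration, not the paper's HATM model.
import torch
import torch.nn as nn


class SkipAttention(nn.Module):
    """Channel-attention gate applied to encoder features before they are merged
    into the decoder, standing in for the paper's attention-based skip blocks."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global spatial context
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)                             # re-weight encoder features


class TinyUNet(nn.Module):
    """Two-level encoder-decoder whose skip path passes through SkipAttention."""

    def __init__(self, in_ch: int = 3, base: int = 16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.bottleneck = nn.Sequential(nn.Conv2d(base * 2, base * 2, 3, padding=1), nn.ReLU(inplace=True))
        self.skip = SkipAttention(base)                      # replaces the identity skip connection
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU(inplace=True))
        self.out = nn.Conv2d(base, in_ch, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        b = self.bottleneck(e2)
        d = torch.cat([self.up(b), self.skip(e1)], dim=1)   # attention-gated skip fusion
        return self.out(self.dec(d)) + x                     # residual restoration of the frame


if __name__ == "__main__":
    frame = torch.randn(1, 3, 64, 64)                        # one distorted video frame
    print(TinyUNet()(frame).shape)                           # torch.Size([1, 3, 64, 64])
```

In the full HATM framework described above, such skip blocks would additionally combine transformer-based and CNN-based attention to model inter- and intra-frame dependencies; the sketch only illustrates where the attention sits relative to the U-Net skip path.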