Temporally Consistent Smoke Removal from Endoscopic Video Images

Silja Janßen, Mohamed Oumeslakht, Kevin Köser

Published: 01 Jan 2026, Last Modified: 26 May 2026, Crossref, CC BY-SA 4.0

Abstract: Electrocautery and laser ablation in endoscopic surgery produce smoke that obscures the surgeon's view. Smoke removal methods aim to remove the smoke from affected images and provide corresponding clear views. However, the methods developed so far generally operate on single images and do not take the temporal consistency of video data into account, which can lead to flickering artefacts. To address this, we propose processing multiple consecutive video frames at the same time, which provides more information to the system. As a first step in this direction, we implemented a 3D U-Net architecture that processes sequential video data with time acting as the third dimension. We further created novel video datasets from surgical recordings and synthetic smoke overlays to train the model, and quantitatively compared its performance to a baseline 2D U-Net that processes each frame separately. Results show that our proposed model is able to recover structures from smoky images and generate clear output with higher SSIM values than the baseline, though PSNR is slightly better for the 2D approach. However, when optical flow and warping error are used to compare consecutive output frames, the 3D approach significantly increases temporal consistency.
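
To make the temporal-consistency measure more concrete, the following is a minimal sketch of one common way a warping error can be computed; the abstract does not specify the exact formulation used, so this is an illustration under stated assumptions, not the authors' implementation. The function names `warp_frame` and `warping_error` are hypothetical. The flow is assumed to be given per pixel as a displacement (dx, dy) pointing from the current frame to the previous frame, frames are assumed to be grayscale nested lists, sampling is nearest-neighbour, and no occlusion mask is applied.

```python
def warp_frame(frame, flow):
    """Warp `frame` (previous output) into the current frame's coordinates.

    Assumption: flow[y][x] = (dx, dy) points from pixel (x, y) in the current
    frame to its corresponding location in the previous frame.
    Pixels whose source falls outside the image are marked None.
    """
    h, w = len(frame), len(frame[0])
    warped = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = round(x + dx), round(y + dy)  # nearest-neighbour sampling
            if 0 <= sx < w and 0 <= sy < h:
                warped[y][x] = frame[sy][sx]
    return warped


def warping_error(prev_out, curr_out, flow):
    """Mean squared difference between the current output and the warped
    previous output, averaged over pixels with a valid source location."""
    warped = warp_frame(prev_out, flow)
    total, count = 0.0, 0
    for row_w, row_c in zip(warped, curr_out):
        for w_val, c_val in zip(row_w, row_c):
            if w_val is not None:
                total += (c_val - w_val) ** 2
                count += 1
    if count == 0:
        raise ValueError("no valid pixels after warping")
    return total / count


# Toy example: a bright pixel moves one step to the right between frames.
prev_out = [[0.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]
curr_out = [[0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]
flow = [[(-1, 0)] * 4 for _ in range(3)]  # current -> previous displacement

consistent_error = warping_error(prev_out, curr_out, flow)

# Simulated flicker: the same content, but with a global brightness jump.
flickering_out = [[v + 0.5 for v in row] for row in curr_out]
flicker_error = warping_error(prev_out, flickering_out, flow)
```

In this toy case the motion-compensated frames match exactly, so the error is zero, while the brightness jump between frames yields a larger error. A lower warping error over a video therefore indicates less flicker between consecutive outputs.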

External IDs: doi:10.1007/978-3-031-98691-8_21