Complex Wavelet Mutual Information Loss: A Multi-Scale Loss Function for Semantic Segmentation

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose a novel loss function, CWMI, that enhances image segmentation by combining multi-scale wavelet analysis with mutual information, achieving better edge precision and robustness while adding minimal computational overhead.
Abstract: Recent advancements in deep neural networks have significantly enhanced the performance of semantic segmentation. However, class imbalance and instance imbalance remain persistent challenges, where smaller instances and thin boundaries are often overshadowed by larger structures. To address the multi-scale nature of segmented objects, various models have incorporated mechanisms such as spatial attention and feature pyramid networks. Despite these advancements, most loss functions are still primarily pixel-wise, while regional and boundary-focused loss functions often incur high computational costs or are restricted to small-scale regions. To address this limitation, we propose the complex wavelet mutual information (CWMI) loss, a novel loss function that leverages mutual information from subband images decomposed by a complex steerable pyramid. The complex steerable pyramid captures features across multiple orientations and preserves structural similarity across scales. Meanwhile, mutual information is well-suited to capturing high-dimensional directional features and offers greater noise robustness. Extensive experiments on diverse segmentation datasets demonstrate that CWMI loss achieves significant improvements in both pixel-wise accuracy and topological metrics compared to state-of-the-art methods, while introducing minimal computational overhead. Our code is available at https://github.com/lurenhaothu/CWMI.
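The idea described in the abstract can be illustrated with a simplified sketch. This is not the authors' implementation (the paper uses a complex steerable pyramid and its own mutual information estimator; here, crude difference filters and a joint-Gaussian MI approximation stand in, and the names `cwmi_like_loss`, `oriented_subbands`, and `gaussian_mi` are hypothetical), but it shows the overall structure: decompose both prediction and target into oriented, multi-scale subbands, then reward mutual information between corresponding subbands.

```python
import numpy as np

def gaussian_mi(x, y, eps=1e-8):
    # Mutual information under a joint-Gaussian assumption:
    # I(X; Y) = -0.5 * log(1 - rho^2), rho = Pearson correlation.
    x = x.ravel() - x.mean()
    y = y.ravel() - y.mean()
    rho = (x @ y) / (np.sqrt((x @ x) * (y @ y)) + eps)
    rho = np.clip(rho, -0.999999, 0.999999)  # keep the log finite
    return -0.5 * np.log(1.0 - rho ** 2)

def oriented_subbands(img):
    # Stand-in for one pyramid level: horizontal, vertical,
    # and two diagonal finite-difference "orientation" bands.
    dh = img[:, 1:] - img[:, :-1]
    dv = img[1:, :] - img[:-1, :]
    d1 = img[1:, 1:] - img[:-1, :-1]
    d2 = img[1:, :-1] - img[:-1, 1:]
    return [dh, dv, d1, d2]

def downsample(img):
    # 2x2 average pooling: a crude low-pass + decimation step.
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def cwmi_like_loss(pred, target, levels=3):
    # Negative sum of subband MI over scales and orientations,
    # so that higher agreement between subbands lowers the loss.
    loss = 0.0
    for _ in range(levels):
        for p_sb, t_sb in zip(oriented_subbands(pred),
                              oriented_subbands(target)):
            loss -= gaussian_mi(p_sb, t_sb)
        pred, target = downsample(pred), downsample(target)
    return loss
```

As a sanity check, an image compared against itself should score a much lower loss (higher mutual information in every subband) than two unrelated images; a training-ready version would operate on batched tensors with an autograd framework and a true steerable-pyramid decomposition.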
Lay Summary: Deep learning has transformed how computers understand images, with applications ranging from medical diagnosis to self-driving cars. A key task in this field is semantic segmentation—teaching a computer to label every pixel in an image, like identifying the boundaries of organs in a medical scan or roads in satellite images. Our research introduces a new mathematical tool called the Complex Wavelet Mutual Information (CWMI) loss, which helps train deep learning models to perform this task more accurately and robustly. CWMI works by comparing images at multiple levels of detail, similar to how a person might zoom in and out to understand both the big picture and the fine details. It also uses a concept from information theory—mutual information—to measure how well the model's predictions capture the meaningful parts of the image. We tested our method on several benchmark datasets and found that it improves both accuracy and reliability, especially around tricky areas like edges and thin structures. Importantly, it achieves this without adding significant computational cost. This work shows that combining ideas from signal processing and information theory can lead to smarter, more effective AI systems that better understand the visual world.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/lurenhaothu/CWMI
Primary Area: Applications->Computer Vision
Keywords: Semantic segmentation, wavelet transform, steerable pyramid, mutual information
Flagged For Ethics Review: true
Submission Number: 15833