Abstract: The U-Net is one of the most fundamental architectural advancements of the deep learning era. It is a crucial tool for image segmentation, especially for biomedical modalities. The research community seems to interpret the effectiveness of neural architecture search (such as the nnU-Net) as evidence that architectural enhancements proposed since its debut are mostly unnecessary. We argue that there are still network-in-network primitives that can be leveraged to further enhance its performance, and in this paper we focus on the squeeze-and-excitation (SE) pathway. In particular, we study its use of global descriptors, which should be at odds with the spatial resolution required for dense-prediction tasks. It has been theorized in the literature that the performance gain probably stems from some implicit ability of the learned excitations to filter supposedly uninformative channels during training. We explain this almost unreasonable success through an analysis of empirical estimates of the excitation covariance matrix. Our analysis also directly contradicts the above conjecture: the most effective SE approach actually displayed the least extreme filtering behaviour, weighting all channels much closer to the mean of 0.5. Our experiments are conducted on three diverse, staple biomedical modalities: dermoscopy, colonoscopy, and ultrasound.
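The SE pathway discussed above can be illustrated with a minimal NumPy sketch: a global average pool squeezes each channel to one descriptor, a bottleneck MLP with a sigmoid produces per-channel excitations in (0, 1), and the feature map is rescaled channel-wise. The function name, weight shapes, and reduction ratio here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def squeeze_excitation(x, w1, w2):
    """Channel reweighting via squeeze-and-excitation (illustrative sketch).

    x  : feature map of shape (C, H, W)
    w1 : bottleneck weight of shape (C // r, C), r being the reduction ratio
    w2 : expansion weight of shape (C, C // r)
    """
    # Squeeze: global average pooling yields one descriptor per channel
    z = x.mean(axis=(1, 2))                                     # (C,)
    # Excitation: bottleneck MLP (ReLU) followed by a sigmoid gate
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))   # (C,)
    # Rescale each channel by its learned excitation weight
    return x * s[:, None, None]
```

Because the gate is a sigmoid, the excitations lie in (0, 1); the "filtering" conjecture would predict values driven toward the extremes, whereas values clustered around 0.5 correspond to the mild reweighting the analysis reports.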
External IDs: dblp:conf/eusipco/MartinsCR25