Cube Kernel: Enabling Local Gradient Flow Across Channels in CNNs for Robust and Efficient Building Segmentation

Zhimeng He; Yuwei Cai; Ting Han; Meiliu Wu; Brian Barrett

12 Sept 2025 (modified: 11 Feb 2026), submitted to ICLR 2026, CC BY 4.0

Keywords: Cube Kernel, channel-wise convolution, inter-channel gradient flow, convolutional neural networks, building rooftop extraction, semantic segmentation, remote sensing

TL;DR: We propose Cube Kernel, a lightweight plug-and-play convolutional operator that enforces local cross-channel gradient coupling, achieving state-of-the-art building extraction performance with fewer parameters and FLOPs.

Abstract: Understanding inter-band and cross-channel relationships is fundamental to human color perception and object recognition. However, a standard 3×3 convolution kernel provides nine spatial weights and a bias per channel but fuses channel outputs only through a fixed summation. This prevents the operator from learning structured or ratio-like inter-channel cues and limits cross-channel feature coordination. To address this limitation, we develop the Cube Kernel block, a plug-and-play operator that establishes a new computational pathway for local cross-channel coupling. By reconstructing feature channels onto a finer spatial lattice, Cube Kernel enables a single convolution to jointly process and flexibly learn from mixed cross-channel neighborhoods. A learnable Channel Router further adapts channel ordering, while a lightweight spatial attention mask suppresses reconstruction-induced noise. Across CNN-based and Transformer-based backbones, Cube Kernel delivers consistent gains on the WBD, WHU, and Inria datasets. For example, ConvNeXt-U-Cube achieves 90.42% F1 and 82.63% IoU on Inria while reducing parameters and FLOPs by 9.2% and 20.8%, respectively. Ablation studies isolate the contributions of reconstruction, routing, and attention, and gradient analyses reveal substantially stronger inter-channel decorrelation. Owing to its lightweight design, architectural compatibility, and ability to be stacked across layers, Cube Kernel is easy to integrate and provides a strong default operator for structured channel mixing in dense prediction tasks.
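The contrast drawn in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the abstract does not specify how channels are reconstructed onto the finer lattice, so the pixel-shuffle-style interleaving in `depth_to_space` is an assumption, and all function names here are hypothetical. The first part shows a standard convolution adding per-channel 3×3 responses with a fixed weight of 1; the second shows that, after interleaving, a single 3×3 window covers values from several source channels.

```python
# Illustrative sketch only. depth_to_space is an ASSUMED stand-in for the
# paper's "reconstruction" step; the Channel Router and attention mask
# mentioned in the abstract are not modelled here.

def conv3x3_single(x, k):
    """Valid 3x3 cross-correlation of one 2D map x with kernel k."""
    h, w = len(x), len(x[0])
    return [[sum(x[i + di][j + dj] * k[di][dj]
                 for di in range(3) for dj in range(3))
             for j in range(w - 2)]
            for i in range(h - 2)]

def standard_conv(xs, ks):
    """One output channel of a standard conv: per-channel responses are summed."""
    outs = [conv3x3_single(x, k) for x, k in zip(xs, ks)]
    rows, cols = len(outs[0]), len(outs[0][0])
    return [[sum(o[i][j] for o in outs) for j in range(cols)]
            for i in range(rows)]

def depth_to_space(xs, r=2):
    """Interleave r*r channels (H x W each) into one (r*H) x (r*W) map."""
    assert len(xs) == r * r
    h, w = len(xs[0]), len(xs[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for c, x in enumerate(xs):
        dy, dx = divmod(c, r)  # sub-pixel offset assigned to channel c
        for i in range(h):
            for j in range(w):
                out[i * r + dy][j * r + dx] = x[i][j]
    return out

def window_channels(r, top, left):
    """Source channels covered by a 3x3 window at (top, left) on the fine lattice."""
    return sorted({((top + di) % r) * r + ((left + dj) % r)
                   for di in range(3) for dj in range(3)})

ones = [[1] * 3 for _ in range(3)]

# Standard conv: two 3x3 channels (all 1s, all 2s), all-ones kernels.
summed = standard_conv([[[1] * 3 for _ in range(3)],
                        [[2] * 3 for _ in range(3)]], [ones, ones])

# Reconstruction: four 2x2 channels, each filled with its own channel index.
channels = [[[c] * 2 for _ in range(2)] for c in range(4)]
fine = depth_to_space(channels, r=2)
mixed = conv3x3_single(fine, ones)   # one kernel over mixed-channel windows
covered = window_channels(2, 0, 0)   # channels seen by the top-left window
```

In this toy layout each of the nine kernel weights multiplies a value from a specific source channel, so the weighting across channels inside a neighbourhood is learned rather than fixed by the summation.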

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 4549
