MKCBlock: Multi Kernel Convolution with Eliminating Dimension Expansion for Real-Time Semantic Segmentation
Abstract: In recent years, large-kernel convolutional neural networks that integrate the transformer architecture with convolutional layers have achieved remarkable results in semantic segmentation. However, the FeedForward-Network-like (FFN-like) structures in these models cause dimension expansion, which substantially increases memory consumption and reduces inference speed. In addition, prevailing real-time semantic segmentation methods mainly use small convolutional kernels and overlook the potential benefits of larger ones. These methods also rely on relatively small networks, which makes them more susceptible to feature redundancy. To address these challenges, we introduce a multi-kernel convolution block (MKCBlock) that combines various convolution types and kernel sizes. This streamlined design exploits the benefits of larger kernels, avoids dimension expansion, and mitigates feature redundancy. Applying the MKCBlock to DDRNet-23-S on the Cityscapes dataset yields 78.2% mIoU at 131.6 FPS, a 0.4% improvement over the original DDRNet-23-S (77.8% mIoU) at the cost of nearly 8 FPS in inference speed. Similarly, integrating the MKCBlock into PIDNet-S yields 79.3% mIoU at 99.0 FPS on Cityscapes, surpassing the original PIDNet-S (78.8% mIoU) by 0.5% with a decrease in inference speed of nearly 3 FPS. Overall, our approach achieves a better balance between inference speed and accuracy.
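To make the cost of dimension expansion concrete, the sketch below compares parameter counts and peak intermediate activation size for two hypothetical blocks. It is an illustrative assumption, not the MKCBlock described in this work: the FFN-like block is modeled as a 1x1 convolution that expands channels by a factor of 4 (a common transformer FFN ratio, assumed here) followed by a 1x1 projection back, and the stand-in "multi-kernel" block splits channels evenly across depthwise convolutions with kernel sizes 3, 5, 7 and 9 (hypothetical choices), then applies a single 1x1 pointwise convolution without widening. The function names `ffn_like_cost` and `multi_kernel_cost` are hypothetical, and biases and normalization layers are ignored.

```python
def ffn_like_cost(channels, height, width, expansion=4):
    """Hypothetical FFN-like block: 1x1 conv C -> expansion*C, then 1x1 conv back to C.

    Returns (parameter count, element count of the widened intermediate feature map).
    Biases and normalization layers are ignored.
    """
    hidden = channels * expansion
    params = channels * hidden + hidden * channels  # two 1x1 convolutions
    intermediate = hidden * height * width  # the expanded tensor that must be stored
    return params, intermediate


def multi_kernel_cost(channels, height, width, kernel_sizes=(3, 5, 7, 9)):
    """Hypothetical multi-kernel block without dimension expansion (not the paper's design).

    Channels are split evenly across depthwise convolutions with different kernel
    sizes, and a single 1x1 pointwise convolution mixes them, keeping the width at C.
    """
    if channels % len(kernel_sizes) != 0:
        raise ValueError("channels must divide evenly across kernel branches")
    group = channels // len(kernel_sizes)
    depthwise = sum(group * k * k for k in kernel_sizes)  # one k x k filter per channel
    pointwise = channels * channels  # 1x1 conv, C -> C
    params = depthwise + pointwise
    intermediate = channels * height * width  # never wider than the input
    return params, intermediate


if __name__ == "__main__":
    # Assumed setting: 64 channels at 1/8 resolution of a 1024x2048 Cityscapes image.
    c, h, w = 64, 128, 256
    print("FFN-like:    ", ffn_like_cost(c, h, w))
    print("multi-kernel:", multi_kernel_cost(c, h, w))
```

With these assumed settings, the FFN-like block stores a 4x wider intermediate tensor (8,388,608 versus 2,097,152 elements) even though the multi-kernel stand-in still reaches a 9x9 receptive field, which illustrates why removing the expansion step can reduce memory traffic.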