Abstract: In this paper, we introduce a lightweight yet powerful convolutional neural network, termed the efficient feature reconstructing network (EFRNet), for real-time scene parsing. Our key idea is to decompose the learning of high-resolution representations into two stages: (i) bottom-up codebook and coding matrix learning and (ii) top-down feature reconstruction. Specifically, the bottom-up process learns image-specific codewords (a codebook) from deep-layer features and generates a coding matrix from the shallow-layer feature map. In the top-down process, the learned codebook and coding matrix are used to rebuild high-resolution features via a lightweight feature reconstructing operator (FRO). In addition, EFRNet is built on a new building block, the efficient adaptive abstraction (EAA) block, which further reduces the number of network parameters and yields a significant speedup. Extensive experiments are conducted on challenging benchmarks, such as CamVid and Cityscapes. The results show that EFRNet achieves state-of-the-art performance with an optimal balance between accuracy and speed.
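The two-stage decomposition described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify how the codebook, coding matrix, or FRO are computed, so everything below is an assumption made for illustration: the function names (`learn_codebook`, `coding_matrix`, `reconstruct`), the projection weights `w_assign` and `w_code`, the softmax normalizations, and the choice of a single matrix product as the reconstruction step are all hypothetical and should not be read as the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def learn_codebook(deep_feat, w_assign):
    """Bottom-up step (assumed form): pool a low-resolution deep feature map
    into K image-specific codewords.

    deep_feat: (h, w, C) deep-layer features
    w_assign:  (C, K) hypothetical projection producing assignment scores
    returns:   (K, C) codebook
    """
    h, w, c = deep_feat.shape
    x = deep_feat.reshape(h * w, c)
    # Normalize over spatial positions so each codeword is a weighted
    # average of the deep features.
    attn = softmax(x @ w_assign, axis=0)
    return attn.T @ x


def coding_matrix(shallow_feat, w_code):
    """Bottom-up step (assumed form): one coding vector per high-resolution
    position, computed from the shallow-layer feature map.

    shallow_feat: (H, W, C_s) shallow-layer features
    w_code:       (C_s, K) hypothetical projection
    returns:      (H*W, K) coding matrix whose rows sum to 1
    """
    big_h, big_w, cs = shallow_feat.shape
    return softmax(shallow_feat.reshape(big_h * big_w, cs) @ w_code, axis=1)


def reconstruct(coding, codebook, big_h, big_w):
    """Top-down step (assumed form): rebuild high-resolution features as
    coding-weighted mixtures of the codewords."""
    return (coding @ codebook).reshape(big_h, big_w, -1)


# Toy shapes chosen only for illustration.
rng = np.random.default_rng(0)
deep = rng.standard_normal((8, 16, 64))      # low resolution, 64 channels
shallow = rng.standard_normal((32, 64, 16))  # high resolution, 16 channels
K = 32                                       # number of codewords (assumed)
codebook = learn_codebook(deep, rng.standard_normal((64, K)))
coding = coding_matrix(shallow, rng.standard_normal((16, K)))
high_res = reconstruct(coding, codebook, 32, 64)
print(high_res.shape)
```

In this sketch the high-resolution output has the channel width of the deep features but the spatial size of the shallow features, and the only operation performed at full resolution is a product between an (H·W × K) matrix and a (K × C) matrix. Whether the actual FRO takes this form is not stated in the abstract.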