- Keywords: Deep learning, convolutional neural networks, building block design
- TL;DR: A design of building block for performance boosting
- Abstract: Convolutional Neural Networks (CNNs) typically treat normalization methods such as batch normalization (BN) and rectified linear function (ReLU) as building blocks. Previous work showed that this basic block would lead to channel-level sparsity (i.e. channel of zero values), reducing computational complexity of CNNs. However, over-sparse CNNs have many collapsed channels (i.e. many channels with undesired zero values), impeding their learning ability. This problem is seldom explored in the literature. To recover the collapsed channels and enhance learning capacity, we propose a building block, Channel Equilibrium (CE), which takes the output of a normalization layer as input and switches between two branches, batch decorrelation (BD) branch and adaptive instance inverse (AII) branch. CE is able to prevent implicit channel-level sparsity in both experiments and theory. It has several appealing properties. First, CE can be stacked after many normalization methods such as BN and Group Normalization (GN), and integrated into many advanced CNN architectures such as ResNet and MobileNet V2 to form a series of CE networks (CENets), consistently improving their performance. Second, extensive experiments show that CE achieves state-of-the-art results on various challenging benchmarks such as ImageNet and COCO. Third, we show an interesting connection between CE and Nash Equilibrium, a well-known solution of a non-cooperative game. The models and code will be released soon.
- Original Pdf: pdf