Abstract: Highlights
• We use average pooling layers to capture the multi-scale features of scenes.
• We propose a depth-wise convolution-based inverted residual feedforward network.
• Experiments on a public dataset verify the favorable performance of the method.
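The first highlight can be illustrated with a minimal sketch of average pooling applied to one feature map at several window sizes. This is only an illustration under stated assumptions: the kernel sizes `(1, 2, 4)`, the non-overlapping stride, and the helper names `avg_pool2d` and `multi_scale_features` are hypothetical, and the highlights do not say how the pooled maps are combined in the actual network.

```python
def avg_pool2d(feature_map, kernel):
    """Non-overlapping average pooling (stride == kernel) over a 2D list."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - kernel + 1, kernel):
        row = []
        for left in range(0, width - kernel + 1, kernel):
            # Collect every value inside the kernel x kernel window.
            window = [feature_map[top + i][left + j]
                      for i in range(kernel) for j in range(kernel)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def multi_scale_features(feature_map, kernels=(1, 2, 4)):
    """Pool the same map at several window sizes (assumed values).

    Larger kernels summarize larger regions of the scene.
    """
    return {k: avg_pool2d(feature_map, k) for k in kernels}


# A 4x4 toy feature map with values 0.0 .. 15.0.
feature_map = [[float(4 * r + c) for c in range(4)] for r in range(4)]
pyramid = multi_scale_features(feature_map)
print(pyramid[2])  # [[2.5, 4.5], [10.5, 12.5]]
print(pyramid[4])  # [[7.5]]
```

Each entry of `pyramid` describes the same scene at a coarser resolution, which is the sense in which pooling at several sizes yields multi-scale features.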