Abstract: Optical flow estimation based on convolutional neural networks has made great progress in recent years. These approaches usually design an encoder-decoder network that can be trained end-to-end. In the encoder, high-level features are extracted through a series of strided convolutions, similar to most image classification networks. In contrast to the classification task, the decoder then enlarges the spatial feature maps to the full input resolution through successive deconvolution layers. However, optical flow estimation is a pixel-level task, and these networks often produce blurry flow fields caused by unrefined features and low-resolution feature maps. To address this problem, we propose a novel network that combines an attention mechanism with dilated convolutions. In this network, channel-wise features are adaptively reweighted by modeling interdependencies among channels, which suppresses useless features and makes feature extraction more selective. Meanwhile, spatial precision is preserved by employing dilated convolutions, which enlarge the receptive field without heavy computational cost while keeping the spatial resolution of the feature maps unchanged. Our network is trained on the FlyingChairs and FlyingThings3D datasets in a supervised manner. Extensive experiments on the MPI-Sintel and KITTI datasets verify the effectiveness of the proposed method. The experimental results show that the attention mechanism and dilated convolutions are beneficial for optical flow estimation, and our method achieves better accuracy and visual quality than most recent approaches.
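The channel reweighting described above resembles squeeze-and-excitation-style attention, and the benefit of dilation comes from its enlarged effective kernel. The following is a minimal NumPy sketch, not the paper's implementation; the bottleneck weights `w1`, `w2`, the reduction ratio, and the helper names are illustrative assumptions.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Reweight channels of feat (C, H, W) by squeeze-excite-style gating.

    w1: (C//r, C) and w2: (C, C//r) are assumed bottleneck weights.
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    squeezed = feat.mean(axis=(1, 2))
    # Excitation: bottleneck MLP with ReLU, then sigmoid gates in (0, 1)
    hidden = np.maximum(0.0, w1 @ squeezed)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))
    # Scale each channel by its learned importance
    return feat * gates[:, None, None]

def effective_kernel(k, d):
    """Effective kernel size of a k x k convolution with dilation d."""
    return k + (k - 1) * (d - 1)
```

A 3x3 convolution with dilation 2 covers a 5x5 region at the same parameter cost, which is how the receptive field grows without downsampling; the attention gates lie in (0, 1), so each channel is attenuated according to its estimated usefulness.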