Abstract: We introduce the optical Flow transFormer, dubbed FlowFormer,
a transformer-based neural network architecture for learning
optical flow. FlowFormer tokenizes the 4D cost volume built from an image
pair, encodes the cost tokens into a cost memory with alternate-group
transformer (AGT) layers in a novel latent space, and decodes the cost
memory via a recurrent transformer decoder with dynamic positional
cost queries. On the Sintel benchmark, FlowFormer achieves 1.144 and
2.183 average end-point-error (AEPE) on the clean and final passes,
respectively, a 17.6% and 11.6% error reduction from the best published
results (1.388 and 2.47). FlowFormer also generalizes well.
Without being trained on Sintel, FlowFormer achieves 0.95
AEPE on the Sintel training set clean pass, outperforming the best published
result (1.29) by 26.9%.
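
As a rough illustration of the first step named in the abstract, the sketch below builds a 4D cost volume from two feature maps and splits one source pixel's cost map into tokens. The abstract does not specify either operation, so the dot-product similarity, the 1/sqrt(D) scaling, the non-overlapping patches, and the function names `build_cost_volume` and `patchify` are all assumptions made for illustration, not the authors' implementation. Plain Python lists are used for readability; a practical version would use a tensor library.

```python
import math


def build_cost_volume(feat1, feat2):
    """Return a 4D cost volume cost[i][j][k][l].

    feat1 and feat2 are H x W grids (nested lists) of D-dimensional
    feature vectors. Each entry is the scaled dot product between the
    feature at source pixel (i, j) and the feature at target pixel (k, l).
    Assumption: dot-product similarity with 1/sqrt(D) scaling.
    """
    dim = len(feat1[0][0])
    scale = math.sqrt(dim)
    return [
        [
            [
                [sum(a * b for a, b in zip(src, tgt)) / scale for tgt in row2]
                for row2 in feat2
            ]
            for src in row1
        ]
        for row1 in feat1
    ]


def patchify(cost_map, patch):
    """Split one H x W cost map into non-overlapping patch x patch tokens.

    Assumption: H and W are divisible by `patch`. Each token is the
    flattened list of costs inside one patch, in row-major order.
    """
    height, width = len(cost_map), len(cost_map[0])
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append(
                [
                    cost_map[y][x]
                    for y in range(top, top + patch)
                    for x in range(left, left + patch)
                ]
            )
    return tokens


# Toy 2 x 2 feature map with 4-dimensional, mutually orthogonal features.
toy_features = [
    [[2, 0, 0, 0], [0, 2, 0, 0]],
    [[0, 0, 2, 0], [0, 0, 0, 2]],
]
toy_cost = build_cost_volume(toy_features, toy_features)
toy_tokens = patchify(toy_cost[0][0], patch=1)
```

In this toy case each source pixel matches only itself, so its cost map has a single nonzero entry. The patch size sets the trade-off between the number of tokens and the detail each token carries.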