Abstract: This paper studies sequential visual processing for solving arithmetic operations on handwritten digits. We feed a sequence of digits and arithmetic operators to a trained system and ask for the resulting symbolic answer. Every digit and operator in the input sequence is an image, while the output is a real number rounded up. The proposed architecture is a hybrid recurrent-convolutional network with a regression module, and it is trainable end-to-end. The experimental results show that the proposed architecture can add or subtract sequences of up to five elements with high accuracy, and that longer sequences require longer training times.
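As a rough illustration of the data flow the abstract describes (image sequence, convolutional encoding, recurrent state, scalar regression), the following is a minimal, untrained forward-pass sketch in plain Python. It is not the paper's implementation: the kernel count, hidden size, 3x3 filters, global-average pooling, Elman-style update, and all names (`init_params`, `encode_image`, `rnn_step`, `predict`) are assumptions chosen for brevity, and the weights are random rather than learned.

```python
import math
import random


def conv2d_relu(image, kernel):
    """Valid 2-D cross-correlation of one image with one kernel, then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out


def encode_image(image, kernels):
    """Convolutional encoder (assumed): one globally averaged feature per kernel."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_relu(image, kernel)
        total = sum(sum(row) for row in fmap)
        feats.append(total / (len(fmap) * len(fmap[0])))
    return feats


def rnn_step(x, h, W_x, W_h, b):
    """Elman-style recurrent update (assumed): h' = tanh(W_x x + W_h h + b)."""
    new_h = []
    for i in range(len(h)):
        pre = b[i]
        pre += sum(W_x[i][j] * x[j] for j in range(len(x)))
        pre += sum(W_h[i][j] * h[j] for j in range(len(h)))
        new_h.append(math.tanh(pre))
    return new_h


def init_params(n_kernels=4, hidden=8, seed=0):
    """Random, untrained parameters; all sizes here are hypothetical."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.5, 0.5)

    return {
        "kernels": [[[u() for _ in range(3)] for _ in range(3)]
                    for _ in range(n_kernels)],
        "W_x": [[u() for _ in range(n_kernels)] for _ in range(hidden)],
        "W_h": [[u() for _ in range(hidden)] for _ in range(hidden)],
        "b": [0.0] * hidden,
        "w_out": [u() for _ in range(hidden)],
        "b_out": 0.0,
    }


def predict(images, params):
    """Encode each image in order, update the recurrent state, regress a scalar.

    Returns the raw real-valued output; the rounding step mentioned in the
    abstract is left to the caller.
    """
    h = [0.0] * len(params["b"])
    for image in images:
        x = encode_image(image, params["kernels"])
        h = rnn_step(x, h, params["W_x"], params["W_h"], params["b"])
    return sum(w * hi for w, hi in zip(params["w_out"], h)) + params["b_out"]


if __name__ == "__main__":
    rng = random.Random(1)
    # Three random 8x8 "images" stand in for digit, operator, digit.
    sequence = [[[rng.random() for _ in range(8)] for _ in range(8)]
                for _ in range(3)]
    print(predict(sequence, init_params()))
```

In an end-to-end setup, all of these parameters would be fitted jointly by minimizing a regression loss between the scalar output and the true arithmetic result. The sketch only shows how a sequence of images is reduced to one real number.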