Abstract: This paper presents a new direction for the visual question answering (VQA)
task. Given an image of a simple system of linear algebraic equations and a
natural-language question about the variables in the image, we propose an end-to-end deep
learning model that produces accurate answers to questions pertaining to the value
of the variables and other related questions. Casting the problem of solving simple
linear equations as a VQA task is interesting because the system then requires
three kinds of understanding: (a) visual understanding to recognize digits, variables,
operators and the equals sign; (b) conceptual understanding of the symbolic meanings of
coefficients, constants, variables, operators and equality, together with the role of
numbers as mathematical entities that can undergo mathematical operations; and
(c) high-level understanding of the interaction between the image and the question
in order to answer it accurately. We also create an open-source dataset for this
task and compare the performance of our model against several baselines.
Keywords: Linear equations, Deep Learning, Visual Question Answering, Mathematical Reasoning