Abstract: Unlike natural images, diagrams are a highly abstract vehicle for knowledge representation, and Diagram Question Answering involves complex reasoning steps such as diagram element detection. Under low-resource conditions, however, extracting diagram elements efficiently is challenging. In addition, vision tasks rely on image feature extraction, and most feature extractors today are pretrained on real-world images from ImageNet, which differ substantially from diagrams. To address these problems, we programmatically synthesize a diagram dataset, train a diagram element prediction model on it, and reuse that model's feature extraction module in the downstream task. In the downstream task, we explicitly feed the diagram elements predicted from each diagram into the model. Experimental comparison shows that our model improves significantly over the baseline.