Keywords: Vision-Language Model, Formalization, Plane Geometry, Data Augmentation
Abstract: Large models such as vision-language models (VLMs) have demonstrated robust comprehension of world knowledge, inspiring advances in automated mathematical problem-solving. In geometry problem-solving, the intricate and diverse abstract relationships in geometry diagrams make it difficult to apply large models effectively. To improve problem-solving accuracy, we analyze existing problem-solving paradigms and propose using VLMs to improve the accuracy of diagram autoformalization. First, we construct a multimodal instruction-tuning dataset named **G**eometry**D**iagram**F**ormalization**86K** (GDF86K) by applying data augmentation based on algebraic commutativity to the Geometry3K dataset. The dataset contains over 86,000 image-caption pairs for training diagram autoformalization models. Using GDF86K, we perform supervised fine-tuning to obtain Geo-TinyLLaVA, a vision-language model specialized in geometry diagram autoformalization. When given diagrams with complete point annotations, Geo-TinyLLaVA outperforms the conventional Inter-GPS diagram parser in autoformalization performance and can serve as a plugin that improves the accuracy of a geometry problem-solving system. Code and data are available at https://github.com/1509cxt/Geo-TinyLLaVA.
Submission Number: 83
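The abstract describes augmentation "based on algebraic commutativity" without giving the rules. The following is a minimal sketch of what such augmentation could look like, not the authors' implementation. It assumes captions are made of nested logic forms in the Inter-GPS style, such as `Equals(LengthOf(Line(A, B)), 5)`, and that certain predicates accept their arguments in any order. The set `COMMUTATIVE` and the helpers `variants` and `render` are hypothetical names introduced here for illustration.

```python
from itertools import permutations, product

# Assumption: predicates whose arguments can be reordered without changing
# the meaning. The actual rule set used to build GDF86K may differ.
COMMUTATIVE = {"Equals", "Line", "Parallel", "Perpendicular"}


def variants(form):
    """Return all equivalent rewritings of a nested (name, *args) form.

    Leaves (point names, numbers) are plain strings and have one variant.
    """
    if isinstance(form, str):
        return [form]
    name, *args = form
    out = []
    # Combine every rewriting of each argument, keeping argument order.
    for combo in product(*(variants(a) for a in args)):
        # Reorder the arguments only if the predicate is commutative.
        orders = permutations(combo) if name in COMMUTATIVE else [combo]
        for order in orders:
            candidate = (name, *order)
            if candidate not in out:  # drop duplicates, keep first-seen order
                out.append(candidate)
    return out


def render(form):
    """Serialize a nested form back into caption text."""
    if isinstance(form, str):
        return form
    name, *args = form
    return f"{name}({', '.join(render(a) for a in args)})"


if __name__ == "__main__":
    original = ("Equals", ("LengthOf", ("Line", "A", "B")), "5")
    for v in variants(original):
        print(render(v))
```

Under these assumptions, one statement about segment AB yields four equivalent captions for the same diagram image, which is the kind of multiplication that could grow a few thousand annotated diagrams into a much larger set of image-caption pairs.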