Retrieval-Augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models

Anonymous

16 Feb 2024 | ACL ARR 2024 February Blind Submission | Readers: Everyone

Abstract: The advancement of Large Language Models (LLMs) has brought substantial attention to the Chain of Thought (CoT) approach, primarily because it enhances the capability of LLMs on tasks requiring complex reasoning. The significance of CoT approaches also extends to the application of LLMs to multi-modal tasks. However, the selection of optimal CoT demonstration examples for multi-modal reasoning remains underexplored, owing to the inherent complexity of multi-modal examples. In this paper, we introduce a novel approach that addresses this challenge by using retrieval mechanisms to dynamically and automatically select demonstration examples based on cross-modal and intra-modal similarities. Furthermore, we employ a stratified sampling method that categorises demonstration examples into groups by type and retrieves examples from each group separately, promoting diversity among the demonstrations. Through a series of experiments on two popular benchmark datasets, ScienceQA and MathVista, we demonstrate that our approach significantly improves the performance of LLMs by more than 2.5%, achieving state-of-the-art results in multi-modal reasoning tasks.
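
The retrieval idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes that text and image embeddings live in a shared vector space (as a CLIP-style encoder would produce), that the four similarity terms are simply summed, and that each demonstration carries a type label. The names Demo, score, and retrieve_stratified are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

import numpy as np


@dataclass
class Demo:
    name: str
    group: str             # hypothetical type label used for stratification
    text_emb: np.ndarray   # assumed to share a vector space with image_emb
    image_emb: np.ndarray


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity; the small epsilon guards against zero vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def score(q_text: np.ndarray, q_image: np.ndarray, demo: Demo) -> float:
    # Intra-modal terms: text-to-text and image-to-image.
    intra = cosine(q_text, demo.text_emb) + cosine(q_image, demo.image_emb)
    # Cross-modal terms: text-to-image and image-to-text.
    cross = cosine(q_text, demo.image_emb) + cosine(q_image, demo.text_emb)
    # Assumption: an unweighted sum; the paper may combine these differently.
    return intra + cross


def retrieve_stratified(q_text, q_image, pool, k_per_group=1):
    # Bucket the demonstration pool by type label.
    by_group = defaultdict(list)
    for demo in pool:
        by_group[demo.group].append(demo)
    # Take the top-k most similar demonstrations from every group.
    selected = []
    for group in sorted(by_group):
        ranked = sorted(
            by_group[group],
            key=lambda d: score(q_text, q_image, d),
            reverse=True,
        )
        selected.extend(ranked[:k_per_group])
    return selected
```

Retrieving from each group separately, rather than taking a single global top-k, is what keeps the selected demonstrations from collapsing onto one example type when that type happens to dominate the similarity ranking.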

Paper Type: long

Research Area: Question Answering

Contribution Types: NLP engineering experiment

Languages Studied: English
