Enhancing Large Language Reasoning through Multi-Modal Reasoning

24 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Large Language Model, Multi-Modal
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: The combination of chain-of-thought prompting with the self-consistency decoding strategy enables pre-trained large language models (LLMs) to attain promising results on reasoning problems. However, the self-consistency strategy samples multiple reasoning paths from the output of a \emph{single}-modality prompt (e.g., the linear "step-by-step" text prompt). Because not all problems can be solved through such a "step-by-step" reasoning process, existing self-consistency strategies still struggle with challenging tasks that involve \emph{multiple} modalities; for example, an algebraic problem requires both reasoning and value computation. In this paper, we introduce the concept of multi-modal consistency (\tool) as a complement to self-consistency, aiming to broaden the scope of reasoning capabilities. The idea behind \tool~is that, for different reasoning problems, an LLM ought to engage diverse perspectives rather than adhere to a prompt that considers only textual reasoning. Based on this idea, given a specific problem, \tool~first queries the LLM to identify the modalities suitable for addressing it. \tool~then samples varied reasoning paths from each selected modality instead of the text modality alone. Finally, \tool~selects the most coherent response by aggregating the sampled reasoning paths across the different modalities. Our extensive empirical evaluation demonstrates the superiority of \tool~over existing approaches and single-modality reasoning baselines.
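The aggregation step described in the abstract can be sketched as a majority vote over answers sampled from every selected modality. The sketch below is illustrative only and not the authors' implementation: `sample_answers` is a hypothetical stand-in for a modality-specific LLM call, stubbed here with canned answers so the voting logic is runnable.

```python
from collections import Counter

def sample_answers(problem, modality, n=5):
    """Hypothetical sampler. In practice this would prompt an LLM with a
    modality-specific template (e.g. step-by-step text vs. program code)
    and extract the final answer from each sampled reasoning path.
    Stubbed with canned answers for illustration."""
    canned = {
        "text":    ["7", "7", "9", "7", "8"],
        "program": ["7", "7", "7", "9", "7"],
    }
    return canned[modality][:n]

def multi_modal_consistency(problem, modalities, n=5):
    """Pool sampled answers across all selected modalities and return
    the most consistent (i.e., most frequent) final answer."""
    votes = Counter()
    for modality in modalities:
        votes.update(sample_answers(problem, modality, n))
    answer, _ = votes.most_common(1)[0]
    return answer

# "7" wins the vote: 7 of 10 sampled paths agree across both modalities.
print(multi_modal_consistency("algebra problem", ["text", "program"]))  # -> 7
```

The preceding step, modality selection, would be a separate LLM query whose output determines the `modalities` list passed in here.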
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9049