Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning

Jinyuan Wang, Junlong Li, Hai Zhao

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, Venue: EMNLP 2023 Findings

Submission Type: Regular Long Paper

Submission Track: Commonsense Reasoning

Submission Track 2: Theme Track: Large Language Models and the Future of NLP

Keywords: Chain-of-Thought, Large Language Models, In-context-Learning, Open-domain question-answering, Multi-hop question-answering

TL;DR: Self-prompted Chain-of-Thought supplies in-context demonstrations that boost the performance of Large Language Models on open-domain multi-hop reasoning, and it is effective on both large-scale (175B) and small-scale (13B) LLMs.

Abstract: In open-domain question-answering (ODQA), most existing questions require single-hop reasoning on commonsense. To further extend this task, we officially introduce open-domain multi-hop reasoning (ODMR), which requires answering multi-hop questions with explicit reasoning steps in an open-domain setting. Recently, large language models (LLMs) have found significant utility in facilitating ODQA without an external corpus. Furthermore, chain-of-thought (CoT) prompting, in either manual or automated form, further boosts the reasoning capability of LLMs. However, existing automated methods lack quality assurance, while manual approaches suffer from limited scalability and poor diversity, hindering the capabilities of LLMs. In this paper, we propose Self-prompted Chain-of-Thought (SP-CoT), an automated framework to mass-produce high-quality CoTs of LLMs, by LLMs and for LLMs. SP-CoT introduces an automated generation pipeline for high-quality ODMR datasets, an adaptive sampler for in-context CoT selection, and self-prompted inference via in-context learning. Extensive experiments on four multi-hop question-answering benchmarks show that SP-CoT not only significantly surpasses the previous SOTA methods on large-scale (175B) LLMs, but also nearly doubles the zero-shot performance of small-scale (13B) LLMs. Further analysis reveals the remarkable capability of SP-CoT to elicit direct and concise intermediate reasoning steps by recalling about 50% of intermediate answers on the MuSiQue-Ans dataset.
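
As a rough illustration of what selecting in-context CoT demonstrations and prompting with them can look like, here is a minimal Python sketch. It is not the SP-CoT implementation: the abstract does not specify how the adaptive sampler works, so the `Demo` type, the function names, the prompt layout, and the Jaccard word-overlap scoring are all assumptions introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """Hypothetical container for one CoT demonstration."""
    question: str
    steps: list[str]  # intermediate reasoning steps, one per hop
    answer: str


def _tokens(text: str) -> set[str]:
    # Crude lowercase word tokenizer; good enough for a toy similarity score.
    return set(text.lower().replace("?", " ").split())


def select_demos(pool: list[Demo], question: str, k: int) -> list[Demo]:
    """Pick the k demos whose questions overlap most with the input.

    Jaccard word overlap is a stand-in chosen for this sketch; it is an
    assumption, not the adaptive sampler described in the paper.
    """
    q = _tokens(question)

    def score(demo: Demo) -> float:
        t = _tokens(demo.question)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(pool, key=score, reverse=True)[:k]


def build_prompt(demos: list[Demo], question: str) -> str:
    """Concatenate the demos, then the new question with an open first step."""
    blocks = []
    for demo in demos:
        steps = "\n".join(f"Step {i}: {s}" for i, s in enumerate(demo.steps, 1))
        blocks.append(f"Question: {demo.question}\n{steps}\nAnswer: {demo.answer}")
    # Ending on "Step 1:" invites the model to continue with its own reasoning.
    blocks.append(f"Question: {question}\nStep 1:")
    return "\n\n".join(blocks)
```

The resulting prompt string would then be sent to an LLM of the reader's choice; in a self-prompted setting the demonstrations in the pool would themselves be generated by an LLM rather than written by hand.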

Submission Number: 434
