Enhancing Knowledge through Revisable Chain-of-Thought for Commonsense Question Answering

ACL ARR 2025 February Submission5249 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) are effective at natural language reasoning but still struggle with commonsense questions that require implicit knowledge of the world. LLMs rely on knowledge learned during training, which can be limited to specific domains or inaccurate, resulting in hallucinations. To alleviate these issues, recent research integrates external knowledge sources (e.g., fine-tuning, self-revision, Retrieval-Augmented Generation (RAG), and Chain-of-Thought (CoT)). However, standard CoT reasoning presents the answering process in a superficially plausible form whose individual steps are difficult to verify. In this paper, we propose a novel approach called Revisable Chain-of-Thought to address an important commonsense question answering task, the Winograd Schema Challenge. Inspired by the cognitive logic of ``rising from the abstract to the concrete,'' Revisable CoT decomposes knowledge into three distinct categories: meta-knowledge, transfer knowledge, and instantiated knowledge, each handled in a separate step. This framework emphasizes step-by-step verifiability and revisability, ensuring a more interpretable and reliable reasoning process. Furthermore, we propose online revision by teacher models and offline revision with a knowledge base. To enhance the relevance of knowledge retrieval, we propose an antisense retrieval method, which checks whether newly generated knowledge contradicts any existing knowledge in the knowledge base, so as to avoid retrieving meta-knowledge irrelevant to the problem. Experimental results on the WinoGrande dataset corroborate the efficacy of our proposed method. Revising GPT-3.5's meta-knowledge with GPT-4 improved accuracy from 68.11% to 73.64%, a gain of 5.53 percentage points.
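The antisense retrieval idea described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the toy `contradicts` check (antonym pairs plus lexical overlap) stands in for whatever contradiction detector the paper actually uses (e.g., an NLI model), and all function and variable names are assumptions.

```python
# Hypothetical sketch of antisense retrieval: before trusting newly generated
# meta-knowledge, find knowledge-base entries that contradict it. A detected
# contradiction signals that the KB entry addresses the same concept and is a
# candidate for revision, rather than being retrieved as-is.

# Toy antonym table standing in for a real contradiction detector (assumption).
ANTONYMS = {("heavy", "light"), ("hot", "cold"), ("large", "small")}

def contradicts(a: str, b: str) -> bool:
    """Toy test: two statements contradict if they assert antonymous
    properties while sharing enough words to be about the same topic."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    shared = wa & wb
    for x, y in ANTONYMS:
        if (x in wa and y in wb) or (y in wa and x in wb):
            # Require some lexical overlap so unrelated statements don't match.
            return len(shared) >= 2
    return False

def antisense_retrieve(new_knowledge: str, knowledge_base: list[str]) -> list[str]:
    """Return KB entries that conflict with the newly generated knowledge."""
    return [k for k in knowledge_base if contradicts(new_knowledge, k)]

kb = [
    "a trophy is usually large and rigid",
    "a suitcase can hold heavy objects",
]
conflicts = antisense_retrieve("the trophy is small and fits inside", kb)
# Only the trophy entry conflicts (large vs. small, shared topic words).
```

A real system would replace `contradicts` with a trained natural-language-inference classifier; the retrieval loop itself stays the same.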
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: Question Answering
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 5249