Published: 17 Dec 2024 · CC BY 4.0
Given that Large Language Models (LLMs) are widely used in everyday applications, it is important that they produce reliable, well-reasoned outputs. However, the TRIP benchmark reveals a concerning trend: LLMs that report high accuracy on reasoning tasks may be unable to justify their outputs with sound evidence. To address this issue, our project implements three approaches to improve the reasoning abilities of LLMs and to encourage them to generate outputs by following coherent reasoning steps: transferring knowledge from related reasoning tasks, employing more powerful model architectures, and crafting prompts that surface latent reasoning abilities. Through combinations of these approaches, we achieve improvements of approximately 20% on the lower-level reasoning tasks of the TRIP benchmark.