Published: 17 Dec 2024 · CC BY 4.0
Given that Large Language Models (LLMs) are widely used in everyday applications, it is important that they produce reliable, well-reasoned outputs. However, the TRIP benchmark reveals a concerning trend: LLMs that report high accuracy on reasoning tasks may be unable to justify their outputs with sound evidence. To address this issue, our project implements three approaches to improve the reasoning abilities of LLMs and to encourage them to generate outputs by following coherent reasoning steps: transferring knowledge from related reasoning tasks, employing more powerful model architectures, and crafting prompts that surface latent reasoning abilities. Through combinations of these approaches, we achieve improvements of approximately 20% on the lower-level reasoning tasks of the TRIP benchmark.