Dissecting Inaccuracies in Large Language Models: An Analysis of Reasoning-Error Causes

Anonymous

16 Dec 2023
ACL ARR 2023 December Blind Submission
Readers: Everyone
TL;DR: We evaluate multiple techniques for diagnosing the causes of reasoning errors using an extension of self-consistency, while also partly improving performance on the evaluated tasks.
Abstract: While large language models (LLMs) have rapidly improved performance on a broad range of tasks, they still lag behind on abstract reasoning tasks. \citet{wang2023selfconsistency} proposed \textit{self-consistency}, finding that sampling multiple rationales before taking a majority vote stably improves performance in both mathematical and commonsense reasoning. This work augments the self-consistency idea with a variety of clustering and mapping approaches that balance diversity against accuracy, and additionally explores and evaluates sources of inaccuracy in reasoning performance more efficiently and concisely. We introduce two novel techniques: identifying consensus responses by clustering semantic embeddings of model outputs, and systematically varying the temperature schedule during generation. In doing so, we aim to capture a more comprehensive spectrum of the reasoning paths the model employs, increase confidence in coherent answers, and provide guidance about the model's errors while improving accuracy on common benchmarks.
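
Illustrative sketch (not the authors' released code): one way to combine the two ideas described in the abstract, namely sampling rationales under a varied temperature schedule and clustering semantic embeddings of the outputs to pick a consensus answer rather than taking a plain majority vote. The names generate_rationale and embed are hypothetical placeholders for an LLM sampling call and a sentence-embedding model, and the greedy cosine-threshold clustering is only an assumed simple instantiation of the clustering step.

from __future__ import annotations
import numpy as np

def generate_rationale(question: str, temperature: float) -> tuple[str, str]:
    """Hypothetical placeholder: sample one chain-of-thought rationale and its
    final answer from an LLM at the given sampling temperature."""
    raise NotImplementedError

def embed(text: str) -> np.ndarray:
    """Hypothetical placeholder: return a unit-normalised semantic embedding."""
    raise NotImplementedError

def consensus_answer(question: str,
                     temperatures=(0.2, 0.5, 0.8, 1.1),
                     samples_per_temperature: int = 5,
                     similarity_threshold: float = 0.85):
    """Sample answers across a temperature schedule, cluster their embeddings
    greedily by cosine similarity, and return the answer of the largest
    cluster together with the clusters (useful for inspecting error causes)."""
    outputs = []
    for t in temperatures:                       # varied temperature schedule
        for _ in range(samples_per_temperature):
            rationale, answer = generate_rationale(question, temperature=t)
            outputs.append((rationale, answer, embed(answer)))

    clusters: list[list[int]] = []               # member indices into `outputs`
    centroids: list[np.ndarray] = []              # unit-normalised cluster centroids
    for i, (_, _, vec) in enumerate(outputs):
        sims = [float(vec @ c) for c in centroids]
        if sims and max(sims) >= similarity_threshold:
            k = int(np.argmax(sims))             # join the most similar cluster
            clusters[k].append(i)
            members = np.stack([outputs[j][2] for j in clusters[k]])
            centroid = members.mean(axis=0)
            centroids[k] = centroid / np.linalg.norm(centroid)
        else:
            clusters.append([i])                 # start a new cluster
            centroids.append(vec)

    largest = max(clusters, key=len)
    consensus = outputs[largest[0]][1]           # representative answer of largest cluster
    return consensus, clusters, outputs

Compared with exact-match majority voting, grouping answers by embedding similarity lets paraphrased but semantically identical answers count toward the same consensus, and the returned clusters expose minority reasoning paths for error analysis.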
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English