Keywords: LLMs, human feedback
Abstract: Recent large language models (LLMs) have enabled tremendous progress in natural language understanding. However, they are prone to generating confident but nonsensical reasoning chains, a significant obstacle to establishing trust with users. In this work, we aim to incorporate rich human feedback on such incorrect model-generated reasoning chains for multi-hop reasoning to improve performance on these tasks. To do so, we collect two datasets of human feedback in the form of (correction, explanation, error type) for the StrategyQA and Sports Understanding datasets, and evaluate several algorithms for learning from such feedback. We show that fine-tuning on such small datasets of rich human feedback can improve the model’s performance in generating correct final answers, and can also improve the model’s ability to judge the correctness of its own answers.
Submission Number: 8