Keywords: Large Language Model, Dataset, Ambiguity
Abstract: Asking ambiguous questions is a natural part of human communication, making it essential for Large Language Models (LLMs) to recognize and address ambiguity effectively. However, comprehensive analyses of how well LLMs detect and resolve ambiguities are lacking. Moreover, although several ambiguity datasets exist, their absence of explicit explanations of ambiguity and annotations of ambiguity types limits comprehensive evaluation. To address this issue, we introduce Abg-SciQA, a dataset designed to evaluate and improve LLMs' ability to detect ambiguities and generate appropriate clarification questions, built from challenging questions in the social and natural sciences. Abg-SciQA comprises four tasks: Ambiguity Detection, Ambiguity Type Classification, Clarification Question Generation, and Clarification-based Question Answering, each with corresponding annotations. We evaluate both closed-source and open-source LLMs on the dataset and fine-tune open-source LLMs on it. Our experiments show that even state-of-the-art LLMs still struggle to resolve ambiguity in natural questions, and that fine-tuning on Abg-SciQA significantly enhances their ability to understand and address ambiguities. Notably, on the Ambiguity Type Classification task, the F1 score of Llama2-13b improves from 16.6\% to 79.1\%. Nevertheless, Abg-SciQA remains a challenging benchmark for LLMs, revealing ample room for model improvement. Our dataset can be found here.
Submission Number: 72