Keywords: Cognitive bias, Benchmark dataset, Conversational AI, Large language models (LLMs), Bias detection, Human–AI interaction, Dialogue systems, Bias classification, Trustworthy AI
TL;DR: CogniBias is the first benchmark dataset for studying cognitive biases in AI–human dialogues. It provides annotated conversations covering 30+ bias types, baseline results for models on AI–human interactions, and an analysis of why bias detection remains challenging for LLMs.
Track: Regular Paper
Abstract: Cognitive biases are predictable deviations from rational judgment that affect people's decisions and communication. As large language models (LLMs) become increasingly embedded in everyday interactions, it is important to understand how biases arise and propagate through AI–human conversations. We present CogniBias, the first benchmark dataset for studying and assessing cognitive biases in conversational settings. CogniBias covers more than 30 established bias types, such as anchoring, framing, confirmation bias, and optimism bias, drawn from a variety of authentic question-answering scenarios. Each dialogue sample includes an LLM suggestion, a human-like response, and expert-informed annotations comprising notes, bias labels, and confidence scores. We describe the multi-LLM generation pipeline used to build the benchmark and report baseline results for pretrained and fine-tuned models on bias classification and detection tasks. Our analysis highlights the difficulty of detecting subtle or overlapping biases and identifies the most frequent failure modes of each model. By releasing CogniBias, we hope to help reconcile divergent perspectives on cognitive bias assessment and to provide a baseline dataset for building fairer, more trustworthy conversational AI systems.
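To make the sample structure described in the abstract concrete, the following is a minimal Python sketch of one dialogue record. The class and field names (`DialogueSample`, `BiasAnnotation`, `llm_suggestion`, `human_response`, `confidence`) are hypothetical and are not the released CogniBias schema; the [0, 1] confidence scale and the 0.5 threshold are likewise assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BiasAnnotation:
    """One annotated bias. Field names are illustrative, not the released schema."""
    label: str          # bias type, e.g. "anchoring"
    confidence: float   # assumed to lie in [0, 1]
    note: str = ""      # free-text annotator note

    def __post_init__(self) -> None:
        # Reject confidence values outside the assumed [0, 1] scale.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")


@dataclass
class DialogueSample:
    """An LLM suggestion, a human-like response, and its bias annotations."""
    llm_suggestion: str
    human_response: str
    annotations: list[BiasAnnotation] = field(default_factory=list)

    def labels(self) -> set[str]:
        # Allowing several labels per sample lets overlapping biases be represented.
        return {a.label for a in self.annotations}

    def confident_labels(self, threshold: float = 0.5) -> set[str]:
        # Keep only labels whose annotator confidence meets the (assumed) threshold.
        return {a.label for a in self.annotations if a.confidence >= threshold}


# Hypothetical example record.
sample = DialogueSample(
    llm_suggestion="Similar flats nearby sell for about 500k, so start your offer near that.",
    human_response="Okay, I'll offer 495k then.",
    annotations=[
        BiasAnnotation("anchoring", 0.9, "Offer tracks the suggested figure."),
        BiasAnnotation("framing", 0.4),
    ],
)
print(sorted(sample.labels()))   # ['anchoring', 'framing']
print(sample.confident_labels())  # {'anchoring'}
```

Storing annotations as a list rather than a single label is one way to accommodate the overlapping biases the abstract mentions; a multi-label classifier could then be evaluated against `labels()` or against the thresholded `confident_labels()`.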
Submission Number: 9