K-QA: A Real-World Medical Q&A Benchmark

Anonymous

16 Feb 2024, ACL ARR 2024 February Blind Submission, Readers: Everyone

Abstract: Ensuring the accuracy of responses provided by large language models (LLMs) is crucial, particularly in clinical settings where incorrect information may directly impact patient health. To address this challenge, we construct K-QA, a dataset containing 1,212 patient questions originating from real-world conversations held on a popular clinical online platform. We employ a panel of in-house physicians to answer and manually decompose a subset of K-QA into self-contained statements. Additionally, we formulate two NLI-based evaluation metrics approximating recall and precision: (1) comprehensiveness, measuring the percentage of essential clinical information in the generated answer, and (2) hallucination rate, measuring the number of statements from the physician-curated response contradicted by the LLM answer. Finally, we use K-QA along with these metrics to evaluate several state-of-the-art models, as well as the effect of in-context learning and medically oriented augmented retrieval schemes developed by the authors. Our findings indicate that in-context learning improves the comprehensiveness of the models, and augmented retrieval is effective in reducing hallucinations. We will make K-QA available to the community to spur research into medically accurate NLP applications.
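
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes that an NLI model has already labeled each physician-written statement against the LLM answer; the `JudgedStatement` type, the `essential` flag, the label strings, and the choice to return 0.0 when there are no essential statements are all hypothetical names and conventions introduced here for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JudgedStatement:
    """One physician-written statement plus a (hypothetical) NLI verdict."""
    text: str
    essential: bool   # assumed flag: does the statement carry essential clinical information?
    nli_label: str    # assumed labels: "entailment", "contradiction", or "neutral"


def comprehensiveness(statements: List[JudgedStatement]) -> float:
    """Fraction of essential statements entailed by the LLM answer (recall-like)."""
    essential = [s for s in statements if s.essential]
    if not essential:
        return 0.0  # assumption: report 0.0 when nothing is marked essential
    entailed = sum(1 for s in essential if s.nli_label == "entailment")
    return entailed / len(essential)


def hallucination_count(statements: List[JudgedStatement]) -> int:
    """Number of physician statements contradicted by the LLM answer."""
    return sum(1 for s in statements if s.nli_label == "contradiction")


# Toy example with made-up statements and labels.
example = [
    JudgedStatement("Statement A", essential=True, nli_label="entailment"),
    JudgedStatement("Statement B", essential=True, nli_label="neutral"),
    JudgedStatement("Statement C", essential=False, nli_label="contradiction"),
]
example_comprehensiveness = comprehensiveness(example)
example_hallucinations = hallucination_count(example)
```

In this toy example one of the two essential statements is entailed and one statement overall is contradicted, so the sketch reports a comprehensiveness of one half and a single hallucination.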

Paper Type: long

Research Area: Resources and Evaluation

Contribution Types: Data resources

Languages Studied: English

Preprint Status: There is a non-anonymous preprint (URL specified in the next question).

A1: yes

A1 Elaboration For Yes Or No: Section 8

A2: yes

A2 Elaboration For Yes Or No: Section 7,8

A3: yes

A3 Elaboration For Yes Or No: Abstract and Introduction

B: yes

B1: yes

B1 Elaboration For Yes Or No: Section 2

B2: yes

B2 Elaboration For Yes Or No: Section 3,7,8

B3: yes

B3 Elaboration For Yes Or No: Section 7,8

B4: yes

B4 Elaboration For Yes Or No: Section 3

B5: yes

B5 Elaboration For Yes Or No: Section 3

B6: yes

B6 Elaboration For Yes Or No: Section 3

C: yes

C1: n/a

C2: n/a

C3: yes

C3 Elaboration For Yes Or No: Section 5

C4: n/a

D: yes

D1: yes

D1 Elaboration For Yes Or No: Section 3

D2: yes

D2 Elaboration For Yes Or No: Section 3

D3: yes

D3 Elaboration For Yes Or No: Section 3, 7

D4: yes

D4 Elaboration For Yes Or No: Section 7

D5: yes

D5 Elaboration For Yes Or No: Section 3

E: yes

E1: yes

E1 Elaboration For Yes Or No: Section 3
