Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks

Rui Patrick Xian; Alex Jihun Lee; Satvik Lolla; Vincent Wang; Russell Ro; Qiming Cui; Reza Abbasi-Asl

Published: 16 Dec 2024. Last Modified: 16 Dec 2024. Accepted by TMLR. License: CC BY 4.0.

Abstract: The increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications. Understanding model vulnerabilities in high-stakes and knowledge-intensive tasks is essential to quantifying the trustworthiness of model predictions and regulating model use. The recent discovery of named entities as adversarial examples (i.e., adversarial entities) in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and fine-tuned LLMs in high-stakes and specialized domains. We examined the use of type-consistent entity substitution as a template for collecting adversarial entities for medium-sized billion-parameter LLMs with biomedical knowledge. To this end, we developed an embedding-space, gradient-free attack based on power-scaled distance-weighted sampling for robustness evaluation, which has a low query budget and controllable coverage. Our method has favorable query efficiency and scaling compared with alternative approaches based on black-box gradient-guided search, which we demonstrated for adversarial distractor generation in biomedical question answering. Subsequent failure mode analysis uncovered two regimes of adversarial entities on the attack surface with distinct characteristics. We also showed that entity substitution attacks can manipulate token-wise Shapley value explanations, which become deceptive in this setting. Our approach complements standard evaluations for high-capacity models, and the results highlight the brittleness of domain knowledge in LLMs.
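The sampling idea named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything below is an assumption made for illustration: candidate entities are weighted by their Euclidean distance to a reference entity in embedding space, raised to a tunable exponent, and a fixed number of candidates (the query budget) is drawn without replacement. The function names `power_scaled_probs` and `sample_candidates`, the `power` and `budget` parameters, and the choice of Euclidean distance are all hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def power_scaled_probs(candidates, anchor, power=1.0, eps=1e-12):
    """Sampling probabilities proportional to distance**power (assumed form).

    candidates: (n, d) array of candidate entity embeddings.
    anchor:     (d,) embedding of the reference entity.
    power:      hypothetical exponent; 0 gives uniform sampling, larger
                values put more mass on distant candidates.
    """
    dists = np.linalg.norm(candidates - anchor, axis=1)
    # eps avoids a zero weight (or a division by zero for negative powers)
    # when a candidate coincides with the anchor.
    weights = np.power(dists + eps, power)
    return weights / weights.sum()


def sample_candidates(candidates, anchor, budget, power=1.0, seed=0):
    """Draw up to `budget` distinct candidate indices without using gradients."""
    rng = np.random.default_rng(seed)
    probs = power_scaled_probs(candidates, anchor, power)
    budget = min(budget, len(candidates))
    return rng.choice(len(candidates), size=budget, replace=False, p=probs)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy embeddings standing in for same-type biomedical entities.
    embeddings = rng.normal(size=(100, 16))
    reference = embeddings[0]
    picked = sample_candidates(embeddings[1:], reference, budget=5, power=2.0)
    print(picked)
```

In a sketch like this, each sampled index would correspond to one substituted entity and therefore one model query, so the budget bounds the number of queries directly, while the exponent controls how the sampled candidates are spread over the embedding space.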

Submission Length: Regular submission (no more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=mNFcCGLYv1

Changes Since Last Submission: [09/21/24] Recovered the original fonts and reduced manuscript length. [10/23/24] Updated manuscript to address the issues raised by reviewer D6wa. [11/08/24] Updated manuscript to address points 1, 3, and 4 raised by reviewer HHFG. [11/12/24] Updated manuscript to address point 2 raised by reviewer HHFG. [11/13/24] Updated manuscript to address the issues raised by reviewer rYjV. [11/21/24] Minor fixes to math symbols and text in Appendix B.

Code: https://github.com/RealPolitiX/qstab

Supplementary Material: pdf

Assigned Action Editor: ~Grigorios_Chrysos1

Submission Number: 3370
