Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test


ICLR 2026 Conference Submission22423 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: LLM, API auditing, untargeted fingerprinting

Abstract: As API access becomes a primary interface to large language models (LLMs), users often interact with black-box systems that offer little transparency into the deployed model. To reduce costs or to maliciously alter model behavior, API providers may discreetly serve quantized or fine-tuned variants, which can degrade performance and compromise safety. Detecting such substitutions is difficult because users lack access to model weights and, in most cases, even to output logits. To tackle this problem, we propose a rank-based uniformity test (RUT) that checks whether a black-box LLM is behaviorally equivalent to a locally deployed authentic model. Our method is accurate and query-efficient, and it avoids detectable query patterns, making it robust to adversarial providers that reroute or mix responses once they detect testing attempts. We evaluate the approach across diverse query domains and threat scenarios, including quantization, harmful fine-tuning, jailbreak prompts, and full model substitution, and show that it consistently achieves higher detection power than prior methods under constrained query budgets.
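The abstract names the test but not its construction. As a hedged illustration only, the sketch below implements a generic rank-based uniformity test under assumptions that are not taken from the paper: each prompt is sent once to the API, the local authentic model is sampled m times on the same prompt, and every response is reduced to a scalar by some hypothetical scoring function (for example, a log-likelihood under the local model; the paper's actual statistic may differ). If the API serves the same model, its response is exchangeable with the local samples, so its rank among them is uniform; a Kolmogorov-Smirnov test on randomized ranks then flags departures from uniformity. The function names `rank_of`, `randomized_pit`, `ks_uniform`, `ks_pvalue`, and `rut_pvalue` are illustrative, not the authors' API.

```python
# Generic rank-based uniformity test sketch. ASSUMPTION: each response has
# already been mapped to a scalar score; this is not the paper's statistic.
import math
import random


def rank_of(api_score, local_scores, rng):
    """Rank of one API response's score among m local-model scores.

    Ties are broken uniformly at random, so under H0 (API and local model are
    the same distribution) the rank is uniform on {0, ..., m}.
    """
    below = sum(s < api_score for s in local_scores)
    ties = sum(s == api_score for s in local_scores)
    return below + rng.randint(0, ties)


def randomized_pit(rank, m, rng):
    """Spread a discrete rank in {0, ..., m} over [0, 1); Uniform(0, 1) under H0."""
    return (rank + rng.random()) / (m + 1)


def ks_uniform(us):
    """One-sample Kolmogorov-Smirnov statistic against Uniform(0, 1)."""
    us = sorted(us)
    n = len(us)
    d_plus = max((i + 1) / n - u for i, u in enumerate(us))
    d_minus = max(u - i / n for i, u in enumerate(us))
    return max(d_plus, d_minus)


def ks_pvalue(d, n, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution."""
    x = math.sqrt(n) * d
    if x < 0.3:
        # The tail probability is ~1 here, and the series below converges slowly.
        return 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                  for k in range(1, terms + 1))
    return min(max(p, 0.0), 1.0)


def rut_pvalue(api_scores, local_score_sets, seed=0):
    """p-value for H0: every API response is drawn from the local model.

    api_scores[i] is the (hypothetical) scalar score of the single API response
    to prompt i; local_score_sets[i] holds the scores of m local samples.
    """
    rng = random.Random(seed)
    us = [randomized_pit(rank_of(a, local, rng), len(local), rng)
          for a, local in zip(api_scores, local_score_sets)]
    return ks_pvalue(ks_uniform(us), len(us))


if __name__ == "__main__":
    # Toy stand-in for real scores: the local model yields N(0, 1) scores,
    # a faithful API matches it, and a substituted API is shifted by +1.
    gen = random.Random(42)
    n_queries, m = 200, 50
    local = [[gen.gauss(0, 1) for _ in range(m)] for _ in range(n_queries)]
    faithful = [gen.gauss(0, 1) for _ in range(n_queries)]
    substituted = [gen.gauss(1, 1) for _ in range(n_queries)]
    print("faithful p =", rut_pvalue(faithful, local))
    print("substituted p =", rut_pvalue(substituted, local))
```

In this sketch each prompt costs a single ordinary API call, so the audit queries need not look different from normal traffic. The Gaussian scores in the demo are synthetic placeholders, not results from the paper; a small p-value for the shifted scores and a large one for the matching scores is the expected qualitative behavior.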

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 22423
