Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments

Published: 05 Jun 2025 · Last Modified: 15 Jul 2025 · ICML 2025 Workshop TAIG Poster · CC BY 4.0
Keywords: Verifiable Audits, Confidential Computing, AI Safety Benchmarks, Trustworthy AI, Model Transparency
TL;DR: Attestable Audits runs AI-safety benchmarks inside hardware-based secure enclaves and issues cryptographic proofs, providing verifiable model audits while keeping weights and data confidential.
Abstract: Benchmarks are important measures for evaluating the safety and compliance of AI models at scale. However, they typically do not offer verifiable results and lack confidentiality protections for model IP and benchmark datasets, which creates a gap in AI governance. We propose Attestable Audits, a new approach that runs inside Trusted Execution Environments (TEEs) and enables users to verify that they are interacting with a compliant AI model. Our approach protects sensitive data even if the model provider and auditor do not trust each other. This addresses verification challenges raised in recent AI governance frameworks. We build a prototype to demonstrate the feasibility of our approach on typical audit benchmarks against Llama-3.1.
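To illustrate the kind of verification the abstract describes, the following is a minimal client-side sketch (not the paper's actual protocol): it checks that a signed, attestation-style audit report commits to the expected model weights and benchmark dataset, and that the signature verifies against a trusted enclave or auditor key. All field names and helpers here are illustrative assumptions, not part of the authors' implementation.

```python
# Minimal sketch of verifying a signed audit report. Assumes the report's
# field names ("model_sha256", "dataset_sha256") and the Ed25519 signing
# scheme; the real Attestable Audits protocol may differ.
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey


def sha256_hex(data: bytes) -> str:
    """Hex digest used to commit to model weights or dataset contents."""
    return hashlib.sha256(data).hexdigest()


def verify_audit_report(report: dict, signature: bytes, verifier_pubkey: bytes,
                        expected_model_hash: str,
                        expected_dataset_hash: str) -> bool:
    """Return True iff the signed report matches the expected audit inputs."""
    # 1. The report must commit to the exact model and benchmark dataset
    #    the user expects to have been audited.
    if report.get("model_sha256") != expected_model_hash:
        return False
    if report.get("dataset_sha256") != expected_dataset_hash:
        return False

    # 2. The signature over the canonicalised report must verify against a
    #    public key obtained out of band, e.g. bound to the enclave via TEE
    #    remote attestation (which this sketch does not implement).
    payload = json.dumps(report, sort_keys=True).encode()
    try:
        Ed25519PublicKey.from_public_bytes(verifier_pubkey).verify(signature, payload)
    except InvalidSignature:
        return False
    return True
```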
Submission Number: 5