Can a Large Language Model Keep My Secrets? A Study on LLM-Controlled Agents

Published: 22 Jun 2025, Last Modified: 17 Jul 2025 · ACL-SRW 2025 Poster · CC BY 4.0
Keywords: LLMs for Security, Access Control, Datasets
TL;DR: We explore the ability of LLMs to perform natural-language access control tasks by providing a novel synthetic dataset, testing two LLMs on it, and establishing a human baseline.
Abstract: Agents controlled by Large Language Models (LLMs) can assist with natural language tasks across domains and applications when given access to confidential data. When such digital assistants interact with a potentially adversarial environment, the confidentiality of that data is at stake. We investigated whether an LLM-controlled agent can, in a manner similar to humans, take confidentiality into account when responding to natural language requests involving internal data. For evaluation, we created a synthetic dataset of confidentiality-aware planning and deduction tasks in organizational access control, developed from human input, LLM-generated content, and existing datasets. It covers a variety of everyday scenarios in which access to confidential or private information is requested. We used this dataset to evaluate the models' ability to infer confidentiality-aware behavior in such scenarios by differentiating between legitimate and illegitimate access requests. We compared a prompting-based and a fine-tuning-based approach to evaluate the performance of Llama 3 and GPT-4o-mini in this domain. In addition, we conducted a user study to establish a human baseline for these tasks. Humans reached an accuracy of up to 79%. Prompting techniques such as chain-of-thought and few-shot prompting yielded promising results but still fell short of real-world applicability and did not surpass the human baseline. However, we found that fine-tuning significantly improves the agent's access decisions, reaching up to 98% accuracy, making it a promising direction for future confidentiality-aware applications when training data is available.
Archival Status: Archival
ACL Copyright Transfer: pdf
Paper Length: Long Paper (up to 8 pages of content)
Submission Number: 169