Keywords: Information Control, Information Security, AI Governance, LLM Application, LLM safety, LLM alignment
TL;DR: Using an information control framework to secure LLM applications
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities
across a wide range of domains, yet their increasing deployment in sensitive
and high-stakes environments exposes profound safety risks—most notably, the
uncontrolled generation of inappropriate content and the inadvertent leakage of
confidential information. Traditionally, such risks have been approached through
the lens of alignment, focusing narrowly on ensuring outputs conform to general
notions of helpfulness, honesty, and harmlessness. In this work, we argue that
such alignment-centric perspectives are fundamentally limited: information itself
is not inherently harmful, but its appropriateness is deeply context-dependent.
We therefore propose a paradigm shift in LLM safety—from alignment to information control. Rather than merely shaping model behavior through the existing
practice of alignment, we advocate for the principled regulation of who can access
what information under which circumstances. We introduce a novel framework
for context-sensitive information governance in LLMs, grounded in classical security principles such as authentication, role-based access control, and contextual authorization. Our approach leverages both the internal knowledge representations of LLMs and external identity infrastructure to enable fine-grained, dynamic control over information exposure.
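As a rough illustration only, and not the paper's actual implementation, the sketch below shows how such gating might look in an LLM application: an external identity check resolves the caller to a role, a topic classifier labels the request, and a role-and-context policy decides whether the model is allowed to answer at all. All names here (`Request`, `Policy`, `authenticate`, `classify_topics`, `guarded_answer`) are hypothetical placeholders introduced for this example.

```python
# Hypothetical sketch of context-sensitive information control for an LLM app.
# The gate authenticates the caller, resolves a role, classifies the requested
# topics, and only forwards the prompt to the model if the role is authorized
# for those topics in the current context.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Request:
    user_token: str  # opaque credential from external identity infrastructure
    context: str     # e.g. "internal-support-ticket" or "public-chat"
    prompt: str

@dataclass
class Policy:
    # (role, context) -> set of topics that role may receive in that context
    allowed: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def permits(self, role: str, context: str, topics: set[str]) -> bool:
        return topics <= self.allowed.get((role, context), set())

def authenticate(token: str) -> str | None:
    """Stand-in for real identity infrastructure: map a token to a role."""
    return {"tok-hr-123": "hr_staff", "tok-guest-9": "guest"}.get(token)

def classify_topics(prompt: str) -> set[str]:
    """Stand-in for a topic classifier over the prompt (could itself be an LLM)."""
    topics = set()
    if "salary" in prompt.lower():
        topics.add("compensation")
    if "policy" in prompt.lower():
        topics.add("hr_policy")
    return topics or {"general"}

def guarded_answer(req: Request, policy: Policy,
                   llm_answer: Callable[[str], str]) -> str:
    role = authenticate(req.user_token)
    if role is None:
        return "Refused: caller could not be authenticated."
    topics = classify_topics(req.prompt)
    if not policy.permits(role, req.context, topics):
        return (f"Refused: role '{role}' is not authorized for "
                f"{sorted(topics)} in context '{req.context}'.")
    return llm_answer(req.prompt)  # only now does the model see the request

if __name__ == "__main__":
    policy = Policy(allowed={
        ("hr_staff", "internal-support-ticket"): {"compensation", "hr_policy", "general"},
        ("guest", "public-chat"): {"general"},
    })
    fake_llm = lambda prompt: f"[model answer to: {prompt}]"
    print(guarded_answer(Request("tok-guest-9", "public-chat",
                                 "What is the CEO's salary?"), policy, fake_llm))
    print(guarded_answer(Request("tok-hr-123", "internal-support-ticket",
                                 "Summarize the parental leave policy."), policy, fake_llm))
```

In this toy setup the guest request about salaries is refused before the model is ever invoked, while the authenticated HR request about leave policy is passed through, mirroring the shift from output-side alignment to access-side information control described above.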
We systematically evaluate our framework using recent models and a suite of benchmark datasets spanning multiple application domains. Our results demonstrate the feasibility and effectiveness of information-centric control in mitigating inappropriate disclosure, providing a robust foundation for safer and more
accountable language model deployment. This work opens a new frontier in LLM safety, one rooted not in abstract alignment ideals, but in enforceable,
context-aware control of information flow.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 15735