A New Perspective on Large Language Model Safety: From Alignment to Information Control

ICLR 2026 Conference Submission 15735 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Information Control, Information Security, AI Governance, LLM Application, LLM safety, LLM alignment
TL;DR: Using an information control framework to secure LLM applications
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of domains, yet their increasing deployment in sensitive and high-stakes environments exposes profound safety risks—most notably, the uncontrolled generation of inappropriate content and the inadvertent leakage of confidential information. Traditionally, such risks have been approached through the lens of alignment, focusing narrowly on ensuring outputs conform to general notions of helpfulness, honesty, and harmlessness. In this work, we argue that such alignment-centric perspectives are fundamentally limited: information itself is not inherently harmful, but its appropriateness is deeply context-dependent. We therefore propose a paradigm shift in LLM safety—from alignment to information control. Rather than merely shaping model behavior through the existing practice of alignment, we advocate for the principled regulation of who can access what information under which circumstances. We introduce a novel framework for context-sensitive information governance in LLMs, grounded in classical security principles such as authentication, role-based access control, and contextual authorization. Our approach leverages both the internal knowledge representations of LLMs and external identity infrastructure to enable fine-grained, dynamic control over information exposure. We systematically evaluate our framework using recent models and a suite of benchmark datasets spanning multiple application domains. Our results demonstrate the feasibility and effectiveness of information-centric control in mitigating inappropriate disclosure, providing a robust foundation for safer and more accountable language model deployment. This work opens a new frontier in LLM safety, one rooted not in abstract alignment ideals, but in enforceable, context-aware control of information flow.
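To make the abstract's idea of "who can access what information under which circumstances" concrete, the following is a minimal Python sketch of a pre-generation information gate combining authentication, role-based access control, and a contextual authorization rule. All names here (Sensitivity, RequestContext, ROLE_CEILING, and the example policies) are hypothetical illustrations, not the authors' implementation or evaluation setup.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical sensitivity labels; the paper's actual taxonomy may differ.
class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3

@dataclass
class Document:
    text: str
    label: Sensitivity

@dataclass
class RequestContext:
    user_role: str       # e.g. "employee", "contractor", "admin"
    purpose: str         # declared purpose of the query, e.g. "support", "audit"
    authenticated: bool  # outcome of an external identity check

# Role-based ceiling on what a caller may see (RBAC component).
ROLE_CEILING = {
    "contractor": Sensitivity.PUBLIC,
    "employee": Sensitivity.INTERNAL,
    "admin": Sensitivity.CONFIDENTIAL,
}

def authorize(doc: Document, ctx: RequestContext) -> bool:
    """Contextual authorization: combine authentication, role, and purpose."""
    if not ctx.authenticated:
        return doc.label is Sensitivity.PUBLIC
    ceiling = ROLE_CEILING.get(ctx.user_role, Sensitivity.PUBLIC)
    if doc.label.value > ceiling.value:
        return False
    # Placeholder contextual rule: confidential material only for audit purposes.
    if doc.label is Sensitivity.CONFIDENTIAL and ctx.purpose != "audit":
        return False
    return True

def build_prompt(query: str, corpus: list[Document], ctx: RequestContext) -> str:
    """Filter retrievable context before it ever reaches the model."""
    allowed = [d.text for d in corpus if authorize(d, ctx)]
    return "\n".join(["Context:"] + allowed + ["Question: " + query])

if __name__ == "__main__":
    corpus = [
        Document("Office hours are 9-5.", Sensitivity.PUBLIC),
        Document("Q3 revenue forecast draft.", Sensitivity.CONFIDENTIAL),
    ]
    ctx = RequestContext(user_role="employee", purpose="support", authenticated=True)
    # The confidential document is excluded for this role and purpose.
    print(build_prompt("When is the office open?", corpus, ctx))
```

The design point this sketch illustrates is that the access decision is made by an enforceable policy layer outside the model, using identity and context signals, rather than relying solely on aligned model behavior to withhold sensitive content.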
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 15735