ERASER is a post-hoc unlearning framework that enhances LLM safety by selectively removing harmful, toxic, or copyrighted content memorized during pretraining. While ERASER primarily aims to reduce the legal and ethical risks of unintended memorization, we recognize its potential for misuse: for instance, erasing politically or socially significant facts could undermine historical accountability or scientific transparency.

To mitigate such dual-use risks, we adopt the following safeguards:

1. Intended Use Disclosure  
ERASER is intended solely for mitigating legal, safety, and privacy risks arising from unintended memorization in pretrained language models. Use of ERASER to remove truthful or socially significant information (e.g., facts related to human rights abuses, medical safety, or public interest) is explicitly discouraged.

2. Responsible Deployment Guidelines  
We provide the following usage guidelines aligned with published AI safety principles:
- Transparency: Users must disclose whether ERASER or any machine unlearning process has been applied to their deployed models, and clearly indicate the category of removed content.
- Risk Assessment: Prior to application, users should document the basis for removal, distinguishing between harmful/private content versus socially vital knowledge.
- Non-Automated Approval: We recommend human-in-the-loop governance, including legal, ethical, and domain-specific reviewers, for approval of unlearning requests in high-stakes domains (e.g., health, law, history, politics).
- Pre-release Evaluation: Any modified model must undergo red teaming and stress testing to surface unintended deletion, hallucination amplification, or behavior shifts.

3. Dual-Use Risk Acknowledgment  
We acknowledge the potential for dual use and recommend explicit screening for inappropriate applications, including but not limited to:
- Content moderation systems that suppress minority or dissenting voices.
- Authoritarian government systems with no democratic oversight or redress mechanisms.
- Research settings where knowledge removal compromises scientific reproducibility or integrity.

4. Technical Safeguards and Operational Controls  
To support auditability and responsible handling:
- Versioning: Unlearning-modified models must be version-tagged, with changelogs summarizing what category of data was removed.
- Reversibility: The underlying ERASER workflow retains prior checkpoints to permit rollback or re-evaluation.
- Auditable Logs: We recommend maintaining metadata logs of unlearning operations (timestamp, reviewer, rationale) for external or internal audit.
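The metadata logging recommended above can be made concrete with a small sketch. The schema below is purely illustrative: the field names (`category`, `reviewer`, `rationale`) mirror the items listed in these guidelines and are not part of the ERASER codebase itself.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Illustrative audit-log entry for one unlearning operation.
# Field names are assumptions chosen to match the guidelines above,
# not an interface provided by ERASER.
@dataclass
class UnlearningLogEntry:
    model_version: str  # version tag of the modified model
    category: str       # category of removed content (e.g., "private-PII")
    reviewer: str       # human approver of the unlearning request
    rationale: str      # documented basis for removal
    timestamp: str = "" # filled with a UTC timestamp on serialization

    def to_json(self) -> str:
        if not self.timestamp:
            self.timestamp = datetime.now(timezone.utc).isoformat()
        return json.dumps(asdict(self), sort_keys=True)

entry = UnlearningLogEntry(
    model_version="base-v1.2-unlearned-003",
    category="private-PII",
    reviewer="legal-review-board",
    rationale="Verified erasure request covering personal data",
)
record = entry.to_json()  # append to a write-once log for later audit
```

Storing such entries in an append-only log ties each model version to a reviewer and rationale, which is what makes external audit feasible.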

5. Performance Degradation Monitoring  
As unlearning may degrade model utility:
- Users must benchmark pre/post-unlearning model behavior on relevant accuracy, robustness, and safety metrics.
- If degradation exceeds predefined thresholds (e.g., 5% accuracy drop on public QA datasets), mitigation measures or rollback should be considered.
- Known risks such as overgeneralization (unintended forgetting beyond the target) and hallucination must be evaluated post-unlearning.
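The threshold check described above can be sketched as a simple pre/post comparison. This is a minimal illustration, assuming per-benchmark accuracy scores; the benchmark names and the 5% threshold are examples, not values prescribed by ERASER.

```python
# Minimal sketch of the pre/post-unlearning degradation check.
# Benchmark names and the 0.05 threshold are illustrative assumptions.

def degradation(pre_scores: dict, post_scores: dict) -> dict:
    """Relative accuracy drop per benchmark (positive = degradation)."""
    return {
        name: (pre_scores[name] - post_scores[name]) / pre_scores[name]
        for name in pre_scores
    }

def flag_for_rollback(pre_scores: dict, post_scores: dict,
                      threshold: float = 0.05) -> dict:
    """Return benchmarks whose relative drop exceeds the threshold."""
    drops = degradation(pre_scores, post_scores)
    return {name: d for name, d in drops.items() if d > threshold}

pre = {"triviaqa_acc": 0.62, "boolq_acc": 0.80}
post = {"triviaqa_acc": 0.61, "boolq_acc": 0.70}
flagged = flag_for_rollback(pre, post)  # boolq drops 12.5%, over the 5% bar
```

Any benchmark appearing in `flagged` would then trigger the mitigation or rollback step described above; evaluating overgeneralization and hallucination requires separate, targeted test sets.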

6. Access and Release Policy  
To promote open research and transparency, the ERASER codebase is publicly released and may be used, modified, and extended for both academic and commercial purposes. However, we emphasize the importance of responsible use, especially given the sensitivity of machine unlearning applications.

In particular:
- We do not provide pretrained or pre-unlearned models to prevent unintended misuse and to encourage careful replication.
- ERASER is not designed to serve as an automated, real-time forgetting API. Any use for content removal should include appropriate human oversight and ethical review.
- We strongly discourage use cases involving:
    - Arbitrary or unverifiable deletion requests (e.g., to suppress dissent or public interest content),
    - Corporate deployment without transparency,
    - Applications that bypass human judgment in safety-critical scenarios.
We believe that unlearning systems should be deployed with rigorous auditing, clear documentation of intent, and societal accountability. We welcome community contributions that align with these principles.
(Note: The content of this paper is released under the Creative Commons Attribution 4.0 International license (CC BY 4.0). Please provide appropriate credit and indicate any modifications when reusing this work.)

7. Alignment with Existing Guidelines  
Our safeguards draw from internationally recognized AI safety frameworks and dual-use mitigation principles:
- OpenAI’s GPT-4 System Card: Emphasizes dual-use risks and red teaming to assess misuse potential.
  https://cdn.openai.com/papers/gpt-4-system-card.pdf
- Anthropic’s Responsible Scaling Policy (RSP): Highlights pre-deployment safety evaluations, red teaming, and risk mitigation.
  https://www.anthropic.com/responsible-scaling-policy
- Anthropic’s Constitutional AI: Proposes principle-based alignment and ethics-driven system design.
  https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback
- NIST AI Risk Management Framework (AI RMF 1.0): Offers a lifecycle-based framework built around the Govern, Map, Measure, and Manage functions.
  https://www.nist.gov/itl/ai-risk-management-framework
- Bletchley Declaration: Advocates international coordination for frontier AI safety and responsible scaling.
  https://www.gov.uk/government/publications/ai-safety-summit-2023-the-bletchley-declaration

8. Community Monitoring and Feedback
We encourage users to report misuse or ethical concerns related to ERASER-based systems via public issue trackers or designated community channels. Open feedback loops are critical to collective oversight.

We believe these safeguards offer a balanced and enforceable path forward for the responsible use of ERASER, enabling safer LLM deployment while preventing harmful overreach.
