# agent-privacy-attack

**ADAM: Systematic Data Extraction Attacks on Agent Memory via Adaptive Querying**  

This repository contains the reference implementation of **ADAM**, a novel *memory extraction attack* against LLM-based agents with memory or retrieval-augmented generation (RAG) modules.  
ADAM adaptively probes the agent’s memory by maintaining a distribution over topics, selecting anchors in a probability-weighted and diversity-aware way, and using an **entropy-last filter** to maximize per-round information gain.

---

## 🔍 Overview

Modern LLM agents often integrate **memory modules** or **RAG** components to improve reasoning and task execution.  
While effective, these designs introduce **critical privacy vulnerabilities**: sensitive information stored in memory can be systematically extracted by carefully crafted queries.  

ADAM highlights this risk through a *systematic, adaptive attack pipeline*:

1. **Initialization**: Start with a small set of seed anchors (e.g., *diagnosis*, *medication*, *patient*).  
2. **Prompt Design**: Generate natural-sounding queries that combine a benign prefix with a retrieval-inducing suffix, lightly paraphrased.  
3. **Anchor Extraction & Pool Maintenance**: Update the anchor pool from responses using NER/keyword extraction, normalization, and deduplication.  
4. **Distribution Update**: Maintain the topic distribution and update it with evidence from responses, penalizing overused anchors.  
5. **Anchor Selection**: Use probability-weighted $k$-center selection for diverse, high-value anchors.  
6. **Entropy-Last Querying**: Among candidate queries, select the one with maximum entropy for maximal information gain.  
7. **Iteration & Stopping**: Continue until the attack converges or the query budget is exhausted.  
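
Steps 5 and 6 can be illustrated with a minimal, standard-library-only sketch. It is not the repository's implementation: the function names (`weighted_k_center`, `entropy_last`) are hypothetical, anchors are assumed to be represented by embedding vectors compared with Euclidean distance, the greedy farthest-point heuristic scored by probability times distance is one plausible reading of "probability-weighted $k$-center", and each candidate query is assumed to come with a predicted topic distribution whose Shannon entropy is compared.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def weighted_k_center(points, probs, k):
    """Greedy k-center over anchor embeddings, weighted by anchor probability.

    Starts from the most probable anchor, then repeatedly adds the anchor
    that maximizes probability * distance to its nearest chosen anchor.
    Returns the chosen indices in selection order.
    """
    if not points or k <= 0:
        return []
    chosen = [max(range(len(points)), key=lambda i: probs[i])]
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: probs[i] * nearest[i],
        )
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return chosen


def entropy_last(candidates):
    """Return the candidate query whose topic distribution has maximum entropy.

    `candidates` maps query text to an (assumed) predicted topic distribution.
    """
    return max(candidates, key=lambda q: shannon_entropy(candidates[q]))


# Toy data: four anchors in a 2-D embedding space with their probabilities.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 4.0)]
probs = [0.4, 0.3, 0.2, 0.1]
print(weighted_k_center(points, probs, 2))  # [0, 2]
print(entropy_last({"query A": [0.9, 0.1], "query B": [0.5, 0.5]}))  # query B
```

In the toy run, anchor 1 is likely but nearly identical to anchor 0, so the selection skips it in favor of the distant anchor 2. This is the diversity effect that step 5 describes.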

---

## 🛠 Features

- **Entropy-guided query selection** to maximize leakage.
- **Anchor pool maintenance** with NER + semantic similarity deduplication.  
- **Distributional update mechanism** (softmax with temperature & penalty factor).  
- **Probability-weighted $k$-center anchor selection** for diversity.  
- **LLM-driven prompt design** with prefix/suffix injection & paraphrasing.  
- **Logging & metrics**: supports CSV output and per-iteration logs.  
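
As a rough illustration of the distributional update, the sketch below applies a softmax with a temperature to evidence scores after subtracting a usage penalty. The function name `update_distribution`, the linear form of the penalty (`penalty * usage_count`), and the default values are assumptions made for this example; the actual update rule in the code may differ.

```python
import math


def update_distribution(evidence, usage_counts, temperature=1.0, penalty=0.5):
    """Softmax over penalized evidence scores (illustrative, not the repo's rule).

    evidence[t]     : accumulated evidence for topic/anchor t (higher is better)
    usage_counts[t] : how many times t has already been queried
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not evidence:
        return {}
    # Assumed penalty form: each prior use lowers the logit by `penalty`.
    logits = {
        t: (score - penalty * usage_counts.get(t, 0)) / temperature
        for t, score in evidence.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - peak) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


# Equal evidence, but "diagnosis" has already been queried twice.
dist = update_distribution(
    {"diagnosis": 2.0, "medication": 2.0},
    {"diagnosis": 2},
)
print(dist)
```

With equal evidence, the anchor that has been queried more often receives less probability mass. A lower temperature sharpens the distribution toward the strongest anchors, and a higher one flattens it.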

---

## 🚀 Getting Started

### Requirements
- Python 3.9+
- PyTorch
- HuggingFace `transformers`
- `scikit-learn`, `numpy`, `pandas`
- (Optional) OpenAI API / other LLM APIs

### Installation
```bash
git clone https://github.com/your-username/agent-privacy-attack.git
cd agent-privacy-attack
pip install -r requirements.txt
```

---

## ⚠️ Disclaimer

This code is provided for research purposes only.
Do not run ADAM against systems you are not authorized to test.
Our goal is to raise awareness of privacy vulnerabilities in LLM-based agents and to encourage the development of defensive techniques.