# LLM_Agent_Privacy_Auditing

* [Benchmark](/benchmark) contains the scenario descriptions, datasets, and safety prompts that are used in this work and make up the proposed benchmark.
* [src](/src) contains code for each of the agents ([adversary](/src/adversaries), [application](/src/application_agents), [auditor](/src/auditor)).
    * [src/baselines](/src/baselines) contains code for the baseline experiments: AirGap and the contextual privacy attack from Dynamic Firewalls.
  
## Section References

1. **Sec 4.1, Explicit Leakages:** Run the [adversary](/src/adversaries) and [application](/src/application_agents) agents in parallel, serving the application agent on a host/port that the adversary can reach. Then extract YAML-formatted logs with the [extractor](/src/auditor/extract_logs.py) and run the privacy and utility audit scripts in the [auditor](/src/auditor) directory: use the judges suffixed with `_schedule` for the scheduling scenario and those suffixed with `_insurance` for all other scenarios. Run the baseline jobs using the scripts in [src/baselines](/src/baselines).
2. **Sec 4.1, Implicit Leakages via Adversarial Inference and Sec 4.2:** Run the [adversary](/src/adversaries) and [application](/src/application_agents) agents in parallel for the insurance claim scenario with the target "mental health conditions", then run the [predictor](/src/adversaries/predictor/sidechannel_predictor_consistency.py) on the extracted logs.
   * To retrieve more than 20 logprobs (needed to study adversarial belief updates), go to the installation directory of the `vllm` package, open `vllm/config/__init__.py`, and change `max_logprobs` to a large number (e.g. 2000000). By default, vLLM allows fetching at most 20 logprobs.
3. **Sec 4.3:** Run the code in [auditor](/src/auditor) over the existing conversation logs with different base models and analyse the resulting scores.
4. **Sec 4.4:** Run the [trajectory judge](/src/auditor/trajectory_judge.py) on the conversation logs.
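
For step 1, choosing which judges to run follows a simple suffix rule. The sketch below assumes that each judge is a Python script in the auditor directory whose file stem ends in `_schedule` or `_insurance`; the helper names and the scenario label `"scheduling"` are hypothetical and not taken from this repository.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical scenario label; every other scenario uses the insurance judges.
SCHEDULING = "scheduling"


def judge_suffix(scenario: str) -> str:
    """Return the judge-file suffix for a scenario, per the rule in step 1."""
    return "_schedule" if scenario == SCHEDULING else "_insurance"


def select_judges(filenames: Iterable[str], scenario: str) -> List[str]:
    """Keep only .py files whose stem ends with the scenario's judge suffix."""
    suffix = judge_suffix(scenario)
    return sorted(
        name for name in filenames
        if name.endswith(".py") and Path(name).stem.endswith(suffix)
    )


def judges_in_dir(auditor_dir: str, scenario: str) -> List[str]:
    """List matching judge scripts (as paths) in a directory such as src/auditor."""
    names = [p.name for p in Path(auditor_dir).iterdir() if p.is_file()]
    return [str(Path(auditor_dir) / n) for n in select_judges(names, scenario)]
```

For example, `judges_in_dir("src/auditor", "scheduling")` would return the `_schedule` judges, while any other scenario name falls back to the `_insurance` judges. Keeping the filtering in a pure function makes the rule easy to check without touching the file system.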
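
The `max_logprobs` edit in step 2 can be scripted. This is a minimal sketch, assuming the default is declared on its own line in a form like `max_logprobs: int = 20` inside `vllm/config/__init__.py`; the exact declaration varies across vLLM versions, so inspect the file before relying on it. The function names are hypothetical.

```python
import importlib.util
import pathlib
import re
from typing import Tuple

# Assumption: the default sits on its own line, e.g. "max_logprobs: int = 20".
# Lines such as "self.max_logprobs = max_logprobs" are deliberately not matched.
_MAX_LOGPROBS_DEFAULT = re.compile(
    r"^([ \t]*max_logprobs[ \t]*(?::[ \t]*int[ \t]*)?=[ \t]*)\d+", re.MULTILINE
)


def raise_max_logprobs(source: str, new_limit: int = 2_000_000) -> Tuple[str, int]:
    """Rewrite the integer default of max_logprobs; return (new_source, count)."""
    return _MAX_LOGPROBS_DEFAULT.subn(lambda m: m.group(1) + str(new_limit), source)


def vllm_config_file() -> pathlib.Path:
    """Locate vllm/config/__init__.py inside the installed vllm package."""
    spec = importlib.util.find_spec("vllm")
    if spec is None or spec.origin is None:
        raise RuntimeError("vllm is not installed in this environment")
    return pathlib.Path(spec.origin).parent / "config" / "__init__.py"


def patch_installed_vllm(new_limit: int = 2_000_000) -> int:
    """Patch the installed file in place; return the number of replacements."""
    path = vllm_config_file()
    patched, count = raise_max_logprobs(path.read_text(), new_limit)
    if count == 0:
        raise RuntimeError(f"no max_logprobs default found in {path}; edit it by hand")
    path.write_text(patched)
    return count
```

Calling `patch_installed_vllm()` once in the experiment environment applies the edit; reinstalling vLLM reverts it. Recent vLLM releases also expose this cap as the `max_logprobs` engine argument (`--max-logprobs` for the server), which may make editing the package unnecessary; check the documentation for the installed version.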
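
For step 3, one way to compare audit runs across base models is to group the scores by model. This sketch assumes the audit outputs have already been parsed into records with hypothetical `model` and `score` fields; the actual output format of the auditor scripts may differ.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Mapping, Tuple


def summarize_by_model(
    records: Iterable[Mapping[str, object]],
) -> Dict[str, Tuple[float, float, int]]:
    """Group scores by base model; return {model: (mean, population std, count)}."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for rec in records:
        # "model" and "score" are hypothetical field names for a parsed audit record.
        groups[str(rec["model"])].append(float(rec["score"]))
    # Sort by model name so the summary order is stable across runs.
    return {m: (mean(s), pstdev(s), len(s)) for m, s in sorted(groups.items())}
```

Reporting the count alongside the mean and spread makes it visible when one base model was evaluated on fewer conversations than another.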