# LLM_Agent_Privacy_Auditing

* [Benchmark](/benchmark) contains the scenario descriptions, datasets, and safety prompts that make up the proposed benchmark and are used in this work.
* [src](/src) contains code for each of the agents ([adversary](/src/adversaries), [application](/src/application_agents), [auditor](/src/auditor)).
    * [src/baselines](/src/baselines) contains code for the baseline experiments: AirGap and the contextual privacy attack from Dynamic Firewalls.
  
## Section References

1. Sec 5.1: Run the [adversary](/src/adversaries) and [application](/src/application_agents) agents in parallel, using a host/port for the application agent that is visible to the adversary. Then extract YAML-formatted logs with the [extractor](/src/auditor/extract_logs.py) and run the privacy and utility audit scripts in the [auditor](/src/auditor) directory: use the judges suffixed with "_schedule" for the scheduling scenario and those suffixed with "_insurance" for the rest. Run baseline jobs using the scripts in [src/baselines](/src/baselines).
2. Sec 5.2: Run the [adversary](/src/adversaries) and [application](/src/application_agents) agents in parallel for the insurance claim scenario with target "mental health conditions", then run the [predictor](src/adversaries/predictor/sidechannel_predictor_consistency.py) on the extracted logs.
3. Sec 5.3: Run the code in [auditor](/src/auditor) over existing conversation logs with different base models and analyse the output scores.
4. Sec 5.4: Run the [trajectory judge](src/auditor/trajectory_judge.py) on conversation logs.
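
The "run in parallel" and judge-selection steps for Sec 5.1 can be sketched roughly as follows. This is a minimal illustration, not the repository's own tooling: `run_in_parallel`, `judge_suffix`, and the scenario key `"scheduling"` are hypothetical names, and the placeholder commands stand in for the real adversary and application-agent entry points, which are not specified here.

```python
import subprocess
import sys
from typing import List


def judge_suffix(scenario: str) -> str:
    """Pick the judge suffix: "_schedule" for scheduling, "_insurance" for the rest.

    The scenario key "scheduling" is an assumed label, not one defined by the repository.
    """
    return "_schedule" if scenario == "scheduling" else "_insurance"


def run_in_parallel(commands: List[List[str]]) -> List[int]:
    """Start every command at once, wait for all of them, and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Placeholder commands: swap in the real adversary and application-agent scripts.
    placeholder = [sys.executable, "-c", "print('agent placeholder')"]
    exit_codes = run_in_parallel([placeholder, placeholder])
    print(exit_codes, judge_suffix("scheduling"))
```

Launching both processes before waiting on either keeps the application agent reachable while the adversary runs; log extraction and the audit scripts would follow once both have exited.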