# Optimization-Based Trajectory Deviation Attacks in Agentic LLM Systems

To demonstrate our research approach, we open-source both our code and our data samples so that readers can run everything easily.

---

The project implementation is divided into two parts:
1. **Data utilities**: to understand the dataset better, we implement a set of statistical measurements for analyzing it. We then select a number of samples from each domain, feed these samples to the attack models (e.g., the GPT-5 model family) to collect their responses, and finally compute the metrics with the functions in the gpt-oss utilities module.
2. **gpt-oss utilities**: functions for interacting with gpt-oss and for computing perplexity and cross-entropy, which indicate how confident the model is in its output.
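A minimal sketch of the statistics-and-sampling step of the data utilities, assuming the records are already loaded as a list of dicts with the `id`, `conversations`, and `functions` fields described under Data Format. The helper names (`domain_of`, `group_by_domain`, `dataset_stats`, `sample_per_domain`) are hypothetical and not taken from `dataset_utils.py`, and the domain is assumed to be the `id` prefix before the first hyphen (for example `Hotels-123` gives `Hotels`).

```python
import random
from collections import defaultdict
from typing import Dict, List


def domain_of(record: dict) -> str:
    # Assumption: the domain is the "id" prefix before the first hyphen.
    return record["id"].split("-", 1)[0]


def group_by_domain(records: List[dict]) -> Dict[str, List[dict]]:
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        grouped[domain_of(record)].append(record)
    return dict(grouped)


def dataset_stats(records: List[dict]) -> Dict[str, dict]:
    """Per-domain record count and mean number of messages / candidate functions."""
    stats = {}
    for domain, rs in group_by_domain(records).items():
        stats[domain] = {
            "n_records": len(rs),
            "mean_messages": sum(len(r["conversations"]) for r in rs) / len(rs),
            "mean_functions": sum(len(r["functions"]) for r in rs) / len(rs),
        }
    return stats


def sample_per_domain(records: List[dict], k: int, seed: int = 0) -> Dict[str, List[dict]]:
    """Draw up to k records per domain without replacement, reproducibly."""
    rng = random.Random(seed)  # private RNG: the global random state is untouched
    return {
        domain: rng.sample(rs, min(k, len(rs)))
        for domain, rs in sorted(group_by_domain(records).items())
    }
```

Iterating over the domains in sorted order and seeding a private `random.Random` instance keeps the per-domain selection reproducible across runs.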

## Project Layout
```text
project/
├─ codes/
│  ├─ dataset_utils.py
│  └─ gpt_oss_utils.py      
└─ data/
   ├─ complex_func_original_samples/
   ├─ complex_function_injected_samples/
   └─ ComplexFuncBench.zip
```

## Setup
Environment:
1. Python 3.9+ is recommended.
2. Install the dependencies: `pip install -U torch transformers accelerate openai`
3. Make sure a CUDA-capable GPU is available and that the installed PyTorch build supports CUDA.
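These requirements can be checked with a small script. This is a sketch, not part of the repository: it uses `importlib.util.find_spec` so that a missing package is reported rather than raising `ImportError`, and it only imports `torch` (to query `torch.cuda.is_available()`) when `torch` is installed.

```python
import importlib.util
import sys

REQUIRED_PACKAGES = ("torch", "transformers", "accelerate", "openai")


def check_environment() -> dict:
    """Map each requirement to True/False without importing missing packages."""
    report = {"python_3_9_plus": sys.version_info >= (3, 9)}
    for pkg in REQUIRED_PACKAGES:
        report[pkg] = importlib.util.find_spec(pkg) is not None
    # Query CUDA only when torch is installed, so the check never crashes.
    report["cuda"] = False
    if report["torch"]:
        import torch
        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:16s} {'ok' if ok else 'not found'}")
```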

## Configuration
1. **Data paths**: make sure the path to *ComplexFuncBench.jsonl* points to the correct location in your local copy of the repository.
2. **Model path**: gpt-oss is loaded from local weights. Computing perplexity and cross-entropy requires per-token log-probabilities from the model, so make sure the gpt-oss weights are available locally and note their path.
3. **Environment variables**: set the OpenAI API key through the `OPENAI_API_KEY` environment variable rather than hard-coding it.
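One way to gather these three settings in one place is a small config object read from the environment. This is a sketch: the variable names `COMPLEXFUNCBENCH_PATH` and `GPT_OSS_MODEL_PATH` and the default paths are hypothetical placeholders; only `OPENAI_API_KEY` is the variable the official `openai` client reads by default.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class ProjectConfig:
    data_path: str                 # path to ComplexFuncBench.jsonl
    model_path: str                # directory holding the local gpt-oss weights
    openai_api_key: Optional[str]  # read from OPENAI_API_KEY, never hard-coded

    def problems(self) -> List[str]:
        """List configuration issues instead of failing on the first one."""
        issues = []
        if not os.path.isfile(self.data_path):
            issues.append(f"data file not found: {self.data_path}")
        if not os.path.isdir(self.model_path):
            issues.append(f"model directory not found: {self.model_path}")
        if not self.openai_api_key:
            issues.append("OPENAI_API_KEY is not set")
        return issues


def load_config(env: Optional[Mapping[str, str]] = None) -> ProjectConfig:
    # Hypothetical variable names and placeholder defaults; adjust to your setup.
    env = os.environ if env is None else env
    return ProjectConfig(
        data_path=env.get("COMPLEXFUNCBENCH_PATH", "data/ComplexFuncBench.jsonl"),
        model_path=env.get("GPT_OSS_MODEL_PATH", "models/gpt-oss"),
        openai_api_key=env.get("OPENAI_API_KEY"),
    )
```

Passing an explicit mapping to `load_config` makes the loader easy to test without touching the real environment.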

## Data Format
*ComplexFuncBench.jsonl* stores one JSON object per line. As the name suggests, each record is fairly complex, but we only use the fields shown below:

```json
{
  "id": "Hotels-123",
  "conversations": [
    {"role": "user", "content": "..."}, 
    {"role": "assistant", "content": "..."}
  ],
  "functions": [
    {"name": "Search_Hotels", "description": "/api/v1/hotels search ..."}
  ]
}
```

The data samples we release follow the same structure.
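A minimal loader sketch that reads the file line by line and keeps only these three fields. The names `KEEP_FIELDS`, `parse_records`, and `load_records` are hypothetical; the project's own loader in `dataset_utils.py` may differ.

```python
import json
from typing import Iterable, Iterator, List

# Only these top-level fields are used; all other fields are dropped.
KEEP_FIELDS = ("id", "conversations", "functions")


def parse_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines text, skipping blank lines and keeping KEEP_FIELDS."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc
        yield {key: record[key] for key in KEEP_FIELDS}


def load_records(path: str) -> List[dict]:
    with open(path, "r", encoding="utf-8") as f:
        return list(parse_records(f))
```

`parse_records` accepts any iterable of strings, so it can be tested without a file on disk; `load_records` simply passes an open file handle to it.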

## LM Scoring Toolkit
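Metrics:
1. **Cross-entropy (CE)**: the mean negative log-likelihood per token in nats, taken from the `loss` returned by the Hugging Face model; divide by ln(2) to get bits/token.
2. **Perplexity (PPL)**: exp(CE), with CE in nats.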
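For reference, these are the standard definitions for a sequence $x_1, \dots, x_T$ scored in a single forward pass by a causal model $p_\theta$ (notation introduced here; the first token receives no prediction, so $T-1$ tokens are scored):

$$
\mathrm{CE} = -\frac{1}{T-1}\sum_{t=2}^{T} \ln p_\theta(x_t \mid x_{<t}), \qquad
\mathrm{CE}_{\text{bits}} = \frac{\mathrm{CE}}{\ln 2}, \qquad
\mathrm{PPL} = e^{\mathrm{CE}} = 2^{\mathrm{CE}_{\text{bits}}}
$$

As a worked example, a model that assigns probability 1/4 to every observed token has CE = ln 4 ≈ 1.386 nats/token, i.e. 2 bits/token, and PPL = 4.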

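Example: scoring a single text
```python
# Assumes the helper lives in codes/gpt_oss_utils.py and codes/ is on the path.
from gpt_oss_utils import ce_ppl_sliding_streaming

text = "Your long document to evaluate ..."
# Presumably 1024-token windows advancing by 512 tokens (512-token overlap).
ce_nats, ce_bits, ppl = ce_ppl_sliding_streaming(
    text, max_length=1024, stride=512, empty_cache_every=4
)
print(f"CE: {ce_nats:.6f} nats/token ({ce_bits:.6f} bits/token), PPL: {ppl:.6f}")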
```
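To make the sliding-window idea concrete, here is a framework-agnostic sketch of strided evaluation in the style of the Hugging Face perplexity guide. It is not the implementation of `ce_ppl_sliding_streaming`: the model call is abstracted as a hypothetical `logprob_fn` that returns, for a window of token ids, the log-probability of each token given the preceding tokens in that window. Every token after the first is scored exactly once, and tokens already scored by an earlier window serve only as context.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: given a window of token ids w, return
# [log p(w[1] | w[:1]), log p(w[2] | w[:2]), ...], i.e. len(w) - 1 values.
LogProbFn = Callable[[Sequence[int]], List[float]]


def sliding_windows(seq_len: int, max_length: int, stride: int) -> List[Tuple[int, int, int]]:
    """Return (begin, end, prev_end) for each window over [0, seq_len)."""
    if not 0 < stride <= max_length:
        raise ValueError("require 0 < stride <= max_length")
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        windows.append((begin, end, prev_end))
        prev_end = end
        if end == seq_len:
            break
    return windows


def sliding_ce_ppl(token_ids: Sequence[int], logprob_fn: LogProbFn,
                   max_length: int = 1024, stride: int = 512) -> Tuple[float, float, float]:
    """Return (CE in nats/token, CE in bits/token, perplexity)."""
    total_nll = 0.0
    n_scored = 0
    for begin, end, prev_end in sliding_windows(len(token_ids), max_length, stride):
        logprobs = logprob_fn(token_ids[begin:end])
        # Score only tokens not covered by an earlier window; the first token
        # of a window has no in-window context, so it is never scored here.
        for pos in range(max(prev_end, begin + 1), end):
            total_nll -= logprobs[pos - begin - 1]
            n_scored += 1
    if n_scored == 0:
        raise ValueError("need at least two tokens to score")
    ce_nats = total_nll / n_scored
    return ce_nats, ce_nats / math.log(2), math.exp(ce_nats)
```

With a Hugging Face causal LM, `logprob_fn` could apply `torch.log_softmax` to the logits of a forward pass over the window and gather the entries for the next-token ids; the sketch leaves that out so it runs without model weights.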