# Truncation Debugging Guide

## Overview

This guide explains how LLM responses get truncated when they hit a token limit, and how to identify and fix the problem.

## Root Cause

The main cause of truncation is a `max_tokens` value that is too low in the LLM configuration. `max_tokens` caps how many tokens the model may generate; when a response reaches the cap, generation stops, often mid-sentence, leaving incomplete LaTeX models and Gurobi code.

## Recent Fixes Applied

### 1. Increased Token Limits

**Before:**
- All models had `max_tokens=4000`
- Timeout was 60 seconds

**After:**
- All models now have `max_tokens=16000`
- Timeout increased to 120 seconds
- Applied to OpenAI, Anthropic, and Google models

### 2. Enhanced Debugging

Added comprehensive response debugging:

- **Response length tracking**: Monitor response lengths to identify potential truncation
- **Debug output**: Detailed logging showing response length and content previews
- **Fix suggestions**: Automatic suggestions for resolving truncation issues

## How to Identify Truncation

### 1. Look for Debug Output

When running the system, watch for these debug messages:

```
🔍 Min/Max agent response length: 1234 characters
🔍 Response preview: ...\n\n```\n
⚠️ WARNING: Response appears to be truncated!
💡 Suggestions: Increase max_tokens in LLM configuration, Split the task into smaller chunks
```

### 2. Common Truncation Indicators

- Response ends with `...`
- Response ends with an unclosed code fence (```` ``` ````)
- Missing expected markers like `UPDATED MODEL:` or `model.optimize()`
- Incomplete sentences at the end
- Unmatched code block markers
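
The indicators above can be checked mechanically. The following is a minimal sketch, not the project's actual detector: `truncation_indicators` and its default marker list are hypothetical names chosen for illustration, and the sentence check is a crude heuristic that can misfire on responses that legitimately end in code.

```python
FENCE = "`" * 3  # a Markdown code fence, built this way so the snippet contains no literal fence


def truncation_indicators(response, expected_markers=("model.optimize()",)):
    """Return the reasons a response looks truncated (empty list if none)."""
    text = response.rstrip()
    reasons = []

    if text.endswith("..."):
        reasons.append("ends with '...'")

    # An odd number of fences means a code block was opened but never closed.
    if text.count(FENCE) % 2 == 1:
        reasons.append("unmatched code fence")

    for marker in expected_markers:
        if marker not in text:
            reasons.append(f"missing expected marker: {marker}")

    # Crude sentence check: text that stops on a letter, digit, or comma.
    if text and (text[-1].isalnum() or text[-1] == ","):
        reasons.append("ends mid-sentence")

    return reasons


complete = f"UPDATED MODEL:\n{FENCE}python\nmodel.optimize()\n{FENCE}"
cut_off = f"Here is the code:\n{FENCE}python\nx = model.addVar("
print(truncation_indicators(complete))
print(truncation_indicators(cut_off))
```

Returning a list of reasons rather than a single boolean makes it easier to see which indicator fired and to tune the marker list per agent.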

### 3. Response Analysis

The system now tracks response lengths and provides:

- Response length in characters
- Response preview (first 300 characters)
- Response ending (last 200 characters)
- Suggestions for fixing truncation issues
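
To reproduce these fields outside the system, for example when inspecting a saved response, a small helper is enough. This is a sketch under the assumption that the response is available as a plain string; `summarize_response` is a hypothetical name, and the 300 and 200 character windows mirror the values listed above.

```python
def summarize_response(response, head=300, tail=200):
    """Collect the length, preview, and ending of one response."""
    return {
        "length": len(response),
        "preview": response[:head],
        # Slicing with a negative start returns the whole string when it is shorter than `tail`.
        "ending": response[-tail:] if tail > 0 else "",
    }


summary = summarize_response("UPDATED MODEL:\n" + "x" * 1000 + "\nmodel.optimize()")
print(f"Response length: {summary['length']} characters")
print(f"Response ending: {summary['ending'][-20:]!r}")
```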

## How to Fix Truncation

### 1. Increase Token Limits

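The most common fix is to raise the `max_tokens` parameter in your LLM configuration. Keep the value within the model's maximum output length; providers typically reject requests that ask for more.

```python
from src.utils.llm_model_manager import LLMConfig

config = LLMConfig(
    provider="openai",
    model_name="gpt-4o",
    max_tokens=32000,  # Increased from 16000; confirm the model's output limit allows this
    temperature=0.0
)
```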

### 2. Use Models with Higher Token Limits

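Consider using models with larger context windows. The context window covers the prompt and the response together; the maximum output length is a separate and usually much smaller limit, so check both in the provider's documentation: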

- **OpenAI**: GPT-4o (128K context)
- **Anthropic**: Claude-3-Opus (200K context)
- **Google**: Gemini-1.5-Pro (1M context)
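
A pre-flight check can catch a `max_tokens` value that will not fit before the request is sent. The sketch below assumes you already know the prompt's token count, the model's context window, and the provider's cap on output length. `check_token_budget` is a hypothetical helper, and the output cap used in the usage lines is a placeholder, not a documented limit.

```python
def check_token_budget(prompt_tokens, max_tokens, context_window, output_cap):
    """Return a list of budget problems (empty if the request fits)."""
    problems = []

    # The requested response length may not exceed the provider's output cap.
    if max_tokens > output_cap:
        problems.append(
            f"max_tokens={max_tokens} exceeds the output cap of {output_cap}"
        )

    # Prompt and response share one context window.
    total = prompt_tokens + max_tokens
    if total > context_window:
        problems.append(
            f"prompt + max_tokens = {total} exceeds the context window of {context_window}"
        )

    return problems


# Placeholder numbers for illustration only.
for problem in check_token_budget(
    prompt_tokens=100_000, max_tokens=32_000, context_window=128_000, output_cap=16_000
):
    print("Budget problem:", problem)
```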

### 3. Split Large Tasks

If you're processing very large optimization models, consider splitting them into smaller chunks so that each response stays well under the token limit:

```python
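# Process the model in sections and keep every section's result.
# split_large_model and process_model_section stand in for your own helpers.
model_sections = split_large_model(latex_model)
results = [process_model_section(section) for section in model_sections]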
```
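
One possible shape for the splitting helper is sketched below. This is an assumption, not the project's implementation: it treats blank lines as block boundaries and groups blocks up to a character budget, which is only a rough stand-in for a token budget.

```python
def split_large_model(latex_model, max_chars=4000):
    """Group blank-line-separated blocks into sections of at most max_chars.

    A single block longer than max_chars is kept whole rather than cut
    in the middle of an equation or constraint.
    """
    blocks = [b.strip() for b in latex_model.split("\n\n") if b.strip()]
    sections = []
    current = ""
    for block in blocks:
        candidate = f"{current}\n\n{block}" if current else block
        if current and len(candidate) > max_chars:
            # Adding this block would overflow the budget: close the section.
            sections.append(current)
            current = block
        else:
            current = candidate
    if current:
        sections.append(current)
    return sections


latex_model = "\n\n".join(["A" * 30, "B" * 30, "C" * 30])
for i, section in enumerate(split_large_model(latex_model, max_chars=70), start=1):
    print(f"Section {i}: {len(section)} characters")
```

Splitting by position alone can separate a constraint from the variable definitions it depends on, so each section may still need the shared definitions prepended before it is sent to the LLM.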

### 4. Optimize Prompts

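The prompt and the response share the model's context window, so a shorter prompt leaves more room for the response. Reduce prompt size by: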
- Removing unnecessary context
- Using more concise instructions
- Focusing on essential information only

## Example Debugging Session

Here's what a typical debugging session looks like:

```python
# Run the workflow with verbose output
workflow = LinearizeLLMWorkflow("large_model.tex", verbose=True)
results = workflow.run()

# Look for truncation warnings in the output
# If you see warnings, increase max_tokens and try again
```
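
The "increase `max_tokens` and try again" step can also be automated. The sketch below is a generic retry loop, not part of the workflow API: `generate` and `looks_truncated` are hypothetical callables you would supply, and the default budgets reuse the 4000 and 16000 values mentioned earlier in this guide.

```python
def run_with_growing_budget(generate, looks_truncated, start=4000, cap=16000):
    """Call generate(max_tokens), doubling the budget until the response is complete.

    Returns (response, max_tokens_used). Raises RuntimeError if the response
    still looks truncated at the cap.
    """
    max_tokens = start
    while True:
        response = generate(max_tokens)
        if not looks_truncated(response):
            return response, max_tokens
        if max_tokens >= cap:
            raise RuntimeError(f"response still truncated at max_tokens={max_tokens}")
        max_tokens = min(max_tokens * 2, cap)


# Stand-in for an LLM call: the "full" answer is 10000 characters long
# and gets cut off at the budget.
def fake_generate(max_tokens):
    return "x" * min(max_tokens, 10000)


response, used = run_with_growing_budget(fake_generate, lambda r: len(r) < 10000)
print(f"Complete response obtained with max_tokens={used}")
```

Capping the loop avoids retrying forever when the problem is not the token limit at all, for instance when the task itself needs to be split.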

## Best Practices

1. **Start with High Token Limits**: Set `max_tokens` at or near the model's maximum output length
2. **Monitor Response Lengths**: Watch for responses that stop near the token limit or are shorter than the task should produce
3. **Test with Smaller Problems First**: Validate your approach on small optimization models before scaling up
4. **Use Appropriate Models**: Choose models that can handle your problem size
5. **Optimize Prompts**: Keep prompts concise and focused

## Troubleshooting Checklist

- [ ] Check if `max_tokens` is set high enough
- [ ] Verify the model supports the required token limit
- [ ] Look for truncation indicators in response endings
- [ ] Consider splitting large models into sections
- [ ] Optimize prompt templates for conciseness
- [ ] Use models with higher context windows