## Truffld

TRUFFLD: Bridging Non-Intrusive Tracing and Fine-Grained Cross-Layer Representations for LLM Inference Diagnosis

------

TRUFFLD provides a practical end-to-end solution for observability and
diagnosis in large-scale LLM inference.

![TRUFFLD framework overview](framework.png)

------

### Contributions

- **Non-intrusive monitoring with low overhead**. TRUFFLD records execution events without modifying source code or online binaries, ensuring that tracing introduces minimal interference in production.
- **Fine-grained cross-layer representations**. By jointly capturing vertical (per-node stack execution) and horizontal (cross-node communication) views, TRUFFLD provides comprehensive coverage and enables per-step granularity at the request level.
- **Asynchrony- and concurrency-aware call-chain construction**. TRUFFLD reconstructs request-level call-chain trees by aligning host and device timelines, and by explicitly disambiguating many-to-one operator mappings caused by batching and coroutine scheduling.
- **End-to-end anomaly diagnosis**. On top of the call-chain representation, TRUFFLD integrates a probabilistic stage based on Gaussian Mixture Models to model multi-modal normal behavior, and a reasoning stage based on large language models to apply structural and semantic constraints. This two-stage design yields both step-level anomaly decisions and operator-level localization, and outperforms classical machine learning baselines.
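
To make the call-chain idea above more concrete, the following is a minimal sketch of how a tree could be rebuilt from timed spans by interval containment. It is not TRUFFLD's implementation: the `Span` type, `build_call_tree`, and `render` are hypothetical names introduced for illustration, and the sketch assumes all spans already sit on one aligned timeline. It does not attempt the host/device alignment or the batching and coroutine disambiguation described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical record for one traced operation on a single timeline."""
    name: str
    start: float  # start timestamp, e.g. in microseconds
    end: float    # end timestamp, same unit as start
    children: List["Span"] = field(default_factory=list)


def build_call_tree(spans: List[Span]) -> List[Span]:
    """Nest spans by time containment and return the top-level spans."""
    roots: List[Span] = []
    stack: List[Span] = []  # chain of currently open ancestors
    # Earlier start first; for equal starts, the longer span is the parent.
    for span in sorted(spans, key=lambda s: (s.start, -s.end)):
        # Drop ancestors that do not fully contain the current span.
        while stack and not (
            stack[-1].start <= span.start and span.end <= stack[-1].end
        ):
            stack.pop()
        if stack:
            stack[-1].children.append(span)
        else:
            roots.append(span)
        stack.append(span)
    return roots


def render(roots: List[Span], depth: int = 0) -> List[str]:
    """Return one indented line per span, depth-first."""
    lines: List[str] = []
    for span in roots:
        lines.append("  " * depth + span.name)
        lines.extend(render(span.children, depth + 1))
    return lines


# Illustrative data only: one inference step with nested operators.
demo = [
    Span("step", 0, 100),
    Span("attention", 10, 40),
    Span("matmul", 12, 20),
    Span("mlp", 50, 90),
]
print("\n".join(render(build_call_tree(demo))))
```

Containment alone is only enough for synchronous, single-threaded execution. Asynchronous kernel launches finish after the host call returns, so a real reconstruction needs extra linkage between host and device events, which is the harder problem the contribution above addresses.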

------

### Project Structure

```
root/                                   # Project root directory
├── config/                             # Configuration files directory
│   └── pytorch.json                    # PyTorch tracing configuration
│
├── csrc/                               # C/C++ source code and build scripts
│   ├── CMakeLists.txt                  # CMake build configuration file
│   └── cupti_injection.cpp             # CUPTI tracing injection implementation
│
├── example/                            # Usage examples
│   └── vllm_offline.py                 # vLLM offline inference tracing example
│
├── gen_dataset/                        # Dataset generation scripts
│   ├── gen_offline_dataset.py          # Offline dataset generation
│   └── gen_online_dataset.py           # Online dataset generation
│
├── hook/                               # Hook scripts and utility functions
│   ├── sitecustomize.py                # Python runtime custom injection entry
│   └── utils.py                        # Helper functions for hook implementation
│
├── process/                            # Data processing and analysis scripts
│   ├── full_sample.py                  # Full-sample data processing
│   ├── hook_kind_count.py              # Hook event type statistics analysis
│   ├── horizontal_chrome_events.py     # Horizontal Chrome trace events processing
│   ├── vertical_chrome_events.py       # Vertical Chrome trace events processing
│   ├── utils.py                        # Shared utilities for processing
│   └── __init__.py                     # Marks directory as a Python package
│
└── README.md                           # Project overview and usage guide
```
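
The layout suggests two mechanisms that a short sketch can illustrate: a `sitecustomize.py` entry point and Chrome trace event output. Python's `site` module tries to import a module named `sitecustomize` at interpreter startup, so a file with that name on the import path can install hooks without touching application source. The code below is an assumption-laden illustration, not the contents of `hook/sitecustomize.py`: `traced`, `patch`, `EVENTS`, and the stand-in `engine` object are hypothetical names. It wraps a callable and records each call as a "complete" (`"ph": "X"`) event with microsecond `ts` and `dur`, the shape used by the Chrome trace event format.

```python
import functools
import json
import os
import threading
import time
from types import SimpleNamespace

EVENTS = []  # in-memory buffer of Chrome trace events (hypothetical)


def traced(fn, label):
    """Wrap fn so that every call appends one complete ("X") event."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start_ns = time.perf_counter_ns()
        try:
            return fn(*args, **kwargs)
        finally:
            end_ns = time.perf_counter_ns()
            EVENTS.append({
                "name": label,
                "ph": "X",                            # complete event
                "ts": start_ns / 1000.0,              # microseconds
                "dur": (end_ns - start_ns) / 1000.0,  # microseconds
                "pid": os.getpid(),
                "tid": threading.get_ident(),
            })
    return wrapper


def patch(owner, attr, label):
    """Replace owner.attr with a traced wrapper, leaving its source untouched."""
    setattr(owner, attr, traced(getattr(owner, attr), label))


# Stand-in for a real inference engine; purely illustrative.
engine = SimpleNamespace(generate=lambda prompt: prompt.upper())
patch(engine, "generate", "engine.generate")
engine.generate("hello")

# Serialize in the JSON object form accepted by Chrome trace viewers.
trace_json = json.dumps({"traceEvents": EVENTS})
print(trace_json)
```

In a real deployment the patching would run inside `sitecustomize.py` itself, with the directory containing it added to `PYTHONPATH`, and the events would be written to a file rather than held in memory.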

