Gen-T: Reduce Distributed Tracing Operational Costs Using Generative Models

Published: 20 Oct 2023, Last Modified: 18 Nov 2023, TGL Workshop 2023 Long Paper
Keywords: Distributed Tracing, Generative Models, Timeseries, Observability
Abstract: Distributed tracing (DT) is an important aspect of modern microservice operations. It allows operators to troubleshoot problems by modeling the sequence of services a specific request traverses in the system. However, transmitting traces incurs significant costs. This forces operators to use coarse-grained prefiltering or sampling techniques, creating undesirable tradeoffs between cost and fidelity. We propose to circumvent these issues by using generative modeling to capture the semantic structure of collected traces in a lossy-yet-succinct way. Realizing this potential in practice, however, is challenging. Naively extending ideas from the literature on deep generative models for timeseries generation or graph generation can result in poor cost-fidelity tradeoffs. In designing and implementing Gen-T, we tackle key algorithmic and systems challenges to make deep generative models practical for DT. We design a hybrid generative model that separately models different components of DT data and conditionally stitches them together. Our system Gen-T, which has been integrated with the widely used OpenTelemetry framework, achieves a level of fidelity comparable to that of 1:15 sampling, which is finer-grained than the default 1:20 sampling setting in the OpenTelemetry documentation, while maintaining a cost profile equivalent to that of 1:100 lossless-compressed sampling (i.e., a 7$\times$ volume reduction).
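The 7$\times$ figure in the abstract can be checked with a short calculation. The sketch below is one plausible reading, not the authors' stated derivation: it assumes that transmitted volume scales linearly with the sampling rate and that the same lossless compression applies to both settings. The function and variable names are hypothetical.

```python
from fractions import Fraction


def volume_ratio(fidelity_rate: Fraction, cost_rate: Fraction) -> Fraction:
    """Ratio of the volume needed at `fidelity_rate` to the volume at `cost_rate`.

    Assumption (not from the paper): volume is proportional to sampling rate.
    """
    return fidelity_rate / cost_rate


# Sampling rates quoted in the abstract.
fidelity_equiv = Fraction(1, 15)   # fidelity comparable to 1:15 sampling
cost_equiv = Fraction(1, 100)      # cost comparable to 1:100 sampling

reduction = volume_ratio(fidelity_equiv, cost_equiv)
print(f"volume reduction: {float(reduction):.2f}x (rounds to {round(reduction)}x)")
```

Under these assumptions the ratio is 100/15, a little under 7, which matches the rounded figure quoted above.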
Format: Long paper, up to 8 pages. If the reviewers recommend it to be changed to a short paper, I would be willing to revise my paper to fit within 4 pages.
Submission Number: 33