Presentation: Virtual
Keywords: Security, Distributed LLM Inference, TEE
Presenter Full Name: Kexin Chu
Presenter Email: kexin.chu@uconn.edu
Abstract: Large Language Models (LLMs) have achieved remarkable success across a range of applications, from code generation to conversational AI. As LLMs grow in size and capability, distributed inference across multiple computing nodes becomes necessary to meet resource demands and performance goals. However, this shift introduces critical security challenges, particularly in handling sensitive user inputs and intermediate model states such as key-value (KV) caches. In this paper, we present SPADA, a Secure, Performant, and Distributed Architecture for LLM inference that addresses the core challenges of secure execution, inter-node trust, and efficient communication in distributed environments. SPADA integrates trusted execution environments (TEEs), a decentralized trust-establishment protocol, and a lightweight, encrypted communication pipeline. It also introduces a secure and efficient mechanism for transmitting distributed KV-cache data. Our design ensures that distributed inference pipelines maintain strong privacy guarantees without sacrificing throughput or latency, offering a practical foundation for secure LLM deployment at scale.
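To make the encrypted KV-cache transfer described in the abstract concrete, below is a minimal illustrative sketch, not SPADA's actual implementation. It assumes a 256-bit AES-GCM session key already provisioned by the trust-establishment step (e.g., TEE attestation plus key exchange); the helper names seal_kv_block / open_kv_block, the use of the cryptography and numpy libraries, and the toy tensor shape are all assumptions for illustration.

```python
# Illustrative sketch only: encrypting one KV-cache block before inter-node
# transfer. Assumes a 256-bit AES-GCM session key has already been negotiated
# (e.g., via TEE attestation and key exchange); all names here are hypothetical.
import os
import numpy as np
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def seal_kv_block(key: bytes, kv_block: np.ndarray, layer_id: int) -> tuple[bytes, bytes]:
    """Serialize and encrypt one KV-cache block; returns (nonce, ciphertext)."""
    aesgcm = AESGCM(key)
    nonce = os.urandom(12)                    # unique nonce per message
    aad = layer_id.to_bytes(4, "big")         # bind ciphertext to its layer index
    return nonce, aesgcm.encrypt(nonce, kv_block.tobytes(), aad)


def open_kv_block(key: bytes, nonce: bytes, ciphertext: bytes,
                  layer_id: int, shape: tuple, dtype=np.float16) -> np.ndarray:
    """Decrypt and deserialize a KV-cache block on the receiving node."""
    aesgcm = AESGCM(key)
    aad = layer_id.to_bytes(4, "big")
    plaintext = aesgcm.decrypt(nonce, ciphertext, aad)
    return np.frombuffer(plaintext, dtype=dtype).reshape(shape)


if __name__ == "__main__":
    session_key = AESGCM.generate_key(bit_length=256)        # stand-in for negotiated key
    kv = np.random.rand(2, 8, 128, 64).astype(np.float16)    # toy (k/v, heads, tokens, dim)
    nonce, ct = seal_kv_block(session_key, kv, layer_id=3)
    kv_back = open_kv_block(session_key, nonce, ct, layer_id=3, shape=kv.shape)
    assert np.array_equal(kv, kv_back)
```

Binding the layer index as associated data is one simple way to prevent ciphertexts from being replayed or swapped across pipeline stages; the paper's actual mechanism may differ.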
Presenter Bio: https://scholar.google.com/citations?user=ZIdS3d0AAAAJ&hl=en
Paper Checklist Guidelines: I certify that all co-authors have validated the presented results and conclusions, and have read and commit to adhering to the Paper Checklist Guidelines, Call for Papers and Publication Ethics.
YouTube Link: --
YouTube Link Poster: https://youtu.be/yQUboNlplfI
Google Slides: --
Supplementary Material: pdf
Poster: Yes
Workshop Registration: Yes, the presenter has registered for the workshop.
YouTube Link Short: --
Submission Number: 17