Workshop on Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference
Keywords: large language models, inference efficiency, sparsity, mixture of experts, KV cache compression, sparse autoencoders, distillation, pruning, sparse training, quantization
TL;DR: This workshop aims to bring together researchers and practitioners from academia and industry who are interested in improving the inference efficiency of LLMs.
Abstract: Large Language Models (LLMs) have emerged as transformative tools in both research and industry, excelling across a wide array of tasks. However, their growing computational demands, especially during inference, raise significant concerns about accessibility, environmental sustainability, and deployment feasibility. At the same time, sparsity-based techniques are proving critical not just for improving efficiency but also for enhancing interpretability, modularity, and adaptability in AI systems. This workshop aims to bring together researchers and practitioners from academia and industry who are advancing the frontiers of sparsity in deep learning. Our scope spans several interrelated topics, including Mixture of Experts (MoEs), LLM inference and serving, network pruning, sparse training, distillation, activation sparsity, low-rank adapters, hardware innovations, and quantization. A key objective is to foster connections and unlock synergies between traditionally independent yet highly related research areas, such as activation sparsity and sparse autoencoders (SAEs), or quantization and KV cache compression. Rather than focusing solely on efficiency, we aim to explore how sparsity can serve as a unifying framework across multiple dimensions of AI, driving advances in interpretability, generalization, and system design. By facilitating the fusion of ideas from different topics, the workshop will create new opportunities for innovation. We encourage participants to think beyond traditional constraints, exploring how different forms of sparsity can inform each other and yield new algorithms. Whether the goal is faster inference, modular architectures, or more interpretable models, our aim is to catalyze research that deepens the integration of sparsity within AI.
Submission Number: 67