CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal, Bilge Acun, Basil Hosmer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu
Published: 01 Jan 2024, Last Modified: 15 May 2025
ICML 2024
CC BY-SA 4.0