Dropping the Anchor: Statistical Context Summarization for Distributed Systems via Pulsar Attention

Aryan Sood; Shantanu Acharya

Published: 01 Jun 2026, Last Modified: 04 Jun 2026, AdaptFM Poster, CC BY 4.0

Keywords: Inference Optimization, Distributed Systems, Sparse Attention, Long Context Optimization

TL;DR: Pulsar Attention replaces Star Attention's static anchor block with Max-IDF chunk summaries, cutting FLOPs by up to 3.3× while improving long-context accuracy.

Abstract: Inference with large language models (LLMs) on long sequences is computationally expensive due to the quadratic complexity of self-attention. Distributed blockwise methods such as Star Attention reduce this cost by sharding the context across hosts, but they rely on prepending a static, content-blind copy of the first block to every host. We propose Pulsar Attention, which replaces the static anchor with two lightweight, content-aware components: a small attention-sink prefix that stabilizes the softmax, and compact cross-block summaries built via a Max-IDF heuristic that selects chunks containing globally rare tokens. This reduces Phase 1 per-GPU FLOPs by up to 3.3× relative to Star Attention while retaining an identical KV cache footprint. On RULER and BABILong with Llama-3.1-8B, Pulsar Attention outperforms both Star Attention and dense attention at sequence lengths up to 128K tokens, with absolute gains of up to 4.7% over the dense baseline.
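The Max-IDF selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: each context block is treated as one "document" for document-frequency counting, a smoothed IDF formula is used, chunks are fixed-size, and the function name `max_idf_summaries` and its parameters `chunk_size` and `top_k` are hypothetical.

```python
import math
from collections import Counter


def max_idf_summaries(blocks, chunk_size=4, top_k=1):
    """Pick, for each block, the chunks whose rarest token is globally rarest.

    blocks: list of token-id lists, one per host (assumed layout).
    Returns one list of selected chunks per block, in original order.
    """
    n_blocks = len(blocks)
    # Document frequency: number of blocks in which each token appears.
    df = Counter(tok for block in blocks for tok in set(block))
    # Smoothed IDF (an assumed choice); rarer tokens get larger values.
    idf = {tok: math.log((1 + n_blocks) / (1 + count)) + 1.0
           for tok, count in df.items()}

    summaries = []
    for block in blocks:
        chunks = [block[i:i + chunk_size]
                  for i in range(0, len(block), chunk_size)]
        # Score a chunk by the maximum IDF among its tokens ("Max-IDF").
        ranked = sorted(range(len(chunks)),
                        key=lambda i: max(idf[t] for t in chunks[i]),
                        reverse=True)
        keep = sorted(ranked[:top_k])  # restore positional order
        summaries.append([chunks[i] for i in keep])
    return summaries


blocks = [
    [1, 2, 3, 4, 1, 2, 9, 4],  # token 9 appears only in this block
    [1, 2, 3, 4, 1, 2, 3, 4],  # no rare tokens
    [1, 2, 3, 4, 7, 2, 3, 4],  # token 7 appears only in this block
]
summaries = max_idf_summaries(blocks, chunk_size=4, top_k=1)
```

In this toy input, the chunks containing the globally rare tokens 9 and 7 are the ones selected for the first and third blocks. How the selected chunks are then shared across hosts alongside the attention-sink prefix is not specified in the abstract and is left out of the sketch.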

Submission Number: 101