Why Does RoPE Struggle to Maintain Long-Term Decay in Long Sequences?
Blogpost Url: https://d2jud02ci9yv69.cloudfront.net/2025-04-28-pocp-43/blog/pocp/
Abstract: Rotary Position Embedding (RoPE) improves upon traditional positional encodings but struggles with long-term decay in contexts exceeding its training length, limiting the model's generalization to longer sequences. Our experiments suggest that this issue may stem from a high proportion of obtuse angles on the complex plane between the linear transformations of query and key embeddings.
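As background for the abstract's claim, the sketch below is a minimal NumPy implementation of RoPE's rotation and a check of its relative-position property (the attention score between a rotated query and key depends only on their offset). This is an illustrative sketch, not the blog's experimental code; the function name `rope_rotate`, the dimension `d = 64`, and the base `10000` are assumptions following the standard RoPE formulation.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply RoPE to a vector x at integer position pos.

    x has even dimension d; each pair (x[2i], x[2i+1]) is rotated
    on the complex plane by angle pos * theta_i, where
    theta_i = base ** (-2i / d).
    """
    d = x.shape[0]
    half = d // 2
    theta = base ** (-np.arange(half) * 2.0 / d)
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

# Relative-position property: <rope(q, m), rope(k, n)> depends
# only on the offset m - n, not on the absolute positions.
rng = np.random.default_rng(0)
d = 64
q = rng.standard_normal(d)
k = rng.standard_normal(d)

s1 = rope_rotate(q, 5).dot(rope_rotate(k, 2))      # offset 3
s2 = rope_rotate(q, 105).dot(rope_rotate(k, 102))  # offset 3 again
print(abs(s1 - s2) < 1e-9)  # same offset -> same attention score
```

The decay issue the post discusses concerns how such scores behave as the offset grows far beyond the training length; this sketch only fixes the rotation mechanics the argument builds on.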
Conflict Of Interest: I have no conflict of interest with the papers listed in the 'Ref Papers' section of this blogpost.
Submission Number: 13