Abstract: Highlights
• Novel ODE method for implicit local position encoding in Transformers.
• Captures natural sequence position without extra embeddings.
• Recurrent attention network captures longer dependencies.
• Our method is adaptable and applicable to long-term language modeling.
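To make the idea of ODE-based implicit position encoding concrete, the following is a minimal sketch (not the authors' implementation): it assumes a forward-Euler discretization of an ODE over token hidden states, so that position information accumulates in the evolving state instead of coming from a separate positional-embedding table. The class and parameter names (`ImplicitODEPositionEncoder`, `dt`, `f`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ImplicitODEPositionEncoder(nn.Module):
    """Evolves each token's state h_t = h_{t-1} + dt * f(h_{t-1}, x_t),
    so position is carried by the accumulated ODE state rather than an
    explicit embedding lookup (sketch, not the paper's exact method)."""
    def __init__(self, d_model: int, dt: float = 0.1):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Tanh())
        self.dt = dt

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings without position info
        batch, seq_len, d_model = x.shape
        h = torch.zeros(batch, d_model, device=x.device, dtype=x.dtype)
        states = []
        for t in range(seq_len):
            # Forward-Euler step: the state drifts as a function of itself
            # and the current token, so identical tokens at different
            # positions end up with different representations.
            h = h + self.dt * self.f(torch.cat([h, x[:, t]], dim=-1))
            states.append(h)
        return torch.stack(states, dim=1)  # (batch, seq_len, d_model)

if __name__ == "__main__":
    enc = ImplicitODEPositionEncoder(d_model=64)
    tokens = torch.randn(2, 10, 64)
    print(enc(tokens).shape)  # torch.Size([2, 10, 64])
```

The output of such an encoder could feed a standard attention stack directly, which is one way the highlights' claim of "no extra embeddings" can be read; the recurrent-attention component mentioned above is not sketched here.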
External IDs: dblp:journals/prl/JiWQKW24