To try DeltaFormer, replace the fla package in "https://github.com/fla-org/flash-linear-attention" with our fla. You can then run language modeling with their open-source code.
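The replacement step can be scripted. The following is a minimal sketch, not part of this release: it assumes you have already cloned flash-linear-attention locally and that the package to overwrite is the top-level fla directory of that checkout. The function name replace_fla, the ".orig" backup suffix, and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def replace_fla(upstream_root, our_fla, backup_suffix=".orig"):
    """Swap the fla package inside an upstream checkout for another copy.

    upstream_root: path to a local clone of flash-linear-attention (assumed layout:
                   the package lives at <upstream_root>/fla).
    our_fla:       path to the fla directory shipped with this repository.
    The original package is kept next to it as fla<backup_suffix>.
    """
    upstream_root = Path(upstream_root)
    our_fla = Path(our_fla)
    target = upstream_root / "fla"

    if not our_fla.is_dir():
        raise FileNotFoundError(f"replacement package not found: {our_fla}")

    if target.exists():
        backup = upstream_root / ("fla" + backup_suffix)
        if backup.exists():
            shutil.rmtree(backup)  # drop a stale backup from an earlier run
        target.rename(backup)      # keep the upstream version for reference

    shutil.copytree(our_fla, target)
    return target


if __name__ == "__main__":
    # Hypothetical paths; adjust them to your own checkout.
    replace_fla("flash-linear-attention", "fla")
```

After the swap, reinstall or re-import the upstream project as you normally would so that it picks up the replaced package.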

A Triton kernel for DeltaFormer is provided in the deltaformer_triton folder. With tensor.shape = [2, 32, 8192, 128], it runs at 1/3 the speed of FlashAttention on an H800 GPU.
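If you want to check the relative speed on your own hardware, a small timing harness along these lines may help. It is only a sketch under stated assumptions: the helpers time_fn and relative_speed are hypothetical and not part of deltaformer_triton, and the way the original 1/3 figure was measured is not described here.

```python
import time


def time_fn(fn, sync=None, warmup=3, iters=10):
    """Return the average wall-clock seconds per call of fn.

    sync is an optional callable run before the clock starts and before it
    stops; on a GPU it should block until all queued kernels have finished.
    """
    for _ in range(warmup):  # warm-up calls absorb compilation and caching costs
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters


def relative_speed(t_baseline, t_candidate):
    """Speed of the candidate relative to the baseline (1.0 means equal)."""
    return t_baseline / t_candidate
```

On a GPU, pass sync=torch.cuda.synchronize and wrap each kernel call in a zero-argument function over inputs of shape [2, 32, 8192, 128]. A result of about 1/3 from relative_speed(t_flash, t_deltaformer) would match the figure above.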

We also provide code for the toy-model experiments in Section 4.1, Section 4.2, and Appendix B.1.

We apologize for some confusing notes in Section 3 that may have made the paper harder to read. A clearer, updated version is available in main.pdf.


