[------------------------------------------------------------------------ forward ------------------------------------------------------------------------]
                                                 |    MHA    |  Flash MHA  |     H3     |   Hyena   |  Laughing Hyena (conv)  |  Laughing Hyena (recurrent)
24 threads: -----------------------------------------------------------------------------------------------------------------------------------------------
      [b_size: 1, d_model: 768, seqlen: 1024]    |    390.8  |      310.8  |  326185.6  |   4467.6  |           1035.8        |            1109.3          
      [b_size: 1, d_model: 768, seqlen: 2048]    |   1036.8  |      529.7  |    4718.4  |   4063.2  |           1118.8        |            1059.1          
      [b_size: 1, d_model: 768, seqlen: 4096]    |   2812.4  |     1365.5  |    8696.3  |   7816.3  |           1738.1        |            1091.5          
      [b_size: 1, d_model: 768, seqlen: 8192]    |  11725.1  |     4216.1  |   17551.9  |  16170.2  |           3199.2        |            1069.5          
      [b_size: 1, d_model: 768, seqlen: 16384]   |  57852.1  |    13656.9  |   32917.1  |  30887.3  |           5887.5        |            1067.9          
      [b_size: 1, d_model: 768, seqlen: 32768]   |           |    57629.2  |   90141.1  |  84575.1  |          14732.1        |            2028.0          
      [b_size: 1, d_model: 768, seqlen: 65536]   |           |   220647.6  |            |           |          42811.9        |            3729.6          
      [b_size: 2, d_model: 768, seqlen: 1024]    |    589.7  |      374.7  |    5109.9  |   4312.1  |           1119.1        |                            
      [b_size: 2, d_model: 768, seqlen: 2048]    |   1812.2  |      852.3  |    8419.0  |   7598.8  |           1440.8        |                            
      [b_size: 2, d_model: 768, seqlen: 4096]    |   5425.0  |     2348.9  |   16563.9  |  15561.6  |           2726.0        |                            
      [b_size: 2, d_model: 768, seqlen: 8192]    |  21411.3  |     8005.4  |   34225.7  |  32593.0  |           5228.5        |                            
      [b_size: 2, d_model: 768, seqlen: 16384]   |           |    30300.0  |   65481.3  |  62893.2  |          10043.4        |                            
      [b_size: 2, d_model: 768, seqlen: 32768]   |           |   113863.2  |            |           |          26096.6        |                            
      [b_size: 2, d_model: 768, seqlen: 65536]   |           |   446811.3  |            |           |          72014.0        |                            
      [b_size: 4, d_model: 768, seqlen: 1024]    |   1040.8  |      651.0  |    8243.7  |   7431.0  |           1335.3        |                            
      [b_size: 4, d_model: 768, seqlen: 2048]    |   3525.6  |     1607.5  |   15871.8  |  14977.4  |           2563.5        |                            
      [b_size: 4, d_model: 768, seqlen: 4096]    |  10545.0  |     4560.0  |   32081.9  |  30801.7  |           4750.6        |                            
      [b_size: 4, d_model: 768, seqlen: 8192]    |  43225.6  |    15932.2  |   67119.8  |  65019.9  |           9562.5        |                            
      [b_size: 4, d_model: 768, seqlen: 16384]   |           |    58953.7  |            |           |          18956.7        |                            
      [b_size: 4, d_model: 768, seqlen: 32768]   |           |   223194.9  |            |           |          46190.9        |                            
      [b_size: 4, d_model: 768, seqlen: 65536]   |           |   879098.4  |            |           |         128527.2        |                            
      [b_size: 8, d_model: 768, seqlen: 1024]    |   1991.9  |     1185.1  |   15560.3  |  14700.6  |           2475.6        |                            
      [b_size: 8, d_model: 768, seqlen: 2048]    |   6859.3  |     2903.1  |   30818.0  |  29640.2  |           4613.3        |                            
      [b_size: 8, d_model: 768, seqlen: 4096]    |  21583.4  |     9434.3  |   63195.6  |  61484.3  |           9037.3        |                            
      [b_size: 8, d_model: 768, seqlen: 8192]    |           |    32220.5  |            |           |          18792.9        |                            
      [b_size: 8, d_model: 768, seqlen: 16384]   |           |   117842.3  |            |           |          36404.2        |                            
      [b_size: 8, d_model: 768, seqlen: 32768]   |           |   448749.0  |            |           |          87880.7        |                            
      [b_size: 8, d_model: 768, seqlen: 65536]   |           |  1760696.5  |            |           |                         |                            
      [b_size: 16, d_model: 768, seqlen: 1024]   |   3682.2  |     2157.4  |   30276.3  |  29139.9  |           4541.8        |                            
      [b_size: 16, d_model: 768, seqlen: 2048]   |  13713.6  |     6197.4  |   60613.3  |  59126.2  |           8858.1        |                            
      [b_size: 16, d_model: 768, seqlen: 4096]   |  43355.8  |    18829.9  |            |           |          18121.9        |                            
      [b_size: 16, d_model: 768, seqlen: 8192]   |           |    64622.2  |            |           |          36615.3        |                            
      [b_size: 16, d_model: 768, seqlen: 16384]  |           |   235785.8  |            |           |          68916.6        |                            
      [b_size: 16, d_model: 768, seqlen: 32768]  |           |   900673.8  |            |           |                         |                            
      [b_size: 16, d_model: 768, seqlen: 65536]  |           |  3517144.6  |            |           |                         |                            
      [b_size: 32, d_model: 768, seqlen: 1024]   |   7356.4  |     4504.5  |   59733.3  |  58107.6  |           8822.8        |                            
      [b_size: 32, d_model: 768, seqlen: 2048]   |  27471.0  |    12205.2  |            |           |          17882.0        |                            
      [b_size: 32, d_model: 768, seqlen: 4096]   |           |    38029.6  |            |           |          34666.2        |                            
      [b_size: 32, d_model: 768, seqlen: 8192]   |           |   130230.6  |            |           |          70389.0        |                            
      [b_size: 32, d_model: 768, seqlen: 16384]  |           |   474459.1  |            |           |                         |                            
      [b_size: 32, d_model: 768, seqlen: 32768]  |           |  1804111.2  |            |           |                         |                            
      [b_size: 64, d_model: 768, seqlen: 1024]   |  15271.1  |     9086.3  |            |           |          17893.9        |                            
      [b_size: 64, d_model: 768, seqlen: 2048]   |  54608.1  |    24213.5  |            |           |          35417.3        |                            
      [b_size: 64, d_model: 768, seqlen: 4096]   |           |    75868.8  |            |           |          69000.0        |                            
      [b_size: 64, d_model: 768, seqlen: 8192]   |           |   259038.3  |            |           |                         |                            
      [b_size: 64, d_model: 768, seqlen: 16384]  |           |   941311.0  |            |           |                         |                            

Times are in microseconds (us).
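
The forward table above is easier to compare as ratios than as raw latencies. The following is a minimal sketch, not part of the original benchmark script, that computes speedups from a few b_size 1 rows copied from the table. The names FORWARD_US, speedup, and speedup_table are hypothetical helpers introduced for illustration, and empty table cells are represented as None.

```python
from typing import Dict, Optional

# Forward latencies in microseconds, copied from the b_size: 1 rows above.
# The dictionary layout is an illustrative assumption; None marks an empty cell.
FORWARD_US: Dict[int, Dict[str, Optional[float]]] = {
    1024: {
        "MHA": 390.8,
        "Flash MHA": 310.8,
        "Laughing Hyena (conv)": 1035.8,
        "Laughing Hyena (recurrent)": 1109.3,
    },
    16384: {
        "MHA": 57852.1,
        "Flash MHA": 13656.9,
        "Laughing Hyena (conv)": 5887.5,
        "Laughing Hyena (recurrent)": 1067.9,
    },
    65536: {
        "MHA": None,
        "Flash MHA": 220647.6,
        "Laughing Hyena (conv)": 42811.9,
        "Laughing Hyena (recurrent)": 3729.6,
    },
}


def speedup(baseline_us: Optional[float], candidate_us: Optional[float]) -> Optional[float]:
    """Return baseline / candidate, or None if either measurement is missing."""
    if baseline_us is None or candidate_us is None:
        return None
    return baseline_us / candidate_us


def speedup_table(
    latencies: Dict[int, Dict[str, Optional[float]]],
    baseline: str,
    candidate: str,
) -> Dict[int, Optional[float]]:
    """Map each sequence length to the speedup of `candidate` over `baseline`."""
    return {
        seqlen: speedup(row.get(baseline), row.get(candidate))
        for seqlen, row in sorted(latencies.items())
    }


if __name__ == "__main__":
    ratios = speedup_table(FORWARD_US, "Flash MHA", "Laughing Hyena (conv)")
    for seqlen, ratio in ratios.items():
        label = "n/a" if ratio is None else f"{ratio:.2f}x"
        print(f"seqlen {seqlen:>6}: {label}")
```

A ratio above 1 means the candidate column is faster than the baseline at that sequence length; a ratio below 1 means it is slower.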

[--------------------------------------------------------------------------- Token Generation ----------------------------------------------------------------------------]
                                                              |  MHA  |  MHA (KV)  |  H3    |  H3 (KV)  |  Hyena  |  Laughing Hyena (recurrent)  |  Laughing Hyena (conv)
24 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------
      [b_size: 1, d_model: 768, seqlen: 1024, n_tokens: 256]  |  3.3  |    3.6     |  11.6  |    7.9    |   6.7   |           7.2                |             8.8            
      [b_size: 8, d_model: 768, seqlen: 1024, n_tokens: 256]  |  3.6  |    3.8     |  12.7  |    8.6    |   7.4   |           7.6                |                            

Times are in seconds (s).