For n = 10, T = 1


====================================================================================
k = 1
last_hidden_states:
  Device: cuda:0
  Dtype: torch.float32
  Shape: torch.Size([1, 10, 768])

conv_last_hidden_states:
  Device: cuda:0
  Dtype: torch.float32
  Shape: torch.Size([1, 10, 768])

The tensors are not equal.
Frobenius norm of the difference: 325.7996520996094


====================================================================================
k = 5
last_hidden_states:
  Device: cuda:0
  Dtype: torch.float32
  Shape: torch.Size([1, 10, 768])

conv_last_hidden_states:
  Device: cuda:0
  Dtype: torch.float32
  Shape: torch.Size([1, 10, 768])

The tensors are not equal.
Frobenius norm of the difference: 50.71752166748047


====================================================================================
k = 10
last_hidden_states:
  Device: cuda:0
  Dtype: torch.float32
  Shape: torch.Size([1, 10, 768])

conv_last_hidden_states:
  Device: cuda:0
  Dtype: torch.float32
  Shape: torch.Size([1, 10, 768])

The tensors are not equal.
Frobenius norm of the difference: 0.3133018910884857


====================================================================================
gpt2_forward:
last_hidden_states:
  Device: cuda:0
  Dtype: torch.float32
  Shape: torch.Size([1, 10, 768])
  values: tensor([[[-0.0470, -0.0333, -0.1626,  ..., -0.1337, -0.0571, -0.1059],
         [ 0.1794, -0.5020, -0.8029,  ...,  0.1106,  0.6057, -0.4551],
         [ 0.4096,  0.6703, -0.3369,  ...,  0.5203,  0.1552, -0.5944],
         ...,
         [ 0.3133,  0.3148, -0.3117,  ...,  0.0921,  0.0111, -0.3207],
         [ 0.1388, -0.1685, -1.0656,  ...,  0.1676,  0.1104, -0.0937],
         [ 0.1967, -0.3877, -0.1826,  ..., -0.0813,  0.2733, -0.1671]]],
       device='cuda:0')

conv_last_hidden_states:
  Device: cuda:0
  Dtype: torch.float32
  Shape: torch.Size([1, 10, 768])
  values: tensor([[[-0.0472, -0.0332, -0.1630,  ..., -0.1338, -0.0572, -0.1058],
         [ 0.1795, -0.5028, -0.8010,  ...,  0.1106,  0.6056, -0.4551],
         [ 0.4105,  0.6704, -0.3363,  ...,  0.5209,  0.1547, -0.5944],
         ...,
         [ 0.3133,  0.3164, -0.3104,  ...,  0.0922,  0.0118, -0.3207],
         [ 0.1392, -0.1679, -1.0655,  ...,  0.1674,  0.1102, -0.0936],
         [ 0.1975, -0.3885, -0.1811,  ..., -0.0801,  0.2730, -0.1642]]],
       device='cuda:0')

The tensors are not equal.
Frobenius norm of the difference: 0.3133018910884857
relative_frobenius_norm: 0.0004913280135951936


========================================================================================================================================================================
last_hidden_states:
  Device: cuda:0
  Dtype: torch.float32
  Shape: torch.Size([1, 10, 768])
  values: tensor([[[-0.0470, -0.0333, -0.1626,  ..., -0.1337, -0.0571, -0.1059],
         [ 0.1794, -0.5020, -0.8029,  ...,  0.1106,  0.6057, -0.4551],
         [ 0.4096,  0.6703, -0.3369,  ...,  0.5203,  0.1552, -0.5944],
         ...,
         [ 0.3133,  0.3148, -0.3117,  ...,  0.0921,  0.0111, -0.3207],
         [ 0.1388, -0.1685, -1.0656,  ...,  0.1676,  0.1104, -0.0937],
         [ 0.1967, -0.3877, -0.1826,  ..., -0.0813,  0.2733, -0.1671]]],
       device='cuda:0')

conv_last_hidden_states:
  Device: cuda:0
  Dtype: torch.float32
  Shape: torch.Size([1, 10, 768])
  values: tensor([[[-4.7056e-02, -3.3278e-02, -1.6261e-01,  ..., -1.3366e-01,
          -5.7127e-02, -1.0592e-01],
         [-4.9559e-02,  1.2354e-01, -1.6359e-01,  ..., -2.0341e-01,
          -2.0513e-04, -5.9779e-02],
         [-1.0269e-01,  1.7261e-01, -8.0596e-02,  ..., -1.7586e-01,
           1.1530e-01, -3.8030e-01],
         ...,
         [ 1.1823e-02,  1.3138e-01, -2.6222e-01,  ..., -1.5223e-01,
           3.3285e-02, -1.4703e-01],
         [-5.0319e-01,  1.1823e-01, -2.1798e-01,  ..., -1.2219e-01,
          -5.3527e-02, -2.5382e-01],
         [-1.7607e-01,  2.7658e-01,  5.4212e-02,  ..., -2.9552e-01,
           5.9917e-02, -3.3654e-01]]], device='cuda:0')

The tensors are not equal.
Frobenius norm of the difference: 349.7867126464844
relative_frobenius_norm: 0.5485444664955139

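The checks above all report the same three fields: exact equality, the Frobenius norm of the difference, and that norm relative to the reference tensor. The original run used PyTorch tensors on `cuda:0`; the helper below is a minimal NumPy sketch of the same comparison, and the name `compare_tensors` is hypothetical (not from the source).

```python
import numpy as np

def compare_tensors(a, b):
    """Compare two hidden-state arrays the way the log does:
    exact equality, Frobenius norm of the difference, and the
    difference norm relative to the reference tensor `a`."""
    equal = np.array_equal(a, b)
    diff_norm = float(np.linalg.norm(a - b))        # Frobenius norm of (a - b)
    rel_norm = diff_norm / float(np.linalg.norm(a)) # scale-free comparison
    print("The tensors are equal." if equal else "The tensors are not equal.")
    print(f"Frobenius norm of the difference: {diff_norm}")
    print(f"relative_frobenius_norm: {rel_norm}")
    return equal, diff_norm, rel_norm
```

The relative norm is the more informative number here: an absolute difference of 0.31 on a `[1, 10, 768]` activation tensor corresponds to a relative error of about 5e-4 (a close match), whereas 349.8 corresponds to roughly 0.55 (the outputs disagree substantially).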