factor 8.0
Unrecognized keys in `rope_scaling` for 'rope_type'='linear': {'type'}
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 14.35it/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (3259 > 2048). Running this sequence through the model will result in indexing errors
Starting ...
Ready.
Quantizing 8bit 1/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.006        | -          | -         | 1.486 |
| v_proj.V         | 0.005        | -          | -         | 1.104 |
| q_proj.V         | 0.007        | -          | -         | 1.102 |
| k_proj.U         | 0.000        | -          | -         | 0.154 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.000        | -          | -         | 0.006 |
| o_proj.V         | 0.000        | -          | -         | 1.317 |
| o_proj.U         | 0.000        | -          | -         | 0.187 |
| up_proj.V        | 0.006        | -          | -         | 1.343 |
| gate_proj.V      | 0.006        | -          | -         | 1.098 |
| up_proj.U        | 0.000        | -          | -         | 0.171 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.861 |
| down_proj.U      | 0.000        | -          | -         | 0.188 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 1/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 7091.703     | -          | -         | 1.426 |
| v_proj.V         | 3870.665     | -          | -         | 1.101 |
| q_proj.V         | 6612.581     | -          | -         | 1.108 |
| k_proj.U         | 0.171        | -          | -         | 0.472 |
| v_proj.U         | 0.171        | -          | -         | 0.262 |
| q_proj.U         | 0.842        | -          | -         | 0.263 |
| o_proj.V         | 23.447       | -          | -         | 1.391 |
| o_proj.U         | 0.015        | -          | -         | 0.502 |
| up_proj.V        | 3106.686     | -          | -         | 1.412 |
| gate_proj.V      | 3111.262     | -          | -         | 1.107 |
| up_proj.U        | 3.533        | -          | -         | 0.625 |
| gate_proj.U      | 3.513        | -          | -         | 0.381 |
| down_proj.V      | 18.471       | -          | -         | 4.990 |
| down_proj.U      | 0.031        | -          | -         | 0.632 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 2/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.004        | -          | -         | 1.362 |
| v_proj.V         | 0.005        | -          | -         | 1.100 |
| q_proj.V         | 0.004        | -          | -         | 1.115 |
| k_proj.U         | 0.000        | -          | -         | 0.154 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.000        | -          | -         | 0.006 |
| o_proj.V         | 0.000        | -          | -         | 1.321 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.009        | -          | -         | 1.355 |
| gate_proj.V      | 0.009        | -          | -         | 1.100 |
| up_proj.U        | 0.000        | -          | -         | 0.173 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.905 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 2/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 4640.044     | -          | -         | 1.424 |
| v_proj.V         | 2864.045     | -          | -         | 1.106 |
| q_proj.V         | 3287.329     | -          | -         | 1.110 |
| k_proj.U         | 0.277        | -          | -         | 0.471 |
| v_proj.U         | 0.334        | -          | -         | 0.262 |
| q_proj.U         | 1.631        | -          | -         | 0.265 |
| o_proj.V         | 21.109       | -          | -         | 1.394 |
| o_proj.U         | 0.020        | -          | -         | 0.503 |
| up_proj.V        | 5566.966     | -          | -         | 1.415 |
| gate_proj.V      | 5579.236     | -          | -         | 1.107 |
| up_proj.U        | 6.665        | -          | -         | 0.618 |
| gate_proj.U      | 6.690        | -          | -         | 0.382 |
| down_proj.V      | 22.559       | -          | -         | 5.013 |
| down_proj.U      | 0.022        | -          | -         | 0.633 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 3/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.032        | -          | -         | 1.365 |
| v_proj.V         | 0.033        | -          | -         | 1.103 |
| q_proj.V         | 0.032        | -          | -         | 1.102 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.002        | -          | -         | 0.006 |
| o_proj.V         | 0.000        | -          | -         | 1.326 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.009        | -          | -         | 1.347 |
| gate_proj.V      | 0.010        | -          | -         | 1.100 |
| up_proj.U        | 0.000        | -          | -         | 0.173 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.908 |
| down_proj.U      | 0.000        | -          | -         | 0.191 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 3/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 11380.468    | -          | -         | 1.421 |
| v_proj.V         | 10761.998    | -          | -         | 1.108 |
| q_proj.V         | 10910.693    | -          | -         | 1.109 |
| k_proj.U         | 1.234        | -          | -         | 0.471 |
| v_proj.U         | 1.311        | -          | -         | 0.264 |
| q_proj.U         | 4.928        | -          | -         | 0.264 |
| o_proj.V         | 198.552      | -          | -         | 1.390 |
| o_proj.U         | 0.210        | -          | -         | 0.506 |
| up_proj.V        | 3519.751     | -          | -         | 1.414 |
| gate_proj.V      | 3510.425     | -          | -         | 1.108 |
| up_proj.U        | 4.100        | -          | -         | 0.618 |
| gate_proj.U      | 4.103        | -          | -         | 0.383 |
| down_proj.V      | 13.289       | -          | -         | 5.003 |
| down_proj.U      | 0.023        | -          | -         | 0.634 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 4/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.016        | -          | -         | 1.367 |
| v_proj.V         | 0.016        | -          | -         | 1.101 |
| q_proj.V         | 0.016        | -          | -         | 1.101 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.000        | -          | -         | 1.323 |
| o_proj.U         | 0.000        | -          | -         | 0.189 |
| up_proj.V        | 0.006        | -          | -         | 1.349 |
| gate_proj.V      | 0.006        | -          | -         | 1.099 |
| up_proj.U        | 0.000        | -          | -         | 0.173 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.901 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 4/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 6313.734     | -          | -         | 1.429 |
| v_proj.V         | 6325.966     | -          | -         | 1.104 |
| q_proj.V         | 6303.428     | -          | -         | 1.108 |
| k_proj.U         | 0.752        | -          | -         | 0.473 |
| v_proj.U         | 0.786        | -          | -         | 0.264 |
| q_proj.U         | 2.963        | -          | -         | 0.265 |
| o_proj.V         | 231.365      | -          | -         | 1.389 |
| o_proj.U         | 0.317        | -          | -         | 0.505 |
| up_proj.V        | 2689.167     | -          | -         | 1.419 |
| gate_proj.V      | 2695.519     | -          | -         | 1.108 |
| up_proj.U        | 3.360        | -          | -         | 0.618 |
| gate_proj.U      | 3.382        | -          | -         | 0.382 |
| down_proj.V      | 12.691       | -          | -         | 4.981 |
| down_proj.U      | 0.025        | -          | -         | 0.632 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 5/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.007        | -          | -         | 1.366 |
| v_proj.V         | 0.007        | -          | -         | 1.103 |
| q_proj.V         | 0.007        | -          | -         | 1.103 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.000        | -          | -         | 0.006 |
| o_proj.V         | 0.000        | -          | -         | 1.324 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.004        | -          | -         | 1.350 |
| gate_proj.V      | 0.004        | -          | -         | 1.102 |
| up_proj.U        | 0.000        | -          | -         | 0.173 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.903 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 5/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 3467.608     | -          | -         | 1.423 |
| v_proj.V         | 3446.292     | -          | -         | 1.106 |
| q_proj.V         | 3448.428     | -          | -         | 1.110 |
| k_proj.U         | 0.413        | -          | -         | 0.472 |
| v_proj.U         | 0.426        | -          | -         | 0.265 |
| q_proj.U         | 1.628        | -          | -         | 0.265 |
| o_proj.V         | 226.444      | -          | -         | 1.386 |
| o_proj.U         | 0.292        | -          | -         | 0.506 |
| up_proj.V        | 2266.974     | -          | -         | 1.416 |
| gate_proj.V      | 2267.322     | -          | -         | 1.105 |
| up_proj.U        | 2.930        | -          | -         | 0.616 |
| gate_proj.U      | 2.975        | -          | -         | 0.382 |
| down_proj.V      | 16.346       | -          | -         | 4.991 |
| down_proj.U      | 0.038        | -          | -         | 0.634 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 6/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.008        | -          | -         | 1.360 |
| v_proj.V         | 0.007        | -          | -         | 1.108 |
| q_proj.V         | 0.008        | -          | -         | 1.101 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.000        | -          | -         | 1.325 |
| o_proj.U         | 0.000        | -          | -         | 0.189 |
| up_proj.V        | 0.004        | -          | -         | 1.350 |
| gate_proj.V      | 0.004        | -          | -         | 1.103 |
| up_proj.U        | 0.000        | -          | -         | 0.173 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.905 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 6/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 3649.983     | -          | -         | 1.422 |
| v_proj.V         | 3641.749     | -          | -         | 1.106 |
| q_proj.V         | 3641.017     | -          | -         | 1.108 |
| k_proj.U         | 0.530        | -          | -         | 0.473 |
| v_proj.U         | 0.502        | -          | -         | 0.265 |
| q_proj.U         | 1.960        | -          | -         | 0.265 |
| o_proj.V         | 208.065      | -          | -         | 1.383 |
| o_proj.U         | 0.317        | -          | -         | 0.506 |
| up_proj.V        | 2153.517     | -          | -         | 1.416 |
| gate_proj.V      | 2150.911     | -          | -         | 1.110 |
| up_proj.U        | 2.865        | -          | -         | 0.619 |
| gate_proj.U      | 2.896        | -          | -         | 0.380 |
| down_proj.V      | 18.415       | -          | -         | 5.011 |
| down_proj.U      | 0.046        | -          | -         | 0.634 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 7/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.006        | -          | -         | 1.364 |
| v_proj.V         | 0.006        | -          | -         | 1.098 |
| q_proj.V         | 0.006        | -          | -         | 1.101 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.000        | -          | -         | 1.324 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.003        | -          | -         | 1.348 |
| gate_proj.V      | 0.003        | -          | -         | 1.102 |
| up_proj.U        | 0.000        | -          | -         | 0.173 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.903 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 7/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 2862.580     | -          | -         | 1.428 |
| v_proj.V         | 2878.078     | -          | -         | 1.110 |
| q_proj.V         | 2871.014     | -          | -         | 1.107 |
| k_proj.U         | 0.441        | -          | -         | 0.472 |
| v_proj.U         | 0.393        | -          | -         | 0.264 |
| q_proj.U         | 1.631        | -          | -         | 0.267 |
| o_proj.V         | 245.163      | -          | -         | 1.389 |
| o_proj.U         | 0.407        | -          | -         | 0.505 |
| up_proj.V        | 1998.523     | -          | -         | 1.411 |
| gate_proj.V      | 2003.937     | -          | -         | 1.106 |
| up_proj.U        | 2.841        | -          | -         | 0.617 |
| gate_proj.U      | 2.912        | -          | -         | 0.383 |
| down_proj.V      | 19.926       | -          | -         | 4.997 |
| down_proj.U      | 0.056        | -          | -         | 0.633 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 8/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.005        | -          | -         | 1.371 |
| v_proj.V         | 0.005        | -          | -         | 1.097 |
| q_proj.V         | 0.006        | -          | -         | 1.119 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.331 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.003        | -          | -         | 1.345 |
| gate_proj.V      | 0.003        | -          | -         | 1.099 |
| up_proj.U        | 0.000        | -          | -         | 0.173 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.899 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 8/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 2632.539     | -          | -         | 1.417 |
| v_proj.V         | 2644.693     | -          | -         | 1.109 |
| q_proj.V         | 2646.678     | -          | -         | 1.104 |
| k_proj.U         | 0.405        | -          | -         | 0.472 |
| v_proj.U         | 0.356        | -          | -         | 0.265 |
| q_proj.U         | 1.522        | -          | -         | 0.265 |
| o_proj.V         | 243.810      | -          | -         | 1.382 |
| o_proj.U         | 0.441        | -          | -         | 0.506 |
| up_proj.V        | 1766.784     | -          | -         | 1.413 |
| gate_proj.V      | 1765.808     | -          | -         | 1.107 |
| up_proj.U        | 2.487        | -          | -         | 0.618 |
| gate_proj.U      | 2.586        | -          | -         | 0.384 |
| down_proj.V      | 18.557       | -          | -         | 4.999 |
| down_proj.U      | 0.057        | -          | -         | 0.635 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 9/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.006        | -          | -         | 1.366 |
| v_proj.V         | 0.006        | -          | -         | 1.102 |
| q_proj.V         | 0.006        | -          | -         | 1.103 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.000        | -          | -         | 0.006 |
| o_proj.V         | 0.000        | -          | -         | 1.329 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.002        | -          | -         | 1.347 |
| gate_proj.V      | 0.002        | -          | -         | 1.101 |
| up_proj.U        | 0.000        | -          | -         | 0.172 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.900 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 9/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 2774.368     | -          | -         | 1.422 |
| v_proj.V         | 2773.205     | -          | -         | 1.102 |
| q_proj.V         | 2776.424     | -          | -         | 1.108 |
| k_proj.U         | 0.524        | -          | -         | 0.471 |
| v_proj.U         | 0.409        | -          | -         | 0.265 |
| q_proj.U         | 1.685        | -          | -         | 0.265 |
| o_proj.V         | 240.839      | -          | -         | 1.382 |
| o_proj.U         | 0.413        | -          | -         | 0.506 |
| up_proj.V        | 1650.815     | -          | -         | 1.415 |
| gate_proj.V      | 1647.718     | -          | -         | 1.108 |
| up_proj.U        | 2.199        | -          | -         | 0.618 |
| gate_proj.U      | 2.315        | -          | -         | 0.383 |
| down_proj.V      | 15.246       | -          | -         | 5.021 |
| down_proj.U      | 0.046        | -          | -         | 0.645 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 10/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.005        | -          | -         | 1.364 |
| v_proj.V         | 0.005        | -          | -         | 1.100 |
| q_proj.V         | 0.005        | -          | -         | 1.100 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.000        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.321 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.002        | -          | -         | 1.344 |
| gate_proj.V      | 0.002        | -          | -         | 1.102 |
| up_proj.U        | 0.000        | -          | -         | 0.173 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.909 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 10/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 2608.425     | -          | -         | 1.423 |
| v_proj.V         | 2609.941     | -          | -         | 1.105 |
| q_proj.V         | 2610.918     | -          | -         | 1.103 |
| k_proj.U         | 0.467        | -          | -         | 0.472 |
| v_proj.U         | 0.331        | -          | -         | 0.265 |
| q_proj.U         | 1.577        | -          | -         | 0.265 |
| o_proj.V         | 263.013      | -          | -         | 1.391 |
| o_proj.U         | 0.472        | -          | -         | 0.505 |
| up_proj.V        | 1524.616     | -          | -         | 1.414 |
| gate_proj.V      | 1525.632     | -          | -         | 1.110 |
| up_proj.U        | 2.018        | -          | -         | 0.616 |
| gate_proj.U      | 2.121        | -          | -         | 0.381 |
| down_proj.V      | 16.126       | -          | -         | 4.992 |
| down_proj.U      | 0.049        | -          | -         | 0.641 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 11/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.006        | -          | -         | 1.365 |
| v_proj.V         | 0.006        | -          | -         | 1.100 |
| q_proj.V         | 0.006        | -          | -         | 1.102 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.322 |
| o_proj.U         | 0.000        | -          | -         | 0.189 |
| up_proj.V        | 0.002        | -          | -         | 1.349 |
| gate_proj.V      | 0.002        | -          | -         | 1.101 |
| up_proj.U        | 0.000        | -          | -         | 0.173 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.910 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 11/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 2975.051     | -          | -         | 1.422 |
| v_proj.V         | 2998.146     | -          | -         | 1.104 |
| q_proj.V         | 2990.073     | -          | -         | 1.108 |
| k_proj.U         | 0.526        | -          | -         | 0.472 |
| v_proj.U         | 0.377        | -          | -         | 0.265 |
| q_proj.U         | 1.739        | -          | -         | 0.265 |
| o_proj.V         | 260.535      | -          | -         | 1.389 |
| o_proj.U         | 0.468        | -          | -         | 0.508 |
| up_proj.V        | 1495.536     | -          | -         | 1.412 |
| gate_proj.V      | 1496.515     | -          | -         | 1.105 |
| up_proj.U        | 2.007        | -          | -         | 0.617 |
| gate_proj.U      | 2.125        | -          | -         | 0.382 |
| down_proj.V      | 12.905       | -          | -         | 5.003 |
| down_proj.U      | 0.039        | -          | -         | 0.632 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 12/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.006        | -          | -         | 1.372 |
| v_proj.V         | 0.006        | -          | -         | 1.102 |
| q_proj.V         | 0.006        | -          | -         | 1.109 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.326 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.002        | -          | -         | 1.348 |
| gate_proj.V      | 0.002        | -          | -         | 1.108 |
| up_proj.U        | 0.000        | -          | -         | 0.171 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.902 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 12/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 2825.295     | -          | -         | 1.425 |
| v_proj.V         | 2825.654     | -          | -         | 1.110 |
| q_proj.V         | 2819.907     | -          | -         | 1.109 |
| k_proj.U         | 0.462        | -          | -         | 0.472 |
| v_proj.U         | 0.334        | -          | -         | 0.264 |
| q_proj.U         | 1.482        | -          | -         | 0.266 |
| o_proj.V         | 237.786      | -          | -         | 1.408 |
| o_proj.U         | 0.469        | -          | -         | 0.505 |
| up_proj.V        | 1464.101     | -          | -         | 1.423 |
| gate_proj.V      | 1465.030     | -          | -         | 1.116 |
| up_proj.U        | 1.895        | -          | -         | 0.617 |
| gate_proj.U      | 2.026        | -          | -         | 0.384 |
| down_proj.V      | 13.359       | -          | -         | 4.998 |
| down_proj.U      | 0.041        | -          | -         | 0.635 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 13/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.005        | -          | -         | 1.386 |
| v_proj.V         | 0.005        | -          | -         | 1.103 |
| q_proj.V         | 0.005        | -          | -         | 1.109 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.324 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.002        | -          | -         | 1.352 |
| gate_proj.V      | 0.002        | -          | -         | 1.101 |
| up_proj.U        | 0.000        | -          | -         | 0.173 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.920 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 13/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 2309.940     | -          | -         | 1.429 |
| v_proj.V         | 2328.239     | -          | -         | 1.111 |
| q_proj.V         | 2321.642     | -          | -         | 1.106 |
| k_proj.U         | 0.366        | -          | -         | 0.472 |
| v_proj.U         | 0.287        | -          | -         | 0.265 |
| q_proj.U         | 1.352        | -          | -         | 0.267 |
| o_proj.V         | 294.596      | -          | -         | 1.389 |
| o_proj.U         | 0.687        | -          | -         | 0.507 |
| up_proj.V        | 1399.799     | -          | -         | 1.413 |
| gate_proj.V      | 1402.110     | -          | -         | 1.107 |
| up_proj.U        | 2.197        | -          | -         | 0.619 |
| gate_proj.U      | 2.347        | -          | -         | 0.385 |
| down_proj.V      | 16.191       | -          | -         | 5.008 |
| down_proj.U      | 0.041        | -          | -         | 0.633 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 14/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.006        | -          | -         | 1.373 |
| v_proj.V         | 0.006        | -          | -         | 1.121 |
| q_proj.V         | 0.006        | -          | -         | 1.105 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.324 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.002        | -          | -         | 1.353 |
| gate_proj.V      | 0.002        | -          | -         | 1.102 |
| up_proj.U        | 0.000        | -          | -         | 0.173 |
| gate_proj.U      | 0.000        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.908 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 14/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 3057.947     | -          | -         | 1.432 |
| v_proj.V         | 3071.627     | -          | -         | 1.110 |
| q_proj.V         | 3074.193     | -          | -         | 1.106 |
| k_proj.U         | 0.677        | -          | -         | 0.472 |
| v_proj.U         | 0.454        | -          | -         | 0.265 |
| q_proj.U         | 2.395        | -          | -         | 0.266 |
| o_proj.V         | 333.403      | -          | -         | 1.386 |
| o_proj.U         | 0.726        | -          | -         | 0.505 |
| up_proj.V        | 1489.127     | -          | -         | 1.419 |
| gate_proj.V      | 1491.513     | -          | -         | 1.109 |
| up_proj.U        | 2.529        | -          | -         | 0.618 |
| gate_proj.U      | 2.681        | -          | -         | 0.383 |
| down_proj.V      | 20.177       | -          | -         | 5.014 |
| down_proj.U      | 0.058        | -          | -         | 0.635 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 15/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.007        | -          | -         | 1.365 |
| v_proj.V         | 0.007        | -          | -         | 1.103 |
| q_proj.V         | 0.007        | -          | -         | 1.104 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.325 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.002        | -          | -         | 1.352 |
| gate_proj.V      | 0.002        | -          | -         | 1.104 |
| up_proj.U        | 0.000        | -          | -         | 0.173 |
| gate_proj.U      | 0.001        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.911 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 15/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 3481.797     | -          | -         | 1.432 |
| v_proj.V         | 3489.533     | -          | -         | 1.103 |
| q_proj.V         | 3486.171     | -          | -         | 1.118 |
| k_proj.U         | 0.692        | -          | -         | 0.475 |
| v_proj.U         | 0.507        | -          | -         | 0.280 |
| q_proj.U         | 2.349        | -          | -         | 0.266 |
| o_proj.V         | 420.608      | -          | -         | 1.400 |
| o_proj.U         | 0.797        | -          | -         | 0.509 |
| up_proj.V        | 1881.785     | -          | -         | 1.419 |
| gate_proj.V      | 1884.433     | -          | -         | 1.132 |
| up_proj.U        | 3.145        | -          | -         | 0.618 |
| gate_proj.U      | 3.431        | -          | -         | 0.383 |
| down_proj.V      | 47.096       | -          | -         | 5.030 |
| down_proj.U      | 0.116        | -          | -         | 0.636 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 16/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.007        | -          | -         | 1.367 |
| v_proj.V         | 0.007        | -          | -         | 1.103 |
| q_proj.V         | 0.007        | -          | -         | 1.105 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.327 |
| o_proj.U         | 0.000        | -          | -         | 0.189 |
| up_proj.V        | 0.003        | -          | -         | 1.350 |
| gate_proj.V      | 0.003        | -          | -         | 1.109 |
| up_proj.U        | 0.001        | -          | -         | 0.173 |
| gate_proj.U      | 0.001        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.942 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 16/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 3557.872     | -          | -         | 1.437 |
| v_proj.V         | 3580.100     | -          | -         | 1.108 |
| q_proj.V         | 3570.950     | -          | -         | 1.121 |
| k_proj.U         | 0.794        | -          | -         | 0.474 |
| v_proj.U         | 0.596        | -          | -         | 0.265 |
| q_proj.U         | 2.751        | -          | -         | 0.265 |
| o_proj.V         | 336.801      | -          | -         | 1.389 |
| o_proj.U         | 0.578        | -          | -         | 0.506 |
| up_proj.V        | 2336.747     | -          | -         | 1.421 |
| gate_proj.V      | 2341.238     | -          | -         | 1.108 |
| up_proj.U        | 3.849        | -          | -         | 0.621 |
| gate_proj.U      | 4.168        | -          | -         | 0.384 |
| down_proj.V      | 42.777       | -          | -         | 5.005 |
| down_proj.U      | 0.123        | -          | -         | 0.636 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 17/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.008        | -          | -         | 1.368 |
| v_proj.V         | 0.008        | -          | -         | 1.113 |
| q_proj.V         | 0.008        | -          | -         | 1.105 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.326 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.003        | -          | -         | 1.362 |
| gate_proj.V      | 0.003        | -          | -         | 1.113 |
| up_proj.U        | 0.001        | -          | -         | 0.173 |
| gate_proj.U      | 0.001        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.908 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 17/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 3881.969     | -          | -         | 1.431 |
| v_proj.V         | 3892.122     | -          | -         | 1.107 |
| q_proj.V         | 3870.369     | -          | -         | 1.107 |
| k_proj.U         | 0.821        | -          | -         | 0.474 |
| v_proj.U         | 0.608        | -          | -         | 0.265 |
| q_proj.U         | 2.863        | -          | -         | 0.265 |
| o_proj.V         | 386.223      | -          | -         | 1.387 |
| o_proj.U         | 0.777        | -          | -         | 0.508 |
| up_proj.V        | 2619.648     | -          | -         | 1.416 |
| gate_proj.V      | 2613.729     | -          | -         | 1.108 |
| up_proj.U        | 4.081        | -          | -         | 0.618 |
| gate_proj.U      | 4.352        | -          | -         | 0.384 |
| down_proj.V      | 46.036       | -          | -         | 5.010 |
| down_proj.U      | 0.129        | -          | -         | 0.635 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 18/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.008        | -          | -         | 1.368 |
| v_proj.V         | 0.007        | -          | -         | 1.108 |
| q_proj.V         | 0.008        | -          | -         | 1.120 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.336 |
| o_proj.U         | 0.000        | -          | -         | 0.189 |
| up_proj.V        | 0.004        | -          | -         | 1.349 |
| gate_proj.V      | 0.004        | -          | -         | 1.101 |
| up_proj.U        | 0.001        | -          | -         | 0.176 |
| gate_proj.U      | 0.001        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.942 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 18/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 3917.260     | -          | -         | 1.431 |
| v_proj.V         | 3932.912     | -          | -         | 1.112 |
| q_proj.V         | 3930.615     | -          | -         | 1.110 |
| k_proj.U         | 0.904        | -          | -         | 0.474 |
| v_proj.U         | 0.650        | -          | -         | 0.264 |
| q_proj.U         | 3.023        | -          | -         | 0.265 |
| o_proj.V         | 312.078      | -          | -         | 1.391 |
| o_proj.U         | 0.546        | -          | -         | 0.507 |
| up_proj.V        | 2840.712     | -          | -         | 1.431 |
| gate_proj.V      | 2844.490     | -          | -         | 1.109 |
| up_proj.U        | 4.412        | -          | -         | 0.618 |
| gate_proj.U      | 4.734        | -          | -         | 0.382 |
| down_proj.V      | 54.126       | -          | -         | 5.017 |
| down_proj.U      | 0.149        | -          | -         | 0.645 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 19/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.008        | -          | -         | 1.370 |
| v_proj.V         | 0.008        | -          | -         | 1.105 |
| q_proj.V         | 0.008        | -          | -         | 1.104 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.327 |
| o_proj.U         | 0.000        | -          | -         | 0.189 |
| up_proj.V        | 0.004        | -          | -         | 1.351 |
| gate_proj.V      | 0.004        | -          | -         | 1.100 |
| up_proj.U        | 0.001        | -          | -         | 0.174 |
| gate_proj.U      | 0.001        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.904 |
| down_proj.U      | 0.000        | -          | -         | 0.191 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 19/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 4109.248     | -          | -         | 1.427 |
| v_proj.V         | 4148.021     | -          | -         | 1.106 |
| q_proj.V         | 4107.688     | -          | -         | 1.110 |
| k_proj.U         | 0.991        | -          | -         | 0.473 |
| v_proj.U         | 0.711        | -          | -         | 0.264 |
| q_proj.U         | 3.101        | -          | -         | 0.265 |
| o_proj.V         | 344.297      | -          | -         | 1.410 |
| o_proj.U         | 0.549        | -          | -         | 0.508 |
| up_proj.V        | 3154.302     | -          | -         | 1.420 |
| gate_proj.V      | 3141.314     | -          | -         | 1.109 |
| up_proj.U        | 4.753        | -          | -         | 0.620 |
| gate_proj.U      | 5.053        | -          | -         | 0.392 |
| down_proj.V      | 57.438       | -          | -         | 5.020 |
| down_proj.U      | 0.159        | -          | -         | 0.636 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 20/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.008        | -          | -         | 1.365 |
| v_proj.V         | 0.008        | -          | -         | 1.101 |
| q_proj.V         | 0.008        | -          | -         | 1.101 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.330 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.004        | -          | -         | 1.353 |
| gate_proj.V      | 0.004        | -          | -         | 1.103 |
| up_proj.U        | 0.001        | -          | -         | 0.173 |
| gate_proj.U      | 0.002        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.930 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 20/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 4093.926     | -          | -         | 1.431 |
| v_proj.V         | 4102.524     | -          | -         | 1.110 |
| q_proj.V         | 4096.876     | -          | -         | 1.111 |
| k_proj.U         | 0.920        | -          | -         | 0.472 |
| v_proj.U         | 0.721        | -          | -         | 0.264 |
| q_proj.U         | 3.189        | -          | -         | 0.268 |
| o_proj.V         | 420.465      | -          | -         | 1.394 |
| o_proj.U         | 0.588        | -          | -         | 0.515 |
| up_proj.V        | 3410.394     | -          | -         | 1.420 |
| gate_proj.V      | 3406.085     | -          | -         | 1.119 |
| up_proj.U        | 4.989        | -          | -         | 0.618 |
| gate_proj.U      | 5.236        | -          | -         | 0.384 |
| down_proj.V      | 72.452       | -          | -         | 5.009 |
| down_proj.U      | 0.173        | -          | -         | 0.635 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 21/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.009        | -          | -         | 1.367 |
| v_proj.V         | 0.009        | -          | -         | 1.109 |
| q_proj.V         | 0.009        | -          | -         | 1.103 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.326 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.005        | -          | -         | 1.351 |
| gate_proj.V      | 0.005        | -          | -         | 1.099 |
| up_proj.U        | 0.001        | -          | -         | 0.173 |
| gate_proj.U      | 0.001        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.918 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 21/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 4425.334     | -          | -         | 1.428 |
| v_proj.V         | 4436.422     | -          | -         | 1.112 |
| q_proj.V         | 4437.888     | -          | -         | 1.114 |
| k_proj.U         | 1.029        | -          | -         | 0.474 |
| v_proj.U         | 0.776        | -          | -         | 0.266 |
| q_proj.U         | 3.499        | -          | -         | 0.267 |
| o_proj.V         | 402.877      | -          | -         | 1.400 |
| o_proj.U         | 0.582        | -          | -         | 0.507 |
| up_proj.V        | 3506.640     | -          | -         | 1.417 |
| gate_proj.V      | 3502.426     | -          | -         | 1.118 |
| up_proj.U        | 5.351        | -          | -         | 0.620 |
| gate_proj.U      | 5.650        | -          | -         | 0.383 |
| down_proj.V      | 60.441       | -          | -         | 5.038 |
| down_proj.U      | 0.156        | -          | -         | 0.636 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 22/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.009        | -          | -         | 1.370 |
| v_proj.V         | 0.008        | -          | -         | 1.102 |
| q_proj.V         | 0.009        | -          | -         | 1.105 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.329 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.005        | -          | -         | 1.363 |
| gate_proj.V      | 0.005        | -          | -         | 1.102 |
| up_proj.U        | 0.001        | -          | -         | 0.177 |
| gate_proj.U      | 0.002        | -          | -         | 0.007 |
| down_proj.V      | 0.000        | -          | -         | 4.929 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 22/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 4417.526     | -          | -         | 1.432 |
| v_proj.V         | 4424.030     | -          | -         | 1.112 |
| q_proj.V         | 4417.484     | -          | -         | 1.113 |
| k_proj.U         | 0.983        | -          | -         | 0.471 |
| v_proj.U         | 0.757        | -          | -         | 0.274 |
| q_proj.U         | 3.402        | -          | -         | 0.266 |
| o_proj.V         | 500.492      | -          | -         | 1.390 |
| o_proj.U         | 0.944        | -          | -         | 0.506 |
| up_proj.V        | 3612.212     | -          | -         | 1.421 |
| gate_proj.V      | 3604.524     | -          | -         | 1.110 |
| up_proj.U        | 5.620        | -          | -         | 0.618 |
| gate_proj.U      | 5.894        | -          | -         | 0.385 |
| down_proj.V      | 75.303       | -          | -         | 5.001 |
| down_proj.U      | 0.182        | -          | -         | 0.637 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 23/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.010        | -          | -         | 1.367 |
| v_proj.V         | 0.010        | -          | -         | 1.125 |
| q_proj.V         | 0.010        | -          | -         | 1.106 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.001        | -          | -         | 1.327 |
| o_proj.U         | 0.000        | -          | -         | 0.189 |
| up_proj.V        | 0.005        | -          | -         | 1.361 |
| gate_proj.V      | 0.005        | -          | -         | 1.104 |
| up_proj.U        | 0.001        | -          | -         | 0.174 |
| gate_proj.U      | 0.001        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.922 |
| down_proj.U      | 0.000        | -          | -         | 0.191 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 23/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 5068.266     | -          | -         | 1.421 |
| v_proj.V         | 5056.747     | -          | -         | 1.108 |
| q_proj.V         | 5064.162     | -          | -         | 1.109 |
| k_proj.U         | 1.220        | -          | -         | 0.482 |
| v_proj.U         | 0.969        | -          | -         | 0.266 |
| q_proj.U         | 4.043        | -          | -         | 0.266 |
| o_proj.V         | 516.372      | -          | -         | 1.395 |
| o_proj.U         | 0.932        | -          | -         | 0.514 |
| up_proj.V        | 3975.776     | -          | -         | 1.420 |
| gate_proj.V      | 3963.773     | -          | -         | 1.111 |
| up_proj.U        | 6.457        | -          | -         | 0.621 |
| gate_proj.U      | 6.716        | -          | -         | 0.384 |
| down_proj.V      | 67.711       | -          | -         | 5.012 |
| down_proj.U      | 0.170        | -          | -         | 0.636 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 24/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.010        | -          | -         | 1.367 |
| v_proj.V         | 0.010        | -          | -         | 1.098 |
| q_proj.V         | 0.010        | -          | -         | 1.102 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.002        | -          | -         | 1.343 |
| o_proj.U         | 0.000        | -          | -         | 0.189 |
| up_proj.V        | 0.006        | -          | -         | 1.354 |
| gate_proj.V      | 0.006        | -          | -         | 1.104 |
| up_proj.U        | 0.001        | -          | -         | 0.174 |
| gate_proj.U      | 0.002        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.912 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 24/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 5040.887     | -          | -         | 1.434 |
| v_proj.V         | 5050.106     | -          | -         | 1.111 |
| q_proj.V         | 5047.354     | -          | -         | 1.113 |
| k_proj.U         | 1.098        | -          | -         | 0.474 |
| v_proj.U         | 0.960        | -          | -         | 0.266 |
| q_proj.U         | 3.945        | -          | -         | 0.266 |
| o_proj.V         | 679.770      | -          | -         | 1.394 |
| o_proj.U         | 1.009        | -          | -         | 0.508 |
| up_proj.V        | 4254.437     | -          | -         | 1.433 |
| gate_proj.V      | 4249.560     | -          | -         | 1.109 |
| up_proj.U        | 6.712        | -          | -         | 0.618 |
| gate_proj.U      | 6.963        | -          | -         | 0.382 |
| down_proj.V      | 83.487       | -          | -         | 5.035 |
| down_proj.U      | 0.200        | -          | -         | 0.637 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 25/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.010        | -          | -         | 1.364 |
| v_proj.V         | 0.010        | -          | -         | 1.095 |
| q_proj.V         | 0.011        | -          | -         | 1.108 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.002        | -          | -         | 1.328 |
| o_proj.U         | 0.000        | -          | -         | 0.189 |
| up_proj.V        | 0.006        | -          | -         | 1.349 |
| gate_proj.V      | 0.006        | -          | -         | 1.103 |
| up_proj.U        | 0.001        | -          | -         | 0.174 |
| gate_proj.U      | 0.002        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.916 |
| down_proj.U      | 0.000        | -          | -         | 0.191 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 25/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 5347.964     | -          | -         | 1.425 |
| v_proj.V         | 5329.667     | -          | -         | 1.113 |
| q_proj.V         | 5311.994     | -          | -         | 1.111 |
| k_proj.U         | 1.140        | -          | -         | 0.473 |
| v_proj.U         | 0.915        | -          | -         | 0.264 |
| q_proj.U         | 3.947        | -          | -         | 0.265 |
| o_proj.V         | 933.773      | -          | -         | 1.394 |
| o_proj.U         | 1.253        | -          | -         | 0.508 |
| up_proj.V        | 4487.670     | -          | -         | 1.417 |
| gate_proj.V      | 4449.044     | -          | -         | 1.108 |
| up_proj.U        | 6.663        | -          | -         | 0.617 |
| gate_proj.U      | 6.850        | -          | -         | 0.390 |
| down_proj.V      | 86.644       | -          | -         | 5.006 |
| down_proj.U      | 0.213        | -          | -         | 0.634 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 26/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.011        | -          | -         | 1.385 |
| v_proj.V         | 0.010        | -          | -         | 1.101 |
| q_proj.V         | 0.011        | -          | -         | 1.122 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.002        | -          | -         | 0.006 |
| o_proj.V         | 0.003        | -          | -         | 1.335 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.006        | -          | -         | 1.348 |
| gate_proj.V      | 0.006        | -          | -         | 1.104 |
| up_proj.U        | 0.001        | -          | -         | 0.173 |
| gate_proj.U      | 0.002        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.919 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 26/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 5389.927     | -          | -         | 1.439 |
| v_proj.V         | 5394.093     | -          | -         | 1.118 |
| q_proj.V         | 5421.042     | -          | -         | 1.114 |
| k_proj.U         | 1.072        | -          | -         | 0.474 |
| v_proj.U         | 0.938        | -          | -         | 0.264 |
| q_proj.U         | 3.667        | -          | -         | 0.266 |
| o_proj.V         | 1051.634     | -          | -         | 1.394 |
| o_proj.U         | 1.485        | -          | -         | 0.507 |
| up_proj.V        | 4526.425     | -          | -         | 1.430 |
| gate_proj.V      | 4502.960     | -          | -         | 1.111 |
| up_proj.U        | 6.687        | -          | -         | 0.620 |
| gate_proj.U      | 6.788        | -          | -         | 0.384 |
| down_proj.V      | 116.918      | -          | -         | 5.046 |
| down_proj.U      | 0.235        | -          | -         | 0.635 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 27/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.010        | -          | -         | 1.374 |
| v_proj.V         | 0.011        | -          | -         | 1.118 |
| q_proj.V         | 0.011        | -          | -         | 1.105 |
| k_proj.U         | 0.000        | -          | -         | 0.156 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.002        | -          | -         | 0.006 |
| o_proj.V         | 0.003        | -          | -         | 1.324 |
| o_proj.U         | 0.000        | -          | -         | 0.188 |
| up_proj.V        | 0.006        | -          | -         | 1.356 |
| gate_proj.V      | 0.006        | -          | -         | 1.101 |
| up_proj.U        | 0.001        | -          | -         | 0.173 |
| gate_proj.U      | 0.002        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.929 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 27/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 5115.418     | -          | -         | 1.423 |
| v_proj.V         | 5104.171     | -          | -         | 1.124 |
| q_proj.V         | 5109.068     | -          | -         | 1.107 |
| k_proj.U         | 1.084        | -          | -         | 0.472 |
| v_proj.U         | 0.835        | -          | -         | 0.266 |
| q_proj.U         | 3.759        | -          | -         | 0.266 |
| o_proj.V         | 940.232      | -          | -         | 1.401 |
| o_proj.U         | 1.338        | -          | -         | 0.507 |
| up_proj.V        | 4787.542     | -          | -         | 1.422 |
| gate_proj.V      | 4780.908     | -          | -         | 1.124 |
| up_proj.U        | 6.649        | -          | -         | 0.619 |
| gate_proj.U      | 6.809        | -          | -         | 0.384 |
| down_proj.V      | 122.581      | -          | -         | 5.031 |
| down_proj.U      | 0.259        | -          | -         | 0.638 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 28/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.011        | -          | -         | 1.368 |
| v_proj.V         | 0.011        | -          | -         | 1.106 |
| q_proj.V         | 0.011        | -          | -         | 1.103 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.001        | -          | -         | 0.006 |
| o_proj.V         | 0.004        | -          | -         | 1.328 |
| o_proj.U         | 0.000        | -          | -         | 0.189 |
| up_proj.V        | 0.007        | -          | -         | 1.354 |
| gate_proj.V      | 0.007        | -          | -         | 1.101 |
| up_proj.U        | 0.002        | -          | -         | 0.174 |
| gate_proj.U      | 0.002        | -          | -         | 0.008 |
| down_proj.V      | 0.000        | -          | -         | 4.926 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 28/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 5548.814     | -          | -         | 1.429 |
| v_proj.V         | 5554.358     | -          | -         | 1.109 |
| q_proj.V         | 5545.305     | -          | -         | 1.110 |
| k_proj.U         | 1.115        | -          | -         | 0.473 |
| v_proj.U         | 0.910        | -          | -         | 0.272 |
| q_proj.U         | 3.678        | -          | -         | 0.266 |
| o_proj.V         | 1213.036     | -          | -         | 1.385 |
| o_proj.U         | 1.702        | -          | -         | 0.506 |
| up_proj.V        | 4888.860     | -          | -         | 1.419 |
| gate_proj.V      | 4888.718     | -          | -         | 1.123 |
| up_proj.U        | 6.627        | -          | -         | 0.619 |
| gate_proj.U      | 6.830        | -          | -         | 0.385 |
| down_proj.V      | 148.963      | -          | -         | 5.021 |
| down_proj.U      | 0.303        | -          | -         | 0.636 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 29/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.011        | -          | -         | 1.370 |
| v_proj.V         | 0.011        | -          | -         | 1.110 |
| q_proj.V         | 0.011        | -          | -         | 1.117 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.002        | -          | -         | 0.006 |
| o_proj.V         | 0.003        | -          | -         | 1.333 |
| o_proj.U         | 0.000        | -          | -         | 0.189 |
| up_proj.V        | 0.007        | -          | -         | 1.356 |
| gate_proj.V      | 0.007        | -          | -         | 1.113 |
| up_proj.U        | 0.002        | -          | -         | 0.174 |
| gate_proj.U      | 0.003        | -          | -         | 0.006 |
| down_proj.V      | 0.000        | -          | -         | 4.923 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 29/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 5265.131     | -          | -         | 1.427 |
| v_proj.V         | 5272.613     | -          | -         | 1.117 |
| q_proj.V         | 5276.472     | -          | -         | 1.119 |
| k_proj.U         | 0.955        | -          | -         | 0.473 |
| v_proj.U         | 0.785        | -          | -         | 0.265 |
| q_proj.U         | 3.390        | -          | -         | 0.267 |
| o_proj.V         | 1141.549     | -          | -         | 1.391 |
| o_proj.U         | 1.716        | -          | -         | 0.509 |
| up_proj.V        | 5481.720     | -          | -         | 1.422 |
| gate_proj.V      | 5470.466     | -          | -         | 1.112 |
| up_proj.U        | 7.463        | -          | -         | 0.620 |
| gate_proj.U      | 7.674        | -          | -         | 0.383 |
| down_proj.V      | 191.225      | -          | -         | 5.028 |
| down_proj.U      | 0.406        | -          | -         | 0.636 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 30/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.010        | -          | -         | 1.364 |
| v_proj.V         | 0.010        | -          | -         | 1.112 |
| q_proj.V         | 0.011        | -          | -         | 1.105 |
| k_proj.U         | 0.001        | -          | -         | 0.155 |
| v_proj.U         | 0.001        | -          | -         | 0.006 |
| q_proj.U         | 0.002        | -          | -         | 0.006 |
| o_proj.V         | 0.006        | -          | -         | 1.326 |
| o_proj.U         | 0.000        | -          | -         | 0.189 |
| up_proj.V        | 0.008        | -          | -         | 1.354 |
| gate_proj.V      | 0.008        | -          | -         | 1.102 |
| up_proj.U        | 0.003        | -          | -         | 0.174 |
| gate_proj.U      | 0.004        | -          | -         | 0.006 |
| down_proj.V      | 0.001        | -          | -         | 4.941 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 30/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 5230.329     | -          | -         | 1.427 |
| v_proj.V         | 5257.919     | -          | -         | 1.114 |
| q_proj.V         | 5242.799     | -          | -         | 1.112 |
| k_proj.U         | 0.968        | -          | -         | 0.473 |
| v_proj.U         | 0.776        | -          | -         | 0.266 |
| q_proj.U         | 3.331        | -          | -         | 0.266 |
| o_proj.V         | 2138.087     | -          | -         | 1.395 |
| o_proj.U         | 2.404        | -          | -         | 0.508 |
| up_proj.V        | 5858.712     | -          | -         | 1.429 |
| gate_proj.V      | 5869.475     | -          | -         | 1.107 |
| up_proj.U        | 8.137        | -          | -         | 0.619 |
| gate_proj.U      | 8.552        | -          | -         | 0.387 |
| down_proj.V      | 346.489      | -          | -         | 5.011 |
| down_proj.U      | 0.653        | -          | -         | 0.636 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 31/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.011        | -          | -         | 1.367 |
| v_proj.V         | 0.012        | -          | -         | 1.105 |
| q_proj.V         | 0.011        | -          | -         | 1.108 |
| k_proj.U         | 0.000        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.003        | -          | -         | 0.006 |
| o_proj.V         | 0.004        | -          | -         | 1.332 |
| o_proj.U         | 0.000        | -          | -         | 0.190 |
| up_proj.V        | 0.008        | -          | -         | 1.354 |
| gate_proj.V      | 0.008        | -          | -         | 1.107 |
| up_proj.U        | 0.003        | -          | -         | 0.174 |
| gate_proj.U      | 0.003        | -          | -         | 0.006 |
| down_proj.V      | 0.001        | -          | -         | 4.929 |
| down_proj.U      | 0.000        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 31/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 5430.750     | -          | -         | 1.438 |
| v_proj.V         | 5477.145     | -          | -         | 1.113 |
| q_proj.V         | 5459.912     | -          | -         | 1.114 |
| k_proj.U         | 1.137        | -          | -         | 0.474 |
| v_proj.U         | 0.883        | -          | -         | 0.266 |
| q_proj.U         | 3.834        | -          | -         | 0.269 |
| o_proj.V         | 1928.000     | -          | -         | 1.412 |
| o_proj.U         | 2.599        | -          | -         | 0.509 |
| up_proj.V        | 5959.600     | -          | -         | 1.427 |
| gate_proj.V      | 5968.585     | -          | -         | 1.112 |
| up_proj.U        | 7.788        | -          | -         | 0.619 |
| gate_proj.U      | 8.316        | -          | -         | 0.385 |
| down_proj.V      | 652.960      | -          | -         | 5.017 |
| down_proj.U      | 1.042        | -          | -         | 0.646 |
+------------------+--------------+------------+-----------+-------+


Quantizing 8bit 32/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 0.007        | -          | -         | 1.371 |
| v_proj.V         | 0.007        | -          | -         | 1.102 |
| q_proj.V         | 0.007        | -          | -         | 1.117 |
| k_proj.U         | 0.001        | -          | -         | 0.155 |
| v_proj.U         | 0.000        | -          | -         | 0.006 |
| q_proj.U         | 0.004        | -          | -         | 0.006 |
| o_proj.V         | 0.005        | -          | -         | 1.322 |
| o_proj.U         | 0.001        | -          | -         | 0.189 |
| up_proj.V        | 0.007        | -          | -         | 1.354 |
| gate_proj.V      | 0.007        | -          | -         | 1.105 |
| up_proj.U        | 0.004        | -          | -         | 0.174 |
| gate_proj.U      | 0.005        | -          | -         | 0.006 |
| down_proj.V      | 0.011        | -          | -         | 4.934 |
| down_proj.U      | 0.003        | -          | -         | 0.190 |
+------------------+--------------+------------+-----------+-------+


Quantizing 2bit 32/32..
+------------------+--------------+------------+-----------+-------+
|       name       | weight_error | fp_inp_SNR | q_inp_SNR | time  |
+==================+==============+============+===========+=======+
| k_proj.V         | 2962.723     | -          | -         | 1.425 |
| v_proj.V         | 3001.111     | -          | -         | 1.112 |
| q_proj.V         | 3012.802     | -          | -         | 1.116 |
| k_proj.U         | 0.557        | -          | -         | 0.474 |
| v_proj.U         | 0.468        | -          | -         | 0.267 |
| q_proj.U         | 2.031        | -          | -         | 0.266 |
| o_proj.V         | 1261.697     | -          | -         | 1.403 |
| o_proj.U         | 1.445        | -          | -         | 0.507 |
| up_proj.V        | 4491.606     | -          | -         | 1.432 |
| gate_proj.V      | 4502.713     | -          | -         | 1.127 |
| up_proj.U        | 5.870        | -          | -         | 0.627 |
| gate_proj.U      | 6.294        | -          | -         | 0.383 |
| down_proj.V      | 762.164      | -          | -         | 5.043 |
| down_proj.U      | 0.970        | -          | -         | 0.637 |
+------------------+--------------+------------+-----------+-------+


2967.8159685134888
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 19.85it/s]
[2025-04-21 21:00:43,371] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
