====================================================================================================
    - data : ../data/wikitext-2/
    - dataset : wt103
    - n_layer : 16
    - n_head : 10
    - d_head : 40
    - d_embed : 400
    - d_model : 400
    - d_inner : 900
    - dropout : 0.2
    - dropoute : 0.2
    - dropouto : 0.5
    - dropouti : 0.6
    - dropatt : 0.2
    - init : normal
    - emb_init : normal
    - init_range : 0.1
    - emb_init_range : 0.01
    - init_std : 0.02
    - proj_init_std : 0.01
    - optim : adam
    - lr : 0.00035
    - mom : 0.0
    - scheduler : cosine
    - warmup_step : 3000
    - decay_rate : 0.5
    - lr_min : 0.0
    - clip : 0.25
    - clip_nonemb : False
    - max_step : 125000
    - batch_size : 32
    - batch_chunk : 1
    - tgt_len : 150
    - eval_tgt_len : 150
    - ext_len : 0
    - mem_len : 150
    - not_tied : False
    - seed : 666
    - cuda : True
    - adaptive : True
    - div_val : 1
    - pre_lnorm : False
    - varlen : False
    - multi_gpu : False
    - log_interval : 50
    - eval_interval : 400
    - work_dir : LM-TFM-wt103-666
    - restart : False
    - restart_dir : 
    - debug : False
    - same_length : False
    - attn_type : 0
    - clamp_len : -1
    - eta_min : 0.0
    - gpu0_bsz : 1
    - max_eval_steps : -1
    - sample_softmax : -1
    - patience : 0
    - finetune_v2 : False
    - finetune_v3 : False
    - fp16 : False
    - static_loss_scale : 1
    - dynamic_loss_scale : True
    - wdecay : 1.2e-06
    - tied : True
    - n_token : 33278
    - n_all_param : 37712881
    - n_nonemb_param : 24366400
====================================================================================================
#params = 37712881
#non emb params = 24366400
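Note on reading the `lr` and `ppl` columns below. The following is a minimal sketch, not the training script's own code: it assumes a linear warmup over `warmup_step` steps followed by cosine annealing over `max_step` steps, and that `ppl` is `exp(loss)`. The function names `lr_at` and `perplexity` are hypothetical; the default values are copied from the configuration above. Under these assumptions the sketch reproduces the logged learning rates to the printed precision (for example step 50 and step 5250). The logged `ppl` is presumably computed from the unrounded loss, so it only approximately matches `exp` of the two-decimal `loss` shown.

```python
import math


def lr_at(step, base_lr=0.00035, warmup_step=3000, max_step=125000, eta_min=0.0):
    """Assumed schedule: linear warmup, then cosine annealing toward eta_min."""
    if step < warmup_step:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / warmup_step
    # Cosine decay, indexed by the global step (an assumption about the script).
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / max_step))
    return eta_min + (base_lr - eta_min) * cosine


def perplexity(loss):
    """Perplexity as the exponential of the mean per-token loss (in nats)."""
    return math.exp(loss)


if __name__ == "__main__":
    for step in (50, 400, 3000, 5250, 10050):
        print(f"step {step:>6} | lr {lr_at(step):.3g}")
    print(f"ppl for loss 6.85: {perplexity(6.85):.1f}")
```
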
| epoch   1 step       50 |     50 batches | lr 5.83e-06 | ms/batch 421.58 | loss 10.16 | ppl 25928.002
| epoch   1 step      100 |    100 batches | lr 1.17e-05 | ms/batch 417.24 | loss  9.63 | ppl 15202.686
| epoch   1 step      150 |    150 batches | lr 1.75e-05 | ms/batch 417.21 | loss  9.21 | ppl  9978.391
| epoch   1 step      200 |    200 batches | lr 2.33e-05 | ms/batch 418.16 | loss  8.91 | ppl  7418.473
| epoch   1 step      250 |    250 batches | lr 2.92e-05 | ms/batch 417.79 | loss  8.53 | ppl  5078.606
| epoch   1 step      300 |    300 batches | lr 3.5e-05 | ms/batch 419.49 | loss  8.13 | ppl  3388.574
| epoch   1 step      350 |    350 batches | lr 4.08e-05 | ms/batch 417.44 | loss  7.71 | ppl  2236.735
| epoch   1 step      400 |    400 batches | lr 4.67e-05 | ms/batch 417.78 | loss  7.39 | ppl  1624.435
----------------------------------------------------------------------------------------------------
| Eval   1 at step      400 | time: 173.60s | valid loss  6.85 | valid ppl   941.959
----------------------------------------------------------------------------------------------------
| epoch   2 step      450 |     14 batches | lr 5.25e-05 | ms/batch 576.72 | loss  7.18 | ppl  1311.495
| epoch   2 step      500 |     64 batches | lr 5.83e-05 | ms/batch 418.15 | loss  7.05 | ppl  1147.806
| epoch   2 step      550 |    114 batches | lr 6.42e-05 | ms/batch 419.30 | loss  7.00 | ppl  1100.153
| epoch   2 step      600 |    164 batches | lr 7e-05 | ms/batch 418.78 | loss  6.96 | ppl  1055.713
| epoch   2 step      650 |    214 batches | lr 7.58e-05 | ms/batch 419.43 | loss  6.95 | ppl  1045.807
| epoch   2 step      700 |    264 batches | lr 8.17e-05 | ms/batch 419.93 | loss  6.87 | ppl   963.278
| epoch   2 step      750 |    314 batches | lr 8.75e-05 | ms/batch 420.06 | loss  6.87 | ppl   962.795
| epoch   2 step      800 |    364 batches | lr 9.33e-05 | ms/batch 420.16 | loss  6.79 | ppl   893.023
----------------------------------------------------------------------------------------------------
| Eval   2 at step      800 | time: 173.61s | valid loss  6.31 | valid ppl   551.283
----------------------------------------------------------------------------------------------------
| epoch   2 step      850 |    414 batches | lr 9.92e-05 | ms/batch 580.37 | loss  6.77 | ppl   870.722
| epoch   3 step      900 |     28 batches | lr 0.000105 | ms/batch 413.76 | loss  6.71 | ppl   819.824
| epoch   3 step      950 |     78 batches | lr 0.000111 | ms/batch 419.82 | loss  6.68 | ppl   795.675
| epoch   3 step     1000 |    128 batches | lr 0.000117 | ms/batch 422.32 | loss  6.60 | ppl   737.047
| epoch   3 step     1050 |    178 batches | lr 0.000122 | ms/batch 421.73 | loss  6.60 | ppl   734.310
| epoch   3 step     1100 |    228 batches | lr 0.000128 | ms/batch 421.07 | loss  6.59 | ppl   727.776
| epoch   3 step     1150 |    278 batches | lr 0.000134 | ms/batch 419.15 | loss  6.52 | ppl   677.654
| epoch   3 step     1200 |    328 batches | lr 0.00014 | ms/batch 422.27 | loss  6.51 | ppl   674.284
----------------------------------------------------------------------------------------------------
| Eval   3 at step     1200 | time: 174.38s | valid loss  5.97 | valid ppl   390.001
----------------------------------------------------------------------------------------------------
| epoch   3 step     1250 |    378 batches | lr 0.000146 | ms/batch 590.81 | loss  6.46 | ppl   637.991
| epoch   3 step     1300 |    428 batches | lr 0.000152 | ms/batch 423.13 | loss  6.43 | ppl   618.838
| epoch   4 step     1350 |     42 batches | lr 0.000157 | ms/batch 412.68 | loss  6.39 | ppl   596.450
| epoch   4 step     1400 |     92 batches | lr 0.000163 | ms/batch 421.26 | loss  6.38 | ppl   591.536
| epoch   4 step     1450 |    142 batches | lr 0.000169 | ms/batch 421.32 | loss  6.31 | ppl   551.751
| epoch   4 step     1500 |    192 batches | lr 0.000175 | ms/batch 422.10 | loss  6.31 | ppl   547.367
| epoch   4 step     1550 |    242 batches | lr 0.000181 | ms/batch 424.02 | loss  6.29 | ppl   540.350
| epoch   4 step     1600 |    292 batches | lr 0.000187 | ms/batch 421.18 | loss  6.27 | ppl   529.908
----------------------------------------------------------------------------------------------------
| Eval   4 at step     1600 | time: 174.77s | valid loss  5.74 | valid ppl   310.734
----------------------------------------------------------------------------------------------------
| epoch   4 step     1650 |    342 batches | lr 0.000193 | ms/batch 585.51 | loss  6.21 | ppl   496.678
| epoch   4 step     1700 |    392 batches | lr 0.000198 | ms/batch 420.95 | loss  6.22 | ppl   502.081
| epoch   5 step     1750 |      6 batches | lr 0.000204 | ms/batch 412.38 | loss  6.19 | ppl   486.939
| epoch   5 step     1800 |     56 batches | lr 0.00021 | ms/batch 421.16 | loss  6.15 | ppl   467.611
| epoch   5 step     1850 |    106 batches | lr 0.000216 | ms/batch 421.34 | loss  6.12 | ppl   457.040
| epoch   5 step     1900 |    156 batches | lr 0.000222 | ms/batch 420.52 | loss  6.07 | ppl   434.008
| epoch   5 step     1950 |    206 batches | lr 0.000228 | ms/batch 423.13 | loss  6.09 | ppl   441.783
| epoch   5 step     2000 |    256 batches | lr 0.000233 | ms/batch 421.87 | loss  6.06 | ppl   429.059
----------------------------------------------------------------------------------------------------
| Eval   5 at step     2000 | time: 174.98s | valid loss  5.54 | valid ppl   254.876
----------------------------------------------------------------------------------------------------
| epoch   5 step     2050 |    306 batches | lr 0.000239 | ms/batch 612.63 | loss  6.05 | ppl   425.797
| epoch   5 step     2100 |    356 batches | lr 0.000245 | ms/batch 422.46 | loss  5.99 | ppl   397.805
| epoch   5 step     2150 |    406 batches | lr 0.000251 | ms/batch 419.96 | loss  5.98 | ppl   396.913
| epoch   6 step     2200 |     20 batches | lr 0.000257 | ms/batch 411.62 | loss  5.97 | ppl   391.090
| epoch   6 step     2250 |     70 batches | lr 0.000262 | ms/batch 421.58 | loss  5.91 | ppl   369.132
| epoch   6 step     2300 |    120 batches | lr 0.000268 | ms/batch 420.19 | loss  5.91 | ppl   367.870
| epoch   6 step     2350 |    170 batches | lr 0.000274 | ms/batch 419.77 | loss  5.88 | ppl   358.645
| epoch   6 step     2400 |    220 batches | lr 0.00028 | ms/batch 419.64 | loss  5.91 | ppl   370.106
----------------------------------------------------------------------------------------------------
| Eval   6 at step     2400 | time: 174.18s | valid loss  5.39 | valid ppl   218.813
----------------------------------------------------------------------------------------------------
| epoch   6 step     2450 |    270 batches | lr 0.000286 | ms/batch 575.37 | loss  5.86 | ppl   352.234
| epoch   6 step     2500 |    320 batches | lr 0.000292 | ms/batch 420.45 | loss  5.85 | ppl   347.495
| epoch   6 step     2550 |    370 batches | lr 0.000297 | ms/batch 420.29 | loss  5.79 | ppl   327.774
| epoch   6 step     2600 |    420 batches | lr 0.000303 | ms/batch 421.72 | loss  5.80 | ppl   331.388
| epoch   7 step     2650 |     34 batches | lr 0.000309 | ms/batch 412.07 | loss  5.81 | ppl   333.906
| epoch   7 step     2700 |     84 batches | lr 0.000315 | ms/batch 420.68 | loss  5.78 | ppl   324.616
| epoch   7 step     2750 |    134 batches | lr 0.000321 | ms/batch 421.66 | loss  5.71 | ppl   301.906
| epoch   7 step     2800 |    184 batches | lr 0.000327 | ms/batch 420.60 | loss  5.74 | ppl   309.610
----------------------------------------------------------------------------------------------------
| Eval   7 at step     2800 | time: 174.26s | valid loss  5.28 | valid ppl   195.898
----------------------------------------------------------------------------------------------------
| epoch   7 step     2850 |    234 batches | lr 0.000333 | ms/batch 574.72 | loss  5.75 | ppl   313.942
| epoch   7 step     2900 |    284 batches | lr 0.000338 | ms/batch 420.16 | loss  5.72 | ppl   305.773
| epoch   7 step     2950 |    334 batches | lr 0.000344 | ms/batch 421.10 | loss  5.65 | ppl   285.178
| epoch   7 step     3000 |    384 batches | lr 0.00035 | ms/batch 422.41 | loss  5.68 | ppl   291.617
| epoch   7 step     3050 |    434 batches | lr 0.000349 | ms/batch 421.53 | loss  5.68 | ppl   292.981
| epoch   8 step     3100 |     48 batches | lr 0.000349 | ms/batch 411.61 | loss  5.64 | ppl   281.855
| epoch   8 step     3150 |     98 batches | lr 0.000349 | ms/batch 423.79 | loss  5.63 | ppl   279.818
| epoch   8 step     3200 |    148 batches | lr 0.000349 | ms/batch 420.62 | loss  5.58 | ppl   265.985
----------------------------------------------------------------------------------------------------
| Eval   8 at step     3200 | time: 174.45s | valid loss  5.18 | valid ppl   177.185
----------------------------------------------------------------------------------------------------
| epoch   8 step     3250 |    198 batches | lr 0.000349 | ms/batch 635.10 | loss  5.64 | ppl   282.765
| epoch   8 step     3300 |    248 batches | lr 0.000349 | ms/batch 426.49 | loss  5.60 | ppl   269.209
| epoch   8 step     3350 |    298 batches | lr 0.000349 | ms/batch 419.98 | loss  5.63 | ppl   277.971
| epoch   8 step     3400 |    348 batches | lr 0.000349 | ms/batch 420.32 | loss  5.50 | ppl   244.897
| epoch   8 step     3450 |    398 batches | lr 0.000349 | ms/batch 426.95 | loss  5.55 | ppl   256.258
| epoch   9 step     3500 |     12 batches | lr 0.000349 | ms/batch 415.06 | loss  5.59 | ppl   267.795
| epoch   9 step     3550 |     62 batches | lr 0.000349 | ms/batch 420.39 | loss  5.50 | ppl   243.875
| epoch   9 step     3600 |    112 batches | lr 0.000349 | ms/batch 419.58 | loss  5.49 | ppl   242.405
----------------------------------------------------------------------------------------------------
| Eval   9 at step     3600 | time: 175.58s | valid loss  5.10 | valid ppl   164.741
----------------------------------------------------------------------------------------------------
| epoch   9 step     3650 |    162 batches | lr 0.000349 | ms/batch 576.07 | loss  5.48 | ppl   240.095
| epoch   9 step     3700 |    212 batches | lr 0.000349 | ms/batch 423.19 | loss  5.51 | ppl   247.206
| epoch   9 step     3750 |    262 batches | lr 0.000349 | ms/batch 420.80 | loss  5.48 | ppl   239.252
| epoch   9 step     3800 |    312 batches | lr 0.000349 | ms/batch 421.39 | loss  5.47 | ppl   238.005
| epoch   9 step     3850 |    362 batches | lr 0.000349 | ms/batch 422.37 | loss  5.41 | ppl   224.469
| epoch   9 step     3900 |    412 batches | lr 0.000349 | ms/batch 420.21 | loss  5.43 | ppl   228.447
| epoch  10 step     3950 |     26 batches | lr 0.000349 | ms/batch 412.29 | loss  5.47 | ppl   236.340
| epoch  10 step     4000 |     76 batches | lr 0.000349 | ms/batch 420.30 | loss  5.39 | ppl   220.170
----------------------------------------------------------------------------------------------------
| Eval  10 at step     4000 | time: 174.39s | valid loss  5.03 | valid ppl   152.361
----------------------------------------------------------------------------------------------------
| epoch  10 step     4050 |    126 batches | lr 0.000349 | ms/batch 584.92 | loss  5.39 | ppl   218.928
| epoch  10 step     4100 |    176 batches | lr 0.000349 | ms/batch 429.32 | loss  5.40 | ppl   221.235
| epoch  10 step     4150 |    226 batches | lr 0.000349 | ms/batch 426.46 | loss  5.41 | ppl   224.587
| epoch  10 step     4200 |    276 batches | lr 0.000349 | ms/batch 420.51 | loss  5.40 | ppl   222.205
| epoch  10 step     4250 |    326 batches | lr 0.000349 | ms/batch 421.51 | loss  5.32 | ppl   204.890
| epoch  10 step     4300 |    376 batches | lr 0.000349 | ms/batch 425.26 | loss  5.34 | ppl   208.983
| epoch  10 step     4350 |    426 batches | lr 0.000349 | ms/batch 420.95 | loss  5.35 | ppl   211.140
| epoch  11 step     4400 |     40 batches | lr 0.000349 | ms/batch 412.41 | loss  5.35 | ppl   211.154
----------------------------------------------------------------------------------------------------
| Eval  11 at step     4400 | time: 175.36s | valid loss  4.95 | valid ppl   140.480
----------------------------------------------------------------------------------------------------
| epoch  11 step     4450 |     90 batches | lr 0.000349 | ms/batch 619.22 | loss  5.33 | ppl   205.461
| epoch  11 step     4500 |    140 batches | lr 0.000349 | ms/batch 420.87 | loss  5.30 | ppl   200.128
| epoch  11 step     4550 |    190 batches | lr 0.000349 | ms/batch 420.20 | loss  5.36 | ppl   211.743
| epoch  11 step     4600 |    240 batches | lr 0.000349 | ms/batch 420.28 | loss  5.33 | ppl   205.586
| epoch  11 step     4650 |    290 batches | lr 0.000349 | ms/batch 421.21 | loss  5.34 | ppl   207.994
| epoch  11 step     4700 |    340 batches | lr 0.000349 | ms/batch 421.08 | loss  5.22 | ppl   185.437
| epoch  11 step     4750 |    390 batches | lr 0.000349 | ms/batch 421.34 | loss  5.28 | ppl   195.887
| epoch  12 step     4800 |      4 batches | lr 0.000349 | ms/batch 415.79 | loss  5.29 | ppl   198.040
----------------------------------------------------------------------------------------------------
| Eval  12 at step     4800 | time: 174.47s | valid loss  4.91 | valid ppl   136.191
----------------------------------------------------------------------------------------------------
| epoch  12 step     4850 |     54 batches | lr 0.000349 | ms/batch 576.55 | loss  5.26 | ppl   192.127
| epoch  12 step     4900 |    104 batches | lr 0.000349 | ms/batch 419.72 | loss  5.26 | ppl   193.403
| epoch  12 step     4950 |    154 batches | lr 0.000349 | ms/batch 421.14 | loss  5.22 | ppl   185.498
| epoch  12 step     5000 |    204 batches | lr 0.000349 | ms/batch 421.84 | loss  5.27 | ppl   194.309
| epoch  12 step     5050 |    254 batches | lr 0.000349 | ms/batch 421.83 | loss  5.27 | ppl   194.786
| epoch  12 step     5100 |    304 batches | lr 0.000349 | ms/batch 422.66 | loss  5.28 | ppl   196.634
| epoch  12 step     5150 |    354 batches | lr 0.000349 | ms/batch 420.97 | loss  5.17 | ppl   175.358
| epoch  12 step     5200 |    404 batches | lr 0.000349 | ms/batch 420.66 | loss  5.20 | ppl   181.595
----------------------------------------------------------------------------------------------------
| Eval  13 at step     5200 | time: 174.78s | valid loss  4.87 | valid ppl   129.921
----------------------------------------------------------------------------------------------------
| epoch  13 step     5250 |     18 batches | lr 0.000348 | ms/batch 564.82 | loss  5.25 | ppl   191.403
| epoch  13 step     5300 |     68 batches | lr 0.000348 | ms/batch 420.78 | loss  5.18 | ppl   177.278
| epoch  13 step     5350 |    118 batches | lr 0.000348 | ms/batch 421.63 | loss  5.19 | ppl   179.714
| epoch  13 step     5400 |    168 batches | lr 0.000348 | ms/batch 422.15 | loss  5.19 | ppl   179.022
| epoch  13 step     5450 |    218 batches | lr 0.000348 | ms/batch 422.70 | loss  5.22 | ppl   185.473
| epoch  13 step     5500 |    268 batches | lr 0.000348 | ms/batch 421.53 | loss  5.19 | ppl   179.074
| epoch  13 step     5550 |    318 batches | lr 0.000348 | ms/batch 425.37 | loss  5.18 | ppl   176.919
| epoch  13 step     5600 |    368 batches | lr 0.000348 | ms/batch 435.14 | loss  5.13 | ppl   169.364
----------------------------------------------------------------------------------------------------
| Eval  14 at step     5600 | time: 175.44s | valid loss  4.83 | valid ppl   125.262
----------------------------------------------------------------------------------------------------
| epoch  13 step     5650 |    418 batches | lr 0.000348 | ms/batch 645.57 | loss  5.15 | ppl   173.141
| epoch  14 step     5700 |     32 batches | lr 0.000348 | ms/batch 411.05 | loss  5.19 | ppl   179.576
| epoch  14 step     5750 |     82 batches | lr 0.000348 | ms/batch 420.19 | loss  5.13 | ppl   168.391
| epoch  14 step     5800 |    132 batches | lr 0.000348 | ms/batch 421.39 | loss  5.14 | ppl   170.186
| epoch  14 step     5850 |    182 batches | lr 0.000348 | ms/batch 421.22 | loss  5.15 | ppl   171.725
| epoch  14 step     5900 |    232 batches | lr 0.000348 | ms/batch 428.86 | loss  5.16 | ppl   173.865
| epoch  14 step     5950 |    282 batches | lr 0.000348 | ms/batch 420.61 | loss  5.16 | ppl   174.889
| epoch  14 step     6000 |    332 batches | lr 0.000348 | ms/batch 422.39 | loss  5.08 | ppl   161.006
----------------------------------------------------------------------------------------------------
| Eval  15 at step     6000 | time: 174.66s | valid loss  4.79 | valid ppl   119.833
----------------------------------------------------------------------------------------------------
| epoch  14 step     6050 |    382 batches | lr 0.000348 | ms/batch 578.12 | loss  5.10 | ppl   163.895
| epoch  14 step     6100 |    432 batches | lr 0.000348 | ms/batch 422.67 | loss  5.12 | ppl   167.516
| epoch  15 step     6150 |     46 batches | lr 0.000348 | ms/batch 413.78 | loss  5.10 | ppl   164.419
| epoch  15 step     6200 |     96 batches | lr 0.000348 | ms/batch 435.52 | loss  5.07 | ppl   159.725
| epoch  15 step     6250 |    146 batches | lr 0.000348 | ms/batch 435.91 | loss  5.09 | ppl   162.566
| epoch  15 step     6300 |    196 batches | lr 0.000348 | ms/batch 422.07 | loss  5.12 | ppl   168.059
| epoch  15 step     6350 |    246 batches | lr 0.000348 | ms/batch 422.92 | loss  5.11 | ppl   165.630
| epoch  15 step     6400 |    296 batches | lr 0.000348 | ms/batch 422.41 | loss  5.13 | ppl   169.010
----------------------------------------------------------------------------------------------------
| Eval  16 at step     6400 | time: 176.23s | valid loss  4.78 | valid ppl   118.577
----------------------------------------------------------------------------------------------------
| epoch  15 step     6450 |    346 batches | lr 0.000348 | ms/batch 576.92 | loss  5.00 | ppl   148.134
| epoch  15 step     6500 |    396 batches | lr 0.000348 | ms/batch 422.82 | loss  5.08 | ppl   160.032
| epoch  16 step     6550 |     10 batches | lr 0.000348 | ms/batch 414.79 | loss  5.09 | ppl   162.687
| epoch  16 step     6600 |     60 batches | lr 0.000348 | ms/batch 422.31 | loss  5.04 | ppl   153.784
| epoch  16 step     6650 |    110 batches | lr 0.000348 | ms/batch 421.43 | loss  5.02 | ppl   151.784
| epoch  16 step     6700 |    160 batches | lr 0.000348 | ms/batch 421.09 | loss  5.02 | ppl   151.066
| epoch  16 step     6750 |    210 batches | lr 0.000347 | ms/batch 422.01 | loss  5.06 | ppl   157.668
| epoch  16 step     6800 |    260 batches | lr 0.000347 | ms/batch 422.56 | loss  5.07 | ppl   158.746
----------------------------------------------------------------------------------------------------
| Eval  17 at step     6800 | time: 174.77s | valid loss  4.73 | valid ppl   113.646
----------------------------------------------------------------------------------------------------
| epoch  16 step     6850 |    310 batches | lr 0.000347 | ms/batch 582.48 | loss  5.06 | ppl   157.029
| epoch  16 step     6900 |    360 batches | lr 0.000347 | ms/batch 431.96 | loss  4.97 | ppl   143.439
| epoch  16 step     6950 |    410 batches | lr 0.000347 | ms/batch 422.26 | loss  5.01 | ppl   149.382
| epoch  17 step     7000 |     24 batches | lr 0.000347 | ms/batch 414.03 | loss  5.08 | ppl   161.406
| epoch  17 step     7050 |     74 batches | lr 0.000347 | ms/batch 420.37 | loss  5.01 | ppl   149.160
| epoch  17 step     7100 |    124 batches | lr 0.000347 | ms/batch 421.14 | loss  5.00 | ppl   148.351
| epoch  17 step     7150 |    174 batches | lr 0.000347 | ms/batch 421.09 | loss  5.02 | ppl   152.033
| epoch  17 step     7200 |    224 batches | lr 0.000347 | ms/batch 422.53 | loss  5.03 | ppl   153.102
----------------------------------------------------------------------------------------------------
| Eval  18 at step     7200 | time: 175.39s | valid loss  4.71 | valid ppl   111.506
----------------------------------------------------------------------------------------------------
| epoch  17 step     7250 |    274 batches | lr 0.000347 | ms/batch 575.82 | loss  5.02 | ppl   150.792
| epoch  17 step     7300 |    324 batches | lr 0.000347 | ms/batch 420.26 | loss  4.95 | ppl   141.573
| epoch  17 step     7350 |    374 batches | lr 0.000347 | ms/batch 420.43 | loss  4.98 | ppl   145.553
| epoch  17 step     7400 |    424 batches | lr 0.000347 | ms/batch 423.04 | loss  4.99 | ppl   147.236
| epoch  18 step     7450 |     38 batches | lr 0.000347 | ms/batch 415.23 | loss  5.01 | ppl   149.613
| epoch  18 step     7500 |     88 batches | lr 0.000347 | ms/batch 424.33 | loss  4.96 | ppl   142.544
| epoch  18 step     7550 |    138 batches | lr 0.000347 | ms/batch 421.59 | loss  4.96 | ppl   142.571
| epoch  18 step     7600 |    188 batches | lr 0.000347 | ms/batch 421.56 | loss  4.97 | ppl   144.640
----------------------------------------------------------------------------------------------------
| Eval  19 at step     7600 | time: 174.78s | valid loss  4.70 | valid ppl   109.971
----------------------------------------------------------------------------------------------------
| epoch  18 step     7650 |    238 batches | lr 0.000347 | ms/batch 577.86 | loss  4.98 | ppl   145.370
| epoch  18 step     7700 |    288 batches | lr 0.000347 | ms/batch 420.17 | loss  5.03 | ppl   152.647
| epoch  18 step     7750 |    338 batches | lr 0.000347 | ms/batch 420.29 | loss  4.91 | ppl   135.336
| epoch  18 step     7800 |    388 batches | lr 0.000347 | ms/batch 422.66 | loss  4.96 | ppl   142.611
| epoch  19 step     7850 |      2 batches | lr 0.000347 | ms/batch 413.08 | loss  4.98 | ppl   144.992
| epoch  19 step     7900 |     52 batches | lr 0.000347 | ms/batch 421.63 | loss  4.93 | ppl   137.966
| epoch  19 step     7950 |    102 batches | lr 0.000347 | ms/batch 428.11 | loss  4.93 | ppl   137.694
| epoch  19 step     8000 |    152 batches | lr 0.000346 | ms/batch 437.42 | loss  4.94 | ppl   139.502
----------------------------------------------------------------------------------------------------
| Eval  20 at step     8000 | time: 176.98s | valid loss  4.68 | valid ppl   108.190
----------------------------------------------------------------------------------------------------
| epoch  19 step     8050 |    202 batches | lr 0.000346 | ms/batch 581.27 | loss  4.97 | ppl   143.653
| epoch  19 step     8100 |    252 batches | lr 0.000346 | ms/batch 421.26 | loss  4.95 | ppl   141.553
| epoch  19 step     8150 |    302 batches | lr 0.000346 | ms/batch 421.44 | loss  4.99 | ppl   146.274
| epoch  19 step     8200 |    352 batches | lr 0.000346 | ms/batch 420.30 | loss  4.85 | ppl   128.207
| epoch  19 step     8250 |    402 batches | lr 0.000346 | ms/batch 422.86 | loss  4.93 | ppl   138.381
| epoch  20 step     8300 |     16 batches | lr 0.000346 | ms/batch 413.73 | loss  4.95 | ppl   140.793
| epoch  20 step     8350 |     66 batches | lr 0.000346 | ms/batch 423.21 | loss  4.89 | ppl   132.524
| epoch  20 step     8400 |    116 batches | lr 0.000346 | ms/batch 422.83 | loss  4.91 | ppl   135.180
----------------------------------------------------------------------------------------------------
| Eval  21 at step     8400 | time: 176.05s | valid loss  4.65 | valid ppl   104.473
----------------------------------------------------------------------------------------------------
| epoch  20 step     8450 |    166 batches | lr 0.000346 | ms/batch 605.21 | loss  4.92 | ppl   136.886
| epoch  20 step     8500 |    216 batches | lr 0.000346 | ms/batch 420.63 | loss  4.94 | ppl   139.181
| epoch  20 step     8550 |    266 batches | lr 0.000346 | ms/batch 420.52 | loss  4.94 | ppl   139.902
| epoch  20 step     8600 |    316 batches | lr 0.000346 | ms/batch 422.01 | loss  4.91 | ppl   135.903
| epoch  20 step     8650 |    366 batches | lr 0.000346 | ms/batch 422.60 | loss  4.85 | ppl   127.560
| epoch  20 step     8700 |    416 batches | lr 0.000346 | ms/batch 422.98 | loss  4.89 | ppl   133.176
| epoch  21 step     8750 |     30 batches | lr 0.000346 | ms/batch 414.08 | loss  4.93 | ppl   138.670
| epoch  21 step     8800 |     80 batches | lr 0.000346 | ms/batch 423.96 | loss  4.88 | ppl   131.217
----------------------------------------------------------------------------------------------------
| Eval  22 at step     8800 | time: 174.74s | valid loss  4.65 | valid ppl   104.123
----------------------------------------------------------------------------------------------------
| epoch  21 step     8850 |    130 batches | lr 0.000346 | ms/batch 577.25 | loss  4.86 | ppl   129.459
| epoch  21 step     8900 |    180 batches | lr 0.000346 | ms/batch 422.81 | loss  4.89 | ppl   132.858
| epoch  21 step     8950 |    230 batches | lr 0.000346 | ms/batch 422.37 | loss  4.90 | ppl   134.920
| epoch  21 step     9000 |    280 batches | lr 0.000346 | ms/batch 419.67 | loss  4.93 | ppl   137.923
| epoch  21 step     9050 |    330 batches | lr 0.000345 | ms/batch 420.21 | loss  4.85 | ppl   127.651
| epoch  21 step     9100 |    380 batches | lr 0.000345 | ms/batch 419.90 | loss  4.86 | ppl   128.440
| epoch  21 step     9150 |    430 batches | lr 0.000345 | ms/batch 421.78 | loss  4.87 | ppl   130.387
| epoch  22 step     9200 |     44 batches | lr 0.000345 | ms/batch 412.97 | loss  4.88 | ppl   131.776
----------------------------------------------------------------------------------------------------
| Eval  23 at step     9200 | time: 174.44s | valid loss  4.61 | valid ppl   100.035
----------------------------------------------------------------------------------------------------
| epoch  22 step     9250 |     94 batches | lr 0.000345 | ms/batch 578.15 | loss  4.83 | ppl   124.729
| epoch  22 step     9300 |    144 batches | lr 0.000345 | ms/batch 423.16 | loss  4.86 | ppl   129.528
| epoch  22 step     9350 |    194 batches | lr 0.000345 | ms/batch 421.78 | loss  4.87 | ppl   129.835
| epoch  22 step     9400 |    244 batches | lr 0.000345 | ms/batch 421.67 | loss  4.89 | ppl   132.613
| epoch  22 step     9450 |    294 batches | lr 0.000345 | ms/batch 422.69 | loss  4.91 | ppl   135.964
| epoch  22 step     9500 |    344 batches | lr 0.000345 | ms/batch 423.23 | loss  4.78 | ppl   118.742
| epoch  22 step     9550 |    394 batches | lr 0.000345 | ms/batch 425.09 | loss  4.86 | ppl   128.889
| epoch  23 step     9600 |      8 batches | lr 0.000345 | ms/batch 412.29 | loss  4.87 | ppl   130.001
----------------------------------------------------------------------------------------------------
| Eval  24 at step     9600 | time: 174.97s | valid loss  4.59 | valid ppl    98.446
----------------------------------------------------------------------------------------------------
| epoch  23 step     9650 |     58 batches | lr 0.000345 | ms/batch 600.63 | loss  4.83 | ppl   125.581
| epoch  23 step     9700 |    108 batches | lr 0.000345 | ms/batch 420.70 | loss  4.82 | ppl   123.505
| epoch  23 step     9750 |    158 batches | lr 0.000345 | ms/batch 420.71 | loss  4.82 | ppl   124.466
| epoch  23 step     9800 |    208 batches | lr 0.000345 | ms/batch 423.30 | loss  4.84 | ppl   126.546
| epoch  23 step     9850 |    258 batches | lr 0.000345 | ms/batch 420.37 | loss  4.87 | ppl   130.130
| epoch  23 step     9900 |    308 batches | lr 0.000345 | ms/batch 421.25 | loss  4.87 | ppl   130.351
| epoch  23 step     9950 |    358 batches | lr 0.000345 | ms/batch 421.83 | loss  4.77 | ppl   117.835
| epoch  23 step    10000 |    408 batches | lr 0.000345 | ms/batch 421.44 | loss  4.80 | ppl   121.988
----------------------------------------------------------------------------------------------------
| Eval  25 at step    10000 | time: 174.94s | valid loss  4.63 | valid ppl   102.227
----------------------------------------------------------------------------------------------------
| epoch  24 step    10050 |     22 batches | lr 0.000344 | ms/batch 536.86 | loss  4.85 | ppl   127.724
| epoch  24 step    10100 |     72 batches | lr 0.000344 | ms/batch 419.52 | loss  4.78 | ppl   119.534
| epoch  24 step    10150 |    122 batches | lr 0.000344 | ms/batch 421.31 | loss  4.82 | ppl   124.304
| epoch  24 step    10200 |    172 batches | lr 0.000344 | ms/batch 422.00 | loss  4.80 | ppl   121.886
| epoch  24 step    10250 |    222 batches | lr 0.000344 | ms/batch 421.98 | loss  4.85 | ppl   127.414
| epoch  24 step    10300 |    272 batches | lr 0.000344 | ms/batch 421.41 | loss  4.82 | ppl   124.255
| epoch  24 step    10350 |    322 batches | lr 0.000344 | ms/batch 421.60 | loss  4.80 | ppl   121.464
| epoch  24 step    10400 |    372 batches | lr 0.000344 | ms/batch 421.43 | loss  4.77 | ppl   118.430
----------------------------------------------------------------------------------------------------
| Eval  26 at step    10400 | time: 174.31s | valid loss  4.57 | valid ppl    96.906
----------------------------------------------------------------------------------------------------
| epoch  24 step    10450 |    422 batches | lr 0.000344 | ms/batch 579.69 | loss  4.80 | ppl   121.677
| epoch  25 step    10500 |     36 batches | lr 0.000344 | ms/batch 424.32 | loss  4.82 | ppl   123.898
| epoch  25 step    10550 |     86 batches | lr 0.000344 | ms/batch 422.21 | loss  4.76 | ppl   116.632
| epoch  25 step    10600 |    136 batches | lr 0.000344 | ms/batch 421.59 | loss  4.77 | ppl   118.492
| epoch  25 step    10650 |    186 batches | lr 0.000344 | ms/batch 421.08 | loss  4.78 | ppl   119.667
| epoch  25 step    10700 |    236 batches | lr 0.000344 | ms/batch 433.17 | loss  4.81 | ppl   122.907
| epoch  25 step    10750 |    286 batches | lr 0.000344 | ms/batch 422.07 | loss  4.84 | ppl   126.679
| epoch  25 step    10800 |    336 batches | lr 0.000344 | ms/batch 422.24 | loss  4.71 | ppl   111.458
----------------------------------------------------------------------------------------------------
| Eval  27 at step    10800 | time: 175.85s | valid loss  4.58 | valid ppl    97.828
----------------------------------------------------------------------------------------------------
| epoch  25 step    10850 |    386 batches | lr 0.000344 | ms/batch 546.97 | loss  4.79 | ppl   120.281
| epoch  25 step    10900 |    436 batches | lr 0.000343 | ms/batch 414.43 | loss  4.81 | ppl   122.155
| epoch  26 step    10950 |     50 batches | lr 0.000343 | ms/batch 419.00 | loss  4.75 | ppl   115.991
| epoch  26 step    11000 |    100 batches | lr 0.000343 | ms/batch 422.13 | loss  4.75 | ppl   115.143
| epoch  26 step    11050 |    150 batches | lr 0.000343 | ms/batch 422.76 | loss  4.78 | ppl   118.957
| epoch  26 step    11100 |    200 batches | lr 0.000343 | ms/batch 421.28 | loss  4.79 | ppl   120.798
| epoch  26 step    11150 |    250 batches | lr 0.000343 | ms/batch 420.65 | loss  4.78 | ppl   118.960
| epoch  26 step    11200 |    300 batches | lr 0.000343 | ms/batch 423.21 | loss  4.81 | ppl   122.989
----------------------------------------------------------------------------------------------------
| Eval  28 at step    11200 | time: 174.54s | valid loss  4.56 | valid ppl    95.345
----------------------------------------------------------------------------------------------------
| epoch  26 step    11250 |    350 batches | lr 0.000343 | ms/batch 606.12 | loss  4.68 | ppl   108.021
| epoch  26 step    11300 |    400 batches | lr 0.000343 | ms/batch 421.01 | loss  4.76 | ppl   116.258
| epoch  27 step    11350 |     14 batches | lr 0.000343 | ms/batch 413.48 | loss  4.77 | ppl   117.548
| epoch  27 step    11400 |     64 batches | lr 0.000343 | ms/batch 423.82 | loss  4.71 | ppl   111.216
| epoch  27 step    11450 |    114 batches | lr 0.000343 | ms/batch 421.75 | loss  4.73 | ppl   113.425
| epoch  27 step    11500 |    164 batches | lr 0.000343 | ms/batch 421.21 | loss  4.73 | ppl   113.781
| epoch  27 step    11550 |    214 batches | lr 0.000343 | ms/batch 420.58 | loss  4.77 | ppl   117.654
| epoch  27 step    11600 |    264 batches | lr 0.000343 | ms/batch 425.94 | loss  4.76 | ppl   117.284
----------------------------------------------------------------------------------------------------
| Eval  29 at step    11600 | time: 174.94s | valid loss  4.54 | valid ppl    93.583
----------------------------------------------------------------------------------------------------
| epoch  27 step    11650 |    314 batches | lr 0.000343 | ms/batch 575.91 | loss  4.78 | ppl   118.523
| epoch  27 step    11700 |    364 batches | lr 0.000342 | ms/batch 420.50 | loss  4.69 | ppl   109.270
| epoch  27 step    11750 |    414 batches | lr 0.000342 | ms/batch 422.00 | loss  4.73 | ppl   113.353
| epoch  28 step    11800 |     28 batches | lr 0.000342 | ms/batch 412.65 | loss  4.77 | ppl   117.507
| epoch  28 step    11850 |     78 batches | lr 0.000342 | ms/batch 423.05 | loss  4.70 | ppl   110.353
| epoch  28 step    11900 |    128 batches | lr 0.000342 | ms/batch 433.13 | loss  4.74 | ppl   114.684
| epoch  28 step    11950 |    178 batches | lr 0.000342 | ms/batch 422.84 | loss  4.74 | ppl   114.095
| epoch  28 step    12000 |    228 batches | lr 0.000342 | ms/batch 422.78 | loss  4.74 | ppl   114.960
----------------------------------------------------------------------------------------------------
| Eval  30 at step    12000 | time: 175.22s | valid loss  4.54 | valid ppl    93.987
----------------------------------------------------------------------------------------------------
| epoch  28 step    12050 |    278 batches | lr 0.000342 | ms/batch 552.19 | loss  4.77 | ppl   117.505
| epoch  28 step    12100 |    328 batches | lr 0.000342 | ms/batch 425.70 | loss  4.69 | ppl   108.491
| epoch  28 step    12150 |    378 batches | lr 0.000342 | ms/batch 425.05 | loss  4.71 | ppl   110.687
| epoch  28 step    12200 |    428 batches | lr 0.000342 | ms/batch 425.29 | loss  4.73 | ppl   113.386
| epoch  29 step    12250 |     42 batches | lr 0.000342 | ms/batch 415.17 | loss  4.73 | ppl   113.103
| epoch  29 step    12300 |     92 batches | lr 0.000342 | ms/batch 425.04 | loss  4.70 | ppl   109.955
| epoch  29 step    12350 |    142 batches | lr 0.000342 | ms/batch 424.34 | loss  4.71 | ppl   111.362
| epoch  29 step    12400 |    192 batches | lr 0.000342 | ms/batch 425.33 | loss  4.72 | ppl   112.541
----------------------------------------------------------------------------------------------------
| Eval  31 at step    12400 | time: 175.93s | valid loss  4.53 | valid ppl    92.462
----------------------------------------------------------------------------------------------------
| epoch  29 step    12450 |    242 batches | lr 0.000342 | ms/batch 580.02 | loss  4.73 | ppl   112.975
| epoch  29 step    12500 |    292 batches | lr 0.000341 | ms/batch 424.40 | loss  4.77 | ppl   117.585
| epoch  29 step    12550 |    342 batches | lr 0.000341 | ms/batch 423.50 | loss  4.62 | ppl   101.982
| epoch  29 step    12600 |    392 batches | lr 0.000341 | ms/batch 426.20 | loss  4.73 | ppl   112.824
| epoch  30 step    12650 |      6 batches | lr 0.000341 | ms/batch 415.81 | loss  4.75 | ppl   115.768
| epoch  30 step    12700 |     56 batches | lr 0.000341 | ms/batch 424.49 | loss  4.65 | ppl   104.763
| epoch  30 step    12750 |    106 batches | lr 0.000341 | ms/batch 424.33 | loss  4.68 | ppl   107.587
| epoch  30 step    12800 |    156 batches | lr 0.000341 | ms/batch 425.23 | loss  4.71 | ppl   111.308
----------------------------------------------------------------------------------------------------
| Eval  32 at step    12800 | time: 175.70s | valid loss  4.50 | valid ppl    90.413
----------------------------------------------------------------------------------------------------
| epoch  30 step    12850 |    206 batches | lr 0.000341 | ms/batch 580.60 | loss  4.71 | ppl   110.764
| epoch  30 step    12900 |    256 batches | lr 0.000341 | ms/batch 424.37 | loss  4.73 | ppl   112.790
| epoch  30 step    12950 |    306 batches | lr 0.000341 | ms/batch 427.12 | loss  4.72 | ppl   112.716
| epoch  30 step    13000 |    356 batches | lr 0.000341 | ms/batch 424.52 | loss  4.62 | ppl   101.520
| epoch  30 step    13050 |    406 batches | lr 0.000341 | ms/batch 426.21 | loss  4.67 | ppl   106.550
| epoch  31 step    13100 |     20 batches | lr 0.000341 | ms/batch 417.78 | loss  4.73 | ppl   113.160
| epoch  31 step    13150 |     70 batches | lr 0.000341 | ms/batch 426.55 | loss  4.64 | ppl   103.916
| epoch  31 step    13200 |    120 batches | lr 0.00034 | ms/batch 426.19 | loss  4.69 | ppl   108.706
----------------------------------------------------------------------------------------------------
| Eval  33 at step    13200 | time: 176.20s | valid loss  4.50 | valid ppl    89.814
----------------------------------------------------------------------------------------------------
| epoch  31 step    13250 |    170 batches | lr 0.00034 | ms/batch 600.27 | loss  4.68 | ppl   107.802
| epoch  31 step    13300 |    220 batches | lr 0.00034 | ms/batch 425.08 | loss  4.71 | ppl   110.751
| epoch  31 step    13350 |    270 batches | lr 0.00034 | ms/batch 426.70 | loss  4.68 | ppl   108.264
| epoch  31 step    13400 |    320 batches | lr 0.00034 | ms/batch 425.90 | loss  4.68 | ppl   108.137
| epoch  31 step    13450 |    370 batches | lr 0.00034 | ms/batch 425.04 | loss  4.63 | ppl   102.826
| epoch  31 step    13500 |    420 batches | lr 0.00034 | ms/batch 425.46 | loss  4.68 | ppl   107.738
| epoch  32 step    13550 |     34 batches | lr 0.00034 | ms/batch 415.64 | loss  4.69 | ppl   108.706
| epoch  32 step    13600 |     84 batches | lr 0.00034 | ms/batch 423.17 | loss  4.63 | ppl   102.686
----------------------------------------------------------------------------------------------------
| Eval  34 at step    13600 | time: 175.95s | valid loss  4.51 | valid ppl    90.477
----------------------------------------------------------------------------------------------------
| epoch  32 step    13650 |    134 batches | lr 0.00034 | ms/batch 551.21 | loss  4.67 | ppl   107.160
| epoch  32 step    13700 |    184 batches | lr 0.00034 | ms/batch 424.33 | loss  4.67 | ppl   106.749
| epoch  32 step    13750 |    234 batches | lr 0.00034 | ms/batch 424.32 | loss  4.67 | ppl   106.792
| epoch  32 step    13800 |    284 batches | lr 0.00034 | ms/batch 423.97 | loss  4.71 | ppl   110.887
| epoch  32 step    13850 |    334 batches | lr 0.00034 | ms/batch 425.31 | loss  4.60 | ppl    99.763
| epoch  32 step    13900 |    384 batches | lr 0.000339 | ms/batch 423.48 | loss  4.65 | ppl   104.277
| epoch  32 step    13950 |    434 batches | lr 0.000339 | ms/batch 427.67 | loss  4.68 | ppl   107.998
| epoch  33 step    14000 |     48 batches | lr 0.000339 | ms/batch 415.52 | loss  4.63 | ppl   102.988
----------------------------------------------------------------------------------------------------
| Eval  35 at step    14000 | time: 175.79s | valid loss  4.50 | valid ppl    89.651
----------------------------------------------------------------------------------------------------
| epoch  33 step    14050 |     98 batches | lr 0.000339 | ms/batch 598.18 | loss  4.62 | ppl   101.942
| epoch  33 step    14100 |    148 batches | lr 0.000339 | ms/batch 424.88 | loss  4.65 | ppl   104.493
| epoch  33 step    14150 |    198 batches | lr 0.000339 | ms/batch 426.80 | loss  4.67 | ppl   106.741
| epoch  33 step    14200 |    248 batches | lr 0.000339 | ms/batch 425.25 | loss  4.67 | ppl   107.170
| epoch  33 step    14250 |    298 batches | lr 0.000339 | ms/batch 426.00 | loss  4.70 | ppl   110.305
| epoch  33 step    14300 |    348 batches | lr 0.000339 | ms/batch 423.48 | loss  4.56 | ppl    95.935
| epoch  33 step    14350 |    398 batches | lr 0.000339 | ms/batch 423.76 | loss  4.66 | ppl   105.687
| epoch  34 step    14400 |     12 batches | lr 0.000339 | ms/batch 415.29 | loss  4.69 | ppl   108.759
----------------------------------------------------------------------------------------------------
| Eval  36 at step    14400 | time: 175.79s | valid loss  4.47 | valid ppl    87.320
----------------------------------------------------------------------------------------------------
| epoch  34 step    14450 |     62 batches | lr 0.000339 | ms/batch 593.43 | loss  4.63 | ppl   102.657
| epoch  34 step    14500 |    112 batches | lr 0.000339 | ms/batch 423.58 | loss  4.63 | ppl   102.763
| epoch  34 step    14550 |    162 batches | lr 0.000338 | ms/batch 424.02 | loss  4.64 | ppl   103.363
| epoch  34 step    14600 |    212 batches | lr 0.000338 | ms/batch 423.73 | loss  4.66 | ppl   105.566
| epoch  34 step    14650 |    262 batches | lr 0.000338 | ms/batch 423.42 | loss  4.68 | ppl   107.882
| epoch  34 step    14700 |    312 batches | lr 0.000338 | ms/batch 424.03 | loss  4.65 | ppl   104.128
| epoch  34 step    14750 |    362 batches | lr 0.000338 | ms/batch 425.92 | loss  4.59 | ppl    98.983
| epoch  34 step    14800 |    412 batches | lr 0.000338 | ms/batch 425.91 | loss  4.64 | ppl   103.222
----------------------------------------------------------------------------------------------------
| Eval  37 at step    14800 | time: 176.07s | valid loss  4.49 | valid ppl    89.017
----------------------------------------------------------------------------------------------------
| epoch  35 step    14850 |     26 batches | lr 0.000338 | ms/batch 543.34 | loss  4.66 | ppl   105.813
| epoch  35 step    14900 |     76 batches | lr 0.000338 | ms/batch 423.06 | loss  4.61 | ppl   100.946
| epoch  35 step    14950 |    126 batches | lr 0.000338 | ms/batch 424.25 | loss  4.62 | ppl   101.466
| epoch  35 step    15000 |    176 batches | lr 0.000338 | ms/batch 425.14 | loss  4.64 | ppl   103.697
| epoch  35 step    15050 |    226 batches | lr 0.000338 | ms/batch 426.62 | loss  4.64 | ppl   103.605
| epoch  35 step    15100 |    276 batches | lr 0.000338 | ms/batch 423.31 | loss  4.66 | ppl   105.717
| epoch  35 step    15150 |    326 batches | lr 0.000337 | ms/batch 425.76 | loss  4.58 | ppl    97.717
| epoch  35 step    15200 |    376 batches | lr 0.000337 | ms/batch 425.87 | loss  4.63 | ppl   102.283
----------------------------------------------------------------------------------------------------
| Eval  38 at step    15200 | time: 175.86s | valid loss  4.47 | valid ppl    87.543
----------------------------------------------------------------------------------------------------
| epoch  35 step    15250 |    426 batches | lr 0.000337 | ms/batch 552.01 | loss  4.64 | ppl   103.395
| epoch  36 step    15300 |     40 batches | lr 0.000337 | ms/batch 416.32 | loss  4.62 | ppl   101.298
| epoch  36 step    15350 |     90 batches | lr 0.000337 | ms/batch 426.12 | loss  4.60 | ppl    99.695
| epoch  36 step    15400 |    140 batches | lr 0.000337 | ms/batch 423.91 | loss  4.63 | ppl   102.725
| epoch  36 step    15450 |    190 batches | lr 0.000337 | ms/batch 426.10 | loss  4.64 | ppl   103.650
| epoch  36 step    15500 |    240 batches | lr 0.000337 | ms/batch 425.75 | loss  4.62 | ppl   101.185
| epoch  36 step    15550 |    290 batches | lr 0.000337 | ms/batch 426.24 | loss  4.67 | ppl   106.544
| epoch  36 step    15600 |    340 batches | lr 0.000337 | ms/batch 426.35 | loss  4.53 | ppl    92.720
----------------------------------------------------------------------------------------------------
| Eval  39 at step    15600 | time: 176.16s | valid loss  4.46 | valid ppl    86.434
----------------------------------------------------------------------------------------------------
| epoch  36 step    15650 |    390 batches | lr 0.000337 | ms/batch 579.68 | loss  4.62 | ppl   101.673
| epoch  37 step    15700 |      4 batches | lr 0.000337 | ms/batch 416.08 | loss  4.64 | ppl   103.933
| epoch  37 step    15750 |     54 batches | lr 0.000336 | ms/batch 424.71 | loss  4.58 | ppl    97.037
| epoch  37 step    15800 |    104 batches | lr 0.000336 | ms/batch 425.66 | loss  4.59 | ppl    98.303
| epoch  37 step    15850 |    154 batches | lr 0.000336 | ms/batch 425.61 | loss  4.58 | ppl    97.754
| epoch  37 step    15900 |    204 batches | lr 0.000336 | ms/batch 426.74 | loss  4.62 | ppl   101.659
| epoch  37 step    15950 |    254 batches | lr 0.000336 | ms/batch 426.29 | loss  4.63 | ppl   102.462
| epoch  37 step    16000 |    304 batches | lr 0.000336 | ms/batch 424.49 | loss  4.64 | ppl   103.499
----------------------------------------------------------------------------------------------------
| Eval  40 at step    16000 | time: 175.95s | valid loss  4.46 | valid ppl    86.507
----------------------------------------------------------------------------------------------------
| epoch  37 step    16050 |    354 batches | lr 0.000336 | ms/batch 550.33 | loss  4.55 | ppl    94.525
| epoch  37 step    16100 |    404 batches | lr 0.000336 | ms/batch 423.69 | loss  4.63 | ppl   102.351
| epoch  38 step    16150 |     18 batches | lr 0.000336 | ms/batch 416.65 | loss  4.64 | ppl   103.253
| epoch  38 step    16200 |     68 batches | lr 0.000336 | ms/batch 426.06 | loss  4.55 | ppl    94.381
| epoch  38 step    16250 |    118 batches | lr 0.000336 | ms/batch 425.42 | loss  4.59 | ppl    98.607
| epoch  38 step    16300 |    168 batches | lr 0.000336 | ms/batch 424.33 | loss  4.59 | ppl    98.485
| epoch  38 step    16350 |    218 batches | lr 0.000335 | ms/batch 424.17 | loss  4.62 | ppl   101.197
| epoch  38 step    16400 |    268 batches | lr 0.000335 | ms/batch 425.76 | loss  4.61 | ppl   100.558
----------------------------------------------------------------------------------------------------
| Eval  41 at step    16400 | time: 175.85s | valid loss  4.46 | valid ppl    86.262
----------------------------------------------------------------------------------------------------
| epoch  38 step    16450 |    318 batches | lr 0.000335 | ms/batch 581.50 | loss  4.59 | ppl    98.383
| epoch  38 step    16500 |    368 batches | lr 0.000335 | ms/batch 426.54 | loss  4.55 | ppl    94.376
| epoch  38 step    16550 |    418 batches | lr 0.000335 | ms/batch 426.05 | loss  4.59 | ppl    98.723
| epoch  39 step    16600 |     32 batches | lr 0.000335 | ms/batch 415.12 | loss  4.60 | ppl    99.205
| epoch  39 step    16650 |     82 batches | lr 0.000335 | ms/batch 423.75 | loss  4.55 | ppl    94.619
| epoch  39 step    16700 |    132 batches | lr 0.000335 | ms/batch 423.97 | loss  4.56 | ppl    95.945
| epoch  39 step    16750 |    182 batches | lr 0.000335 | ms/batch 422.78 | loss  4.59 | ppl    98.723
| epoch  39 step    16800 |    232 batches | lr 0.000335 | ms/batch 424.62 | loss  4.60 | ppl    99.884
----------------------------------------------------------------------------------------------------
| Eval  42 at step    16800 | time: 175.72s | valid loss  4.46 | valid ppl    86.064
----------------------------------------------------------------------------------------------------
| epoch  39 step    16850 |    282 batches | lr 0.000335 | ms/batch 605.22 | loss  4.62 | ppl   101.352
| epoch  39 step    16900 |    332 batches | lr 0.000334 | ms/batch 425.03 | loss  4.53 | ppl    92.636
| epoch  39 step    16950 |    382 batches | lr 0.000334 | ms/batch 424.95 | loss  4.56 | ppl    95.603
| epoch  39 step    17000 |    432 batches | lr 0.000334 | ms/batch 423.63 | loss  4.60 | ppl    99.465
| epoch  40 step    17050 |     46 batches | lr 0.000334 | ms/batch 414.98 | loss  4.56 | ppl    95.704
| epoch  40 step    17100 |     96 batches | lr 0.000334 | ms/batch 427.16 | loss  4.54 | ppl    94.035
| epoch  40 step    17150 |    146 batches | lr 0.000334 | ms/batch 425.26 | loss  4.58 | ppl    97.349
| epoch  40 step    17200 |    196 batches | lr 0.000334 | ms/batch 424.40 | loss  4.60 | ppl    99.771
----------------------------------------------------------------------------------------------------
| Eval  43 at step    17200 | time: 175.83s | valid loss  4.45 | valid ppl    85.423
----------------------------------------------------------------------------------------------------
| epoch  40 step    17250 |    246 batches | lr 0.000334 | ms/batch 579.39 | loss  4.59 | ppl    98.892
| epoch  40 step    17300 |    296 batches | lr 0.000334 | ms/batch 423.37 | loss  4.63 | ppl   102.265
| epoch  40 step    17350 |    346 batches | lr 0.000334 | ms/batch 423.33 | loss  4.49 | ppl    89.058
| epoch  40 step    17400 |    396 batches | lr 0.000334 | ms/batch 424.84 | loss  4.58 | ppl    97.059
| epoch  41 step    17450 |     10 batches | lr 0.000333 | ms/batch 415.92 | loss  4.60 | ppl    99.165
| epoch  41 step    17500 |     60 batches | lr 0.000333 | ms/batch 424.73 | loss  4.52 | ppl    92.168
| epoch  41 step    17550 |    110 batches | lr 0.000333 | ms/batch 423.74 | loss  4.56 | ppl    95.236
| epoch  41 step    17600 |    160 batches | lr 0.000333 | ms/batch 423.87 | loss  4.55 | ppl    94.526
----------------------------------------------------------------------------------------------------
| Eval  44 at step    17600 | time: 175.50s | valid loss  4.43 | valid ppl    84.333
----------------------------------------------------------------------------------------------------
| epoch  41 step    17650 |    210 batches | lr 0.000333 | ms/batch 580.40 | loss  4.58 | ppl    97.518
| epoch  41 step    17700 |    260 batches | lr 0.000333 | ms/batch 424.39 | loss  4.56 | ppl    95.191
| epoch  41 step    17750 |    310 batches | lr 0.000333 | ms/batch 424.16 | loss  4.58 | ppl    97.304
| epoch  41 step    17800 |    360 batches | lr 0.000333 | ms/batch 425.57 | loss  4.52 | ppl    91.978
| epoch  41 step    17850 |    410 batches | lr 0.000333 | ms/batch 424.20 | loss  4.54 | ppl    94.101
| epoch  42 step    17900 |     24 batches | lr 0.000333 | ms/batch 415.18 | loss  4.60 | ppl    99.261
| epoch  42 step    17950 |     74 batches | lr 0.000332 | ms/batch 423.06 | loss  4.53 | ppl    92.920
| epoch  42 step    18000 |    124 batches | lr 0.000332 | ms/batch 423.80 | loss  4.54 | ppl    93.327
----------------------------------------------------------------------------------------------------
| Eval  45 at step    18000 | time: 175.54s | valid loss  4.41 | valid ppl    82.490
----------------------------------------------------------------------------------------------------
| epoch  42 step    18050 |    174 batches | lr 0.000332 | ms/batch 597.18 | loss  4.55 | ppl    94.726
| epoch  42 step    18100 |    224 batches | lr 0.000332 | ms/batch 424.41 | loss  4.57 | ppl    96.916
| epoch  42 step    18150 |    274 batches | lr 0.000332 | ms/batch 423.18 | loss  4.56 | ppl    95.804
| epoch  42 step    18200 |    324 batches | lr 0.000332 | ms/batch 426.16 | loss  4.52 | ppl    91.651
| epoch  42 step    18250 |    374 batches | lr 0.000332 | ms/batch 424.44 | loss  4.52 | ppl    91.726
| epoch  42 step    18300 |    424 batches | lr 0.000332 | ms/batch 425.00 | loss  4.55 | ppl    95.041
| epoch  43 step    18350 |     38 batches | lr 0.000332 | ms/batch 416.94 | loss  4.56 | ppl    95.505
| epoch  43 step    18400 |     88 batches | lr 0.000332 | ms/batch 423.80 | loss  4.50 | ppl    90.096
----------------------------------------------------------------------------------------------------
| Eval  46 at step    18400 | time: 175.67s | valid loss  4.43 | valid ppl    83.929
----------------------------------------------------------------------------------------------------
| epoch  43 step    18450 |    138 batches | lr 0.000332 | ms/batch 549.88 | loss  4.53 | ppl    93.177
| epoch  43 step    18500 |    188 batches | lr 0.000331 | ms/batch 424.15 | loss  4.54 | ppl    93.741
| epoch  43 step    18550 |    238 batches | lr 0.000331 | ms/batch 424.25 | loss  4.56 | ppl    95.884
| epoch  43 step    18600 |    288 batches | lr 0.000331 | ms/batch 424.96 | loss  4.60 | ppl    99.018
| epoch  43 step    18650 |    338 batches | lr 0.000331 | ms/batch 424.13 | loss  4.46 | ppl    86.520
| epoch  43 step    18700 |    388 batches | lr 0.000331 | ms/batch 424.34 | loss  4.55 | ppl    94.541
| epoch  44 step    18750 |      2 batches | lr 0.000331 | ms/batch 416.12 | loss  4.56 | ppl    95.803
| epoch  44 step    18800 |     52 batches | lr 0.000331 | ms/batch 425.40 | loss  4.54 | ppl    93.433
----------------------------------------------------------------------------------------------------
| Eval  47 at step    18800 | time: 175.68s | valid loss  4.41 | valid ppl    82.245
----------------------------------------------------------------------------------------------------
| epoch  44 step    18850 |    102 batches | lr 0.000331 | ms/batch 581.16 | loss  4.53 | ppl    92.494
| epoch  44 step    18900 |    152 batches | lr 0.000331 | ms/batch 424.15 | loss  4.54 | ppl    93.292
| epoch  44 step    18950 |    202 batches | lr 0.000331 | ms/batch 427.58 | loss  4.56 | ppl    95.526
| epoch  44 step    19000 |    252 batches | lr 0.00033 | ms/batch 424.62 | loss  4.55 | ppl    94.632
| epoch  44 step    19050 |    302 batches | lr 0.00033 | ms/batch 423.67 | loss  4.57 | ppl    96.864
| epoch  44 step    19100 |    352 batches | lr 0.00033 | ms/batch 424.94 | loss  4.46 | ppl    86.817
| epoch  44 step    19150 |    402 batches | lr 0.00033 | ms/batch 423.93 | loss  4.53 | ppl    93.118
| epoch  45 step    19200 |     16 batches | lr 0.00033 | ms/batch 415.65 | loss  4.54 | ppl    93.906
----------------------------------------------------------------------------------------------------
| Eval  48 at step    19200 | time: 175.81s | valid loss  4.43 | valid ppl    83.813
----------------------------------------------------------------------------------------------------
| epoch  45 step    19250 |     66 batches | lr 0.00033 | ms/batch 549.88 | loss  4.48 | ppl    88.615
| epoch  45 step    19300 |    116 batches | lr 0.00033 | ms/batch 424.32 | loss  4.51 | ppl    91.377
| epoch  45 step    19350 |    166 batches | lr 0.00033 | ms/batch 423.59 | loss  4.52 | ppl    91.537
| epoch  45 step    19400 |    216 batches | lr 0.00033 | ms/batch 425.01 | loss  4.56 | ppl    95.353
| epoch  45 step    19450 |    266 batches | lr 0.00033 | ms/batch 424.33 | loss  4.55 | ppl    94.762
| epoch  45 step    19500 |    316 batches | lr 0.000329 | ms/batch 424.29 | loss  4.53 | ppl    92.891
| epoch  45 step    19550 |    366 batches | lr 0.000329 | ms/batch 425.45 | loss  4.47 | ppl    87.091
| epoch  45 step    19600 |    416 batches | lr 0.000329 | ms/batch 423.75 | loss  4.51 | ppl    90.884
----------------------------------------------------------------------------------------------------
| Eval  49 at step    19600 | time: 176.04s | valid loss  4.40 | valid ppl    81.525
----------------------------------------------------------------------------------------------------
| epoch  46 step    19650 |     30 batches | lr 0.000329 | ms/batch 572.78 | loss  4.56 | ppl    95.830
| epoch  46 step    19700 |     80 batches | lr 0.000329 | ms/batch 425.11 | loss  4.50 | ppl    89.717
| epoch  46 step    19750 |    130 batches | lr 0.000329 | ms/batch 424.67 | loss  4.49 | ppl    88.726
| epoch  46 step    19800 |    180 batches | lr 0.000329 | ms/batch 424.13 | loss  4.53 | ppl    92.488
| epoch  46 step    19850 |    230 batches | lr 0.000329 | ms/batch 424.16 | loss  4.54 | ppl    93.975
| epoch  46 step    19900 |    280 batches | lr 0.000329 | ms/batch 423.49 | loss  4.54 | ppl    94.124
| epoch  46 step    19950 |    330 batches | lr 0.000328 | ms/batch 423.63 | loss  4.45 | ppl    85.851
| epoch  46 step    20000 |    380 batches | lr 0.000328 | ms/batch 423.57 | loss  4.50 | ppl    89.992
----------------------------------------------------------------------------------------------------
| Eval  50 at step    20000 | time: 175.54s | valid loss  4.41 | valid ppl    81.938
----------------------------------------------------------------------------------------------------
| epoch  46 step    20050 |    430 batches | lr 0.000328 | ms/batch 550.10 | loss  4.55 | ppl    94.319
| epoch  47 step    20100 |     44 batches | lr 0.000328 | ms/batch 417.59 | loss  4.51 | ppl    90.643
| epoch  47 step    20150 |     94 batches | lr 0.000328 | ms/batch 424.53 | loss  4.45 | ppl    85.625
| epoch  47 step    20200 |    144 batches | lr 0.000328 | ms/batch 424.18 | loss  4.48 | ppl    88.131
| epoch  47 step    20250 |    194 batches | lr 0.000328 | ms/batch 426.49 | loss  4.51 | ppl    91.363
| epoch  47 step    20300 |    244 batches | lr 0.000328 | ms/batch 425.50 | loss  4.52 | ppl    91.999
| epoch  47 step    20350 |    294 batches | lr 0.000328 | ms/batch 425.33 | loss  4.56 | ppl    95.254
| epoch  47 step    20400 |    344 batches | lr 0.000327 | ms/batch 425.24 | loss  4.42 | ppl    82.850
----------------------------------------------------------------------------------------------------
| Eval  51 at step    20400 | time: 176.45s | valid loss  4.40 | valid ppl    81.666
----------------------------------------------------------------------------------------------------
| epoch  47 step    20450 |    394 batches | lr 0.000327 | ms/batch 552.42 | loss  4.52 | ppl    91.803
| epoch  48 step    20500 |      8 batches | lr 0.000327 | ms/batch 415.95 | loss  4.53 | ppl    93.164
| epoch  48 step    20550 |     58 batches | lr 0.000327 | ms/batch 424.65 | loss  4.48 | ppl    88.042
| epoch  48 step    20600 |    108 batches | lr 0.000327 | ms/batch 424.97 | loss  4.48 | ppl    88.585
| epoch  48 step    20650 |    158 batches | lr 0.000327 | ms/batch 425.75 | loss  4.50 | ppl    89.863
| epoch  48 step    20700 |    208 batches | lr 0.000327 | ms/batch 424.76 | loss  4.52 | ppl    92.086
| epoch  48 step    20750 |    258 batches | lr 0.000327 | ms/batch 424.32 | loss  4.52 | ppl    91.378
| epoch  48 step    20800 |    308 batches | lr 0.000327 | ms/batch 424.09 | loss  4.51 | ppl    91.346
----------------------------------------------------------------------------------------------------
| Eval  52 at step    20800 | time: 175.84s | valid loss  4.41 | valid ppl    81.896
----------------------------------------------------------------------------------------------------
| epoch  48 step    20850 |    358 batches | lr 0.000327 | ms/batch 551.36 | loss  4.43 | ppl    84.252
| epoch  48 step    20900 |    408 batches | lr 0.000326 | ms/batch 426.22 | loss  4.47 | ppl    87.600
| epoch  49 step    20950 |     22 batches | lr 0.000326 | ms/batch 417.24 | loss  4.55 | ppl    94.626
| epoch  49 step    21000 |     72 batches | lr 0.000326 | ms/batch 425.05 | loss  4.44 | ppl    84.542
| epoch  49 step    21050 |    122 batches | lr 0.000326 | ms/batch 424.14 | loss  4.48 | ppl    88.462
| epoch  49 step    21100 |    172 batches | lr 0.000326 | ms/batch 425.04 | loss  4.49 | ppl    89.343
| epoch  49 step    21150 |    222 batches | lr 0.000326 | ms/batch 424.84 | loss  4.51 | ppl    91.091
| epoch  49 step    21200 |    272 batches | lr 0.000326 | ms/batch 424.78 | loss  4.52 | ppl    91.459
----------------------------------------------------------------------------------------------------
| Eval  53 at step    21200 | time: 175.93s | valid loss  4.39 | valid ppl    80.913
----------------------------------------------------------------------------------------------------
| epoch  49 step    21250 |    322 batches | lr 0.000326 | ms/batch 580.00 | loss  4.48 | ppl    88.270
| epoch  49 step    21300 |    372 batches | lr 0.000326 | ms/batch 424.33 | loss  4.46 | ppl    86.153
| epoch  49 step    21350 |    422 batches | lr 0.000325 | ms/batch 424.54 | loss  4.46 | ppl    86.491
| epoch  50 step    21400 |     36 batches | lr 0.000325 | ms/batch 417.07 | loss  4.49 | ppl    89.527
| epoch  50 step    21450 |     86 batches | lr 0.000325 | ms/batch 425.36 | loss  4.46 | ppl    86.127
| epoch  50 step    21500 |    136 batches | lr 0.000325 | ms/batch 424.83 | loss  4.47 | ppl    87.500
| epoch  50 step    21550 |    186 batches | lr 0.000325 | ms/batch 424.48 | loss  4.48 | ppl    88.350
| epoch  50 step    21600 |    236 batches | lr 0.000325 | ms/batch 425.97 | loss  4.48 | ppl    88.002
----------------------------------------------------------------------------------------------------
| Eval  54 at step    21600 | time: 175.89s | valid loss  4.39 | valid ppl    80.917
----------------------------------------------------------------------------------------------------
| epoch  50 step    21650 |    286 batches | lr 0.000325 | ms/batch 554.35 | loss  4.53 | ppl    92.393
| epoch  50 step    21700 |    336 batches | lr 0.000325 | ms/batch 426.41 | loss  4.39 | ppl    80.693
| epoch  50 step    21750 |    386 batches | lr 0.000324 | ms/batch 425.73 | loss  4.50 | ppl    90.112
| epoch  50 step    21800 |    436 batches | lr 0.000324 | ms/batch 418.89 | loss  4.50 | ppl    89.770
| epoch  51 step    21850 |     50 batches | lr 0.000324 | ms/batch 423.27 | loss  4.47 | ppl    87.338
| epoch  51 step    21900 |    100 batches | lr 0.000324 | ms/batch 423.88 | loss  4.45 | ppl    85.245
| epoch  51 step    21950 |    150 batches | lr 0.000324 | ms/batch 425.32 | loss  4.44 | ppl    85.005
| epoch  51 step    22000 |    200 batches | lr 0.000324 | ms/batch 425.49 | loss  4.49 | ppl    88.733
----------------------------------------------------------------------------------------------------
| Eval  55 at step    22000 | time: 176.15s | valid loss  4.39 | valid ppl    80.501
----------------------------------------------------------------------------------------------------
| epoch  51 step    22050 |    250 batches | lr 0.000324 | ms/batch 580.26 | loss  4.51 | ppl    90.713
| epoch  51 step    22100 |    300 batches | lr 0.000324 | ms/batch 425.47 | loss  4.52 | ppl    92.221
| epoch  51 step    22150 |    350 batches | lr 0.000324 | ms/batch 425.86 | loss  4.38 | ppl    80.106
| epoch  51 step    22200 |    400 batches | lr 0.000323 | ms/batch 423.90 | loss  4.46 | ppl    86.866
| epoch  52 step    22250 |     14 batches | lr 0.000323 | ms/batch 416.56 | loss  4.52 | ppl    91.570
| epoch  52 step    22300 |     64 batches | lr 0.000323 | ms/batch 424.73 | loss  4.44 | ppl    85.127
| epoch  52 step    22350 |    114 batches | lr 0.000323 | ms/batch 424.30 | loss  4.44 | ppl    84.542
| epoch  52 step    22400 |    164 batches | lr 0.000323 | ms/batch 425.91 | loss  4.47 | ppl    87.311
----------------------------------------------------------------------------------------------------
| Eval  56 at step    22400 | time: 175.91s | valid loss  4.38 | valid ppl    80.209
----------------------------------------------------------------------------------------------------
| epoch  52 step    22450 |    214 batches | lr 0.000323 | ms/batch 579.79 | loss  4.50 | ppl    90.090
| epoch  52 step    22500 |    264 batches | lr 0.000323 | ms/batch 424.68 | loss  4.48 | ppl    88.178
| epoch  52 step    22550 |    314 batches | lr 0.000323 | ms/batch 424.13 | loss  4.49 | ppl    88.818
| epoch  52 step    22600 |    364 batches | lr 0.000323 | ms/batch 426.31 | loss  4.41 | ppl    81.926
| epoch  52 step    22650 |    414 batches | lr 0.000322 | ms/batch 424.09 | loss  4.45 | ppl    86.048
| epoch  53 step    22700 |     28 batches | lr 0.000322 | ms/batch 415.30 | loss  4.50 | ppl    90.387
| epoch  53 step    22750 |     78 batches | lr 0.000322 | ms/batch 426.17 | loss  4.42 | ppl    83.083
| epoch  53 step    22800 |    128 batches | lr 0.000322 | ms/batch 424.95 | loss  4.45 | ppl    85.269
----------------------------------------------------------------------------------------------------
| Eval  57 at step    22800 | time: 175.82s | valid loss  4.39 | valid ppl    80.368
----------------------------------------------------------------------------------------------------
| epoch  53 step    22850 |    178 batches | lr 0.000322 | ms/batch 549.99 | loss  4.47 | ppl    87.704
| epoch  53 step    22900 |    228 batches | lr 0.000322 | ms/batch 424.42 | loss  4.46 | ppl    86.200
| epoch  53 step    22950 |    278 batches | lr 0.000322 | ms/batch 423.59 | loss  4.48 | ppl    88.214
| epoch  53 step    23000 |    328 batches | lr 0.000322 | ms/batch 424.86 | loss  4.41 | ppl    81.922
| epoch  53 step    23050 |    378 batches | lr 0.000321 | ms/batch 424.19 | loss  4.44 | ppl    84.922
| epoch  53 step    23100 |    428 batches | lr 0.000321 | ms/batch 424.03 | loss  4.49 | ppl    88.724
| epoch  54 step    23150 |     42 batches | lr 0.000321 | ms/batch 416.42 | loss  4.45 | ppl    85.829
| epoch  54 step    23200 |     92 batches | lr 0.000321 | ms/batch 424.67 | loss  4.41 | ppl    82.150
----------------------------------------------------------------------------------------------------
| Eval  58 at step    23200 | time: 175.60s | valid loss  4.37 | valid ppl    79.124
----------------------------------------------------------------------------------------------------
| epoch  54 step    23250 |    142 batches | lr 0.000321 | ms/batch 579.39 | loss  4.46 | ppl    86.164
| epoch  54 step    23300 |    192 batches | lr 0.000321 | ms/batch 425.82 | loss  4.48 | ppl    87.829
| epoch  54 step    23350 |    242 batches | lr 0.000321 | ms/batch 426.41 | loss  4.48 | ppl    88.529
| epoch  54 step    23400 |    292 batches | lr 0.000321 | ms/batch 426.13 | loss  4.49 | ppl    89.282
| epoch  54 step    23450 |    342 batches | lr 0.00032 | ms/batch 425.55 | loss  4.37 | ppl    79.105
| epoch  54 step    23500 |    392 batches | lr 0.00032 | ms/batch 424.34 | loss  4.46 | ppl    86.822
| epoch  55 step    23550 |      6 batches | lr 0.00032 | ms/batch 417.76 | loss  4.48 | ppl    88.071
| epoch  55 step    23600 |     56 batches | lr 0.00032 | ms/batch 424.39 | loss  4.44 | ppl    84.800
----------------------------------------------------------------------------------------------------
| Eval  59 at step    23600 | time: 176.05s | valid loss  4.37 | valid ppl    79.201
----------------------------------------------------------------------------------------------------
| epoch  55 step    23650 |    106 batches | lr 0.00032 | ms/batch 552.49 | loss  4.41 | ppl    82.231
| epoch  55 step    23700 |    156 batches | lr 0.00032 | ms/batch 425.14 | loss  4.45 | ppl    85.380
| epoch  55 step    23750 |    206 batches | lr 0.00032 | ms/batch 423.55 | loss  4.45 | ppl    85.261
| epoch  55 step    23800 |    256 batches | lr 0.00032 | ms/batch 424.10 | loss  4.45 | ppl    85.529
| epoch  55 step    23850 |    306 batches | lr 0.000319 | ms/batch 425.67 | loss  4.47 | ppl    87.288
| epoch  55 step    23900 |    356 batches | lr 0.000319 | ms/batch 426.12 | loss  4.36 | ppl    78.583
| epoch  55 step    23950 |    406 batches | lr 0.000319 | ms/batch 425.79 | loss  4.43 | ppl    84.126
| epoch  56 step    24000 |     20 batches | lr 0.000319 | ms/batch 415.14 | loss  4.48 | ppl    88.607
----------------------------------------------------------------------------------------------------
| Eval  60 at step    24000 | time: 175.88s | valid loss  4.38 | valid ppl    79.476
----------------------------------------------------------------------------------------------------
| epoch  56 step    24050 |     70 batches | lr 0.000319 | ms/batch 549.04 | loss  4.40 | ppl    81.301
| epoch  56 step    24100 |    120 batches | lr 0.000319 | ms/batch 424.84 | loss  4.42 | ppl    83.310
| epoch  56 step    24150 |    170 batches | lr 0.000319 | ms/batch 426.22 | loss  4.42 | ppl    82.844
| epoch  56 step    24200 |    220 batches | lr 0.000319 | ms/batch 425.28 | loss  4.46 | ppl    86.277
| epoch  56 step    24250 |    270 batches | lr 0.000318 | ms/batch 425.13 | loss  4.47 | ppl    87.286
| epoch  56 step    24300 |    320 batches | lr 0.000318 | ms/batch 424.43 | loss  4.42 | ppl    83.123
| epoch  56 step    24350 |    370 batches | lr 0.000318 | ms/batch 424.32 | loss  4.39 | ppl    80.929
| epoch  56 step    24400 |    420 batches | lr 0.000318 | ms/batch 425.21 | loss  4.42 | ppl    83.163
----------------------------------------------------------------------------------------------------
| Eval  61 at step    24400 | time: 176.25s | valid loss  4.36 | valid ppl    78.458
----------------------------------------------------------------------------------------------------
| epoch  57 step    24450 |     34 batches | lr 0.000318 | ms/batch 571.30 | loss  4.45 | ppl    85.883
| epoch  57 step    24500 |     84 batches | lr 0.000318 | ms/batch 424.36 | loss  4.36 | ppl    78.539
| epoch  57 step    24550 |    134 batches | lr 0.000318 | ms/batch 425.01 | loss  4.43 | ppl    83.998
| epoch  57 step    24600 |    184 batches | lr 0.000318 | ms/batch 424.05 | loss  4.44 | ppl    84.361
| epoch  57 step    24650 |    234 batches | lr 0.000317 | ms/batch 424.93 | loss  4.46 | ppl    86.572
| epoch  57 step    24700 |    284 batches | lr 0.000317 | ms/batch 425.09 | loss  4.46 | ppl    86.307
| epoch  57 step    24750 |    334 batches | lr 0.000317 | ms/batch 424.61 | loss  4.37 | ppl    78.723
| epoch  57 step    24800 |    384 batches | lr 0.000317 | ms/batch 423.61 | loss  4.41 | ppl    81.909
----------------------------------------------------------------------------------------------------
| Eval  62 at step    24800 | time: 175.68s | valid loss  4.37 | valid ppl    78.912
----------------------------------------------------------------------------------------------------
| epoch  57 step    24850 |    434 batches | lr 0.000317 | ms/batch 551.28 | loss  4.47 | ppl    87.018
| epoch  58 step    24900 |     48 batches | lr 0.000317 | ms/batch 415.10 | loss  4.40 | ppl    81.200
| epoch  58 step    24950 |     98 batches | lr 0.000317 | ms/batch 425.65 | loss  4.39 | ppl    80.886
| epoch  58 step    25000 |    148 batches | lr 0.000317 | ms/batch 423.55 | loss  4.44 | ppl    84.828
| epoch  58 step    25050 |    198 batches | lr 0.000316 | ms/batch 424.48 | loss  4.42 | ppl    83.257
| epoch  58 step    25100 |    248 batches | lr 0.000316 | ms/batch 423.25 | loss  4.44 | ppl    85.104
| epoch  58 step    25150 |    298 batches | lr 0.000316 | ms/batch 424.77 | loss  4.49 | ppl    89.090
| epoch  58 step    25200 |    348 batches | lr 0.000316 | ms/batch 424.74 | loss  4.34 | ppl    76.368
----------------------------------------------------------------------------------------------------
| Eval  63 at step    25200 | time: 175.64s | valid loss  4.36 | valid ppl    78.517
----------------------------------------------------------------------------------------------------
| epoch  58 step    25250 |    398 batches | lr 0.000316 | ms/batch 550.07 | loss  4.44 | ppl    84.864
| epoch  59 step    25300 |     12 batches | lr 0.000316 | ms/batch 416.04 | loss  4.47 | ppl    87.209
| epoch  59 step    25350 |     62 batches | lr 0.000316 | ms/batch 424.11 | loss  4.37 | ppl    78.877
| epoch  59 step    25400 |    112 batches | lr 0.000316 | ms/batch 424.93 | loss  4.38 | ppl    79.896
| epoch  59 step    25450 |    162 batches | lr 0.000315 | ms/batch 425.42 | loss  4.42 | ppl    83.027
| epoch  59 step    25500 |    212 batches | lr 0.000315 | ms/batch 426.79 | loss  4.43 | ppl    83.717
| epoch  59 step    25550 |    262 batches | lr 0.000315 | ms/batch 423.59 | loss  4.44 | ppl    84.907
| epoch  59 step    25600 |    312 batches | lr 0.000315 | ms/batch 424.87 | loss  4.43 | ppl    84.278
----------------------------------------------------------------------------------------------------
| Eval  64 at step    25600 | time: 175.84s | valid loss  4.37 | valid ppl    79.252
----------------------------------------------------------------------------------------------------
| epoch  59 step    25650 |    362 batches | lr 0.000315 | ms/batch 553.99 | loss  4.34 | ppl    76.992
| epoch  59 step    25700 |    412 batches | lr 0.000315 | ms/batch 424.93 | loss  4.40 | ppl    81.372
| epoch  60 step    25750 |     26 batches | lr 0.000315 | ms/batch 415.69 | loss  4.45 | ppl    85.505
| epoch  60 step    25800 |     76 batches | lr 0.000314 | ms/batch 424.57 | loss  4.38 | ppl    80.194
| epoch  60 step    25850 |    126 batches | lr 0.000314 | ms/batch 424.45 | loss  4.41 | ppl    82.436
| epoch  60 step    25900 |    176 batches | lr 0.000314 | ms/batch 424.18 | loss  4.42 | ppl    83.108
| epoch  60 step    25950 |    226 batches | lr 0.000314 | ms/batch 424.51 | loss  4.44 | ppl    84.560
| epoch  60 step    26000 |    276 batches | lr 0.000314 | ms/batch 423.45 | loss  4.45 | ppl    85.235
----------------------------------------------------------------------------------------------------
| Eval  65 at step    26000 | time: 175.75s | valid loss  4.35 | valid ppl    77.669
----------------------------------------------------------------------------------------------------
| epoch  60 step    26050 |    326 batches | lr 0.000314 | ms/batch 582.28 | loss  4.36 | ppl    78.622
| epoch  60 step    26100 |    376 batches | lr 0.000314 | ms/batch 423.81 | loss  4.37 | ppl    78.943
| epoch  60 step    26150 |    426 batches | lr 0.000314 | ms/batch 424.20 | loss  4.42 | ppl    82.857
| epoch  61 step    26200 |     40 batches | lr 0.000313 | ms/batch 416.09 | loss  4.40 | ppl    81.241
| epoch  61 step    26250 |     90 batches | lr 0.000313 | ms/batch 424.60 | loss  4.37 | ppl    79.182
| epoch  61 step    26300 |    140 batches | lr 0.000313 | ms/batch 424.38 | loss  4.41 | ppl    82.595
| epoch  61 step    26350 |    190 batches | lr 0.000313 | ms/batch 425.30 | loss  4.42 | ppl    83.075
| epoch  61 step    26400 |    240 batches | lr 0.000313 | ms/batch 425.29 | loss  4.43 | ppl    83.604
----------------------------------------------------------------------------------------------------
| Eval  66 at step    26400 | time: 175.78s | valid loss  4.34 | valid ppl    76.758
----------------------------------------------------------------------------------------------------
| epoch  61 step    26450 |    290 batches | lr 0.000313 | ms/batch 615.93 | loss  4.43 | ppl    84.054
| epoch  61 step    26500 |    340 batches | lr 0.000313 | ms/batch 423.84 | loss  4.30 | ppl    73.985
| epoch  61 step    26550 |    390 batches | lr 0.000312 | ms/batch 423.66 | loss  4.42 | ppl    82.807
| epoch  62 step    26600 |      4 batches | lr 0.000312 | ms/batch 415.19 | loss  4.41 | ppl    82.060
| epoch  62 step    26650 |     54 batches | lr 0.000312 | ms/batch 423.49 | loss  4.37 | ppl    78.775
| epoch  62 step    26700 |    104 batches | lr 0.000312 | ms/batch 426.56 | loss  4.37 | ppl    79.252
| epoch  62 step    26750 |    154 batches | lr 0.000312 | ms/batch 423.53 | loss  4.41 | ppl    82.037
| epoch  62 step    26800 |    204 batches | lr 0.000312 | ms/batch 425.55 | loss  4.40 | ppl    81.260
----------------------------------------------------------------------------------------------------
| Eval  67 at step    26800 | time: 175.69s | valid loss  4.34 | valid ppl    77.064
----------------------------------------------------------------------------------------------------
| epoch  62 step    26850 |    254 batches | lr 0.000312 | ms/batch 553.27 | loss  4.41 | ppl    82.253
| epoch  62 step    26900 |    304 batches | lr 0.000312 | ms/batch 423.48 | loss  4.45 | ppl    85.318
| epoch  62 step    26950 |    354 batches | lr 0.000311 | ms/batch 423.70 | loss  4.32 | ppl    75.103
| epoch  62 step    27000 |    404 batches | lr 0.000311 | ms/batch 424.01 | loss  4.40 | ppl    81.246
| epoch  63 step    27050 |     18 batches | lr 0.000311 | ms/batch 416.38 | loss  4.43 | ppl    83.866
| epoch  63 step    27100 |     68 batches | lr 0.000311 | ms/batch 424.87 | loss  4.34 | ppl    76.470
| epoch  63 step    27150 |    118 batches | lr 0.000311 | ms/batch 424.76 | loss  4.36 | ppl    78.337
| epoch  63 step    27200 |    168 batches | lr 0.000311 | ms/batch 424.83 | loss  4.40 | ppl    81.061
----------------------------------------------------------------------------------------------------
| Eval  68 at step    27200 | time: 175.78s | valid loss  4.34 | valid ppl    77.085
----------------------------------------------------------------------------------------------------
| epoch  63 step    27250 |    218 batches | lr 0.000311 | ms/batch 549.85 | loss  4.41 | ppl    82.532
| epoch  63 step    27300 |    268 batches | lr 0.00031 | ms/batch 424.46 | loss  4.42 | ppl    82.793
| epoch  63 step    27350 |    318 batches | lr 0.00031 | ms/batch 423.20 | loss  4.39 | ppl    80.900
| epoch  63 step    27400 |    368 batches | lr 0.00031 | ms/batch 426.03 | loss  4.35 | ppl    77.741
| epoch  63 step    27450 |    418 batches | lr 0.00031 | ms/batch 423.37 | loss  4.38 | ppl    80.183
| epoch  64 step    27500 |     32 batches | lr 0.00031 | ms/batch 415.20 | loss  4.41 | ppl    82.283
| epoch  64 step    27550 |     82 batches | lr 0.00031 | ms/batch 423.23 | loss  4.33 | ppl    75.711
| epoch  64 step    27600 |    132 batches | lr 0.00031 | ms/batch 424.61 | loss  4.39 | ppl    80.552
----------------------------------------------------------------------------------------------------
| Eval  69 at step    27600 | time: 175.53s | valid loss  4.33 | valid ppl    75.886
----------------------------------------------------------------------------------------------------
| epoch  64 step    27650 |    182 batches | lr 0.000309 | ms/batch 579.74 | loss  4.38 | ppl    80.191
| epoch  64 step    27700 |    232 batches | lr 0.000309 | ms/batch 425.02 | loss  4.41 | ppl    82.242
| epoch  64 step    27750 |    282 batches | lr 0.000309 | ms/batch 426.18 | loss  4.42 | ppl    83.235
| epoch  64 step    27800 |    332 batches | lr 0.000309 | ms/batch 426.06 | loss  4.31 | ppl    74.698
| epoch  64 step    27850 |    382 batches | lr 0.000309 | ms/batch 424.45 | loss  4.36 | ppl    78.389
| epoch  64 step    27900 |    432 batches | lr 0.000309 | ms/batch 423.90 | loss  4.40 | ppl    81.053
| epoch  65 step    27950 |     46 batches | lr 0.000309 | ms/batch 414.58 | loss  4.38 | ppl    79.954
| epoch  65 step    28000 |     96 batches | lr 0.000308 | ms/batch 426.37 | loss  4.31 | ppl    74.477
----------------------------------------------------------------------------------------------------
| Eval  70 at step    28000 | time: 175.86s | valid loss  4.35 | valid ppl    77.278
----------------------------------------------------------------------------------------------------
| epoch  65 step    28050 |    146 batches | lr 0.000308 | ms/batch 552.62 | loss  4.40 | ppl    81.427
| epoch  65 step    28100 |    196 batches | lr 0.000308 | ms/batch 426.46 | loss  4.40 | ppl    81.513
| epoch  65 step    28150 |    246 batches | lr 0.000308 | ms/batch 425.79 | loss  4.40 | ppl    81.231
| epoch  65 step    28200 |    296 batches | lr 0.000308 | ms/batch 424.73 | loss  4.43 | ppl    83.723
| epoch  65 step    28250 |    346 batches | lr 0.000308 | ms/batch 424.30 | loss  4.29 | ppl    73.301
| epoch  65 step    28300 |    396 batches | lr 0.000308 | ms/batch 425.65 | loss  4.38 | ppl    79.982
| epoch  66 step    28350 |     10 batches | lr 0.000307 | ms/batch 417.61 | loss  4.39 | ppl    80.648
| epoch  66 step    28400 |     60 batches | lr 0.000307 | ms/batch 425.81 | loss  4.32 | ppl    75.205
----------------------------------------------------------------------------------------------------
| Eval  71 at step    28400 | time: 176.15s | valid loss  4.33 | valid ppl    75.591
----------------------------------------------------------------------------------------------------
| epoch  66 step    28450 |    110 batches | lr 0.000307 | ms/batch 676.01 | loss  4.34 | ppl    77.054
| epoch  66 step    28500 |    160 batches | lr 0.000307 | ms/batch 424.50 | loss  4.37 | ppl    78.957
| epoch  66 step    28550 |    210 batches | lr 0.000307 | ms/batch 425.23 | loss  4.39 | ppl    80.747
| epoch  66 step    28600 |    260 batches | lr 0.000307 | ms/batch 426.88 | loss  4.38 | ppl    79.989
| epoch  66 step    28650 |    310 batches | lr 0.000307 | ms/batch 425.50 | loss  4.40 | ppl    81.567
| epoch  66 step    28700 |    360 batches | lr 0.000306 | ms/batch 427.53 | loss  4.29 | ppl    73.217
| epoch  66 step    28750 |    410 batches | lr 0.000306 | ms/batch 424.23 | loss  4.36 | ppl    78.520
| epoch  67 step    28800 |     24 batches | lr 0.000306 | ms/batch 418.24 | loss  4.40 | ppl    81.624
----------------------------------------------------------------------------------------------------
| Eval  72 at step    28800 | time: 176.20s | valid loss  4.34 | valid ppl    76.365
----------------------------------------------------------------------------------------------------
| epoch  67 step    28850 |     74 batches | lr 0.000306 | ms/batch 552.09 | loss  4.31 | ppl    74.661
| epoch  67 step    28900 |    124 batches | lr 0.000306 | ms/batch 425.78 | loss  4.35 | ppl    77.401
| epoch  67 step    28950 |    174 batches | lr 0.000306 | ms/batch 425.57 | loss  4.39 | ppl    80.608
| epoch  67 step    29000 |    224 batches | lr 0.000306 | ms/batch 424.22 | loss  4.38 | ppl    80.179
| epoch  67 step    29050 |    274 batches | lr 0.000305 | ms/batch 425.68 | loss  4.38 | ppl    79.649
| epoch  67 step    29100 |    324 batches | lr 0.000305 | ms/batch 426.27 | loss  4.34 | ppl    76.721
| epoch  67 step    29150 |    374 batches | lr 0.000305 | ms/batch 425.07 | loss  4.34 | ppl    76.445
| epoch  67 step    29200 |    424 batches | lr 0.000305 | ms/batch 423.88 | loss  4.35 | ppl    77.181
----------------------------------------------------------------------------------------------------
| Eval  73 at step    29200 | time: 176.41s | valid loss  4.34 | valid ppl    76.535
----------------------------------------------------------------------------------------------------
| epoch  68 step    29250 |     38 batches | lr 0.000305 | ms/batch 541.94 | loss  4.37 | ppl    79.337
| epoch  68 step    29300 |     88 batches | lr 0.000305 | ms/batch 424.31 | loss  4.32 | ppl    75.270
| epoch  68 step    29350 |    138 batches | lr 0.000305 | ms/batch 425.63 | loss  4.36 | ppl    78.457
| epoch  68 step    29400 |    188 batches | lr 0.000304 | ms/batch 425.27 | loss  4.37 | ppl    78.775
| epoch  68 step    29450 |    238 batches | lr 0.000304 | ms/batch 423.87 | loss  4.40 | ppl    81.172
| epoch  68 step    29500 |    288 batches | lr 0.000304 | ms/batch 424.22 | loss  4.41 | ppl    82.403
| epoch  68 step    29550 |    338 batches | lr 0.000304 | ms/batch 423.83 | loss  4.29 | ppl    73.298
| epoch  68 step    29600 |    388 batches | lr 0.000304 | ms/batch 424.61 | loss  4.35 | ppl    77.644
----------------------------------------------------------------------------------------------------
| Eval  74 at step    29600 | time: 175.71s | valid loss  4.33 | valid ppl    76.184
----------------------------------------------------------------------------------------------------
| epoch  69 step    29650 |      2 batches | lr 0.000304 | ms/batch 542.64 | loss  4.38 | ppl    79.851
| epoch  69 step    29700 |     52 batches | lr 0.000303 | ms/batch 424.45 | loss  4.31 | ppl    74.593
| epoch  69 step    29750 |    102 batches | lr 0.000303 | ms/batch 424.48 | loss  4.31 | ppl    74.568
| epoch  69 step    29800 |    152 batches | lr 0.000303 | ms/batch 425.42 | loss  4.35 | ppl    77.725
| epoch  69 step    29850 |    202 batches | lr 0.000303 | ms/batch 423.66 | loss  4.37 | ppl    78.731
| epoch  69 step    29900 |    252 batches | lr 0.000303 | ms/batch 424.52 | loss  4.39 | ppl    81.003
| epoch  69 step    29950 |    302 batches | lr 0.000303 | ms/batch 426.09 | loss  4.40 | ppl    81.573
| epoch  69 step    30000 |    352 batches | lr 0.000303 | ms/batch 423.54 | loss  4.26 | ppl    70.676
----------------------------------------------------------------------------------------------------
| Eval  75 at step    30000 | time: 175.76s | valid loss  4.32 | valid ppl    75.397
----------------------------------------------------------------------------------------------------
| epoch  69 step    30050 |    402 batches | lr 0.000302 | ms/batch 581.16 | loss  4.34 | ppl    77.074
| epoch  70 step    30100 |     16 batches | lr 0.000302 | ms/batch 416.96 | loss  4.36 | ppl    78.378
| epoch  70 step    30150 |     66 batches | lr 0.000302 | ms/batch 423.16 | loss  4.30 | ppl    73.385
| epoch  70 step    30200 |    116 batches | lr 0.000302 | ms/batch 426.10 | loss  4.34 | ppl    76.441
| epoch  70 step    30250 |    166 batches | lr 0.000302 | ms/batch 424.65 | loss  4.34 | ppl    76.719
| epoch  70 step    30300 |    216 batches | lr 0.000302 | ms/batch 423.18 | loss  4.39 | ppl    80.719
| epoch  70 step    30350 |    266 batches | lr 0.000302 | ms/batch 423.78 | loss  4.38 | ppl    79.651
| epoch  70 step    30400 |    316 batches | lr 0.000301 | ms/batch 422.85 | loss  4.37 | ppl    78.683
----------------------------------------------------------------------------------------------------
| Eval  76 at step    30400 | time: 175.68s | valid loss  4.33 | valid ppl    76.055
----------------------------------------------------------------------------------------------------
| epoch  70 step    30450 |    366 batches | lr 0.000301 | ms/batch 550.14 | loss  4.27 | ppl    71.347
| epoch  70 step    30500 |    416 batches | lr 0.000301 | ms/batch 424.12 | loss  4.36 | ppl    77.955
| epoch  71 step    30550 |     30 batches | lr 0.000301 | ms/batch 418.00 | loss  4.36 | ppl    77.973
| epoch  71 step    30600 |     80 batches | lr 0.000301 | ms/batch 425.08 | loss  4.31 | ppl    74.343
| epoch  71 step    30650 |    130 batches | lr 0.000301 | ms/batch 427.31 | loss  4.36 | ppl    78.115
| epoch  71 step    30700 |    180 batches | lr 0.0003 | ms/batch 424.01 | loss  4.32 | ppl    75.544
| epoch  71 step    30750 |    230 batches | lr 0.0003 | ms/batch 426.89 | loss  4.37 | ppl    79.042
| epoch  71 step    30800 |    280 batches | lr 0.0003 | ms/batch 425.05 | loss  4.39 | ppl    80.734
----------------------------------------------------------------------------------------------------
| Eval  77 at step    30800 | time: 176.04s | valid loss  4.31 | valid ppl    74.785
----------------------------------------------------------------------------------------------------
| epoch  71 step    30850 |    330 batches | lr 0.0003 | ms/batch 625.89 | loss  4.29 | ppl    73.168
| epoch  71 step    30900 |    380 batches | lr 0.0003 | ms/batch 426.74 | loss  4.33 | ppl    75.695
| epoch  71 step    30950 |    430 batches | lr 0.0003 | ms/batch 426.86 | loss  4.37 | ppl    79.002
| epoch  72 step    31000 |     44 batches | lr 0.0003 | ms/batch 417.93 | loss  4.35 | ppl    77.361
| epoch  72 step    31050 |     94 batches | lr 0.000299 | ms/batch 424.47 | loss  4.30 | ppl    73.435
| epoch  72 step    31100 |    144 batches | lr 0.000299 | ms/batch 425.33 | loss  4.33 | ppl    75.863
| epoch  72 step    31150 |    194 batches | lr 0.000299 | ms/batch 424.11 | loss  4.34 | ppl    77.078
| epoch  72 step    31200 |    244 batches | lr 0.000299 | ms/batch 424.57 | loss  4.36 | ppl    78.409
----------------------------------------------------------------------------------------------------
| Eval  78 at step    31200 | time: 176.25s | valid loss  4.32 | valid ppl    75.049
----------------------------------------------------------------------------------------------------
| epoch  72 step    31250 |    294 batches | lr 0.000299 | ms/batch 551.14 | loss  4.38 | ppl    79.965
| epoch  72 step    31300 |    344 batches | lr 0.000299 | ms/batch 424.41 | loss  4.24 | ppl    69.646
| epoch  72 step    31350 |    394 batches | lr 0.000298 | ms/batch 424.93 | loss  4.34 | ppl    77.060
| epoch  73 step    31400 |      8 batches | lr 0.000298 | ms/batch 415.44 | loss  4.37 | ppl    78.821
| epoch  73 step    31450 |     58 batches | lr 0.000298 | ms/batch 425.79 | loss  4.29 | ppl    72.888
| epoch  73 step    31500 |    108 batches | lr 0.000298 | ms/batch 425.36 | loss  4.31 | ppl    74.738
| epoch  73 step    31550 |    158 batches | lr 0.000298 | ms/batch 424.69 | loss  4.34 | ppl    76.412
| epoch  73 step    31600 |    208 batches | lr 0.000298 | ms/batch 424.10 | loss  4.34 | ppl    76.583
----------------------------------------------------------------------------------------------------
| Eval  79 at step    31600 | time: 175.79s | valid loss  4.31 | valid ppl    74.540
----------------------------------------------------------------------------------------------------
| epoch  73 step    31650 |    258 batches | lr 0.000297 | ms/batch 583.90 | loss  4.37 | ppl    79.105
| epoch  73 step    31700 |    308 batches | lr 0.000297 | ms/batch 427.06 | loss  4.35 | ppl    77.192
| epoch  73 step    31750 |    358 batches | lr 0.000297 | ms/batch 427.29 | loss  4.26 | ppl    70.685
| epoch  73 step    31800 |    408 batches | lr 0.000297 | ms/batch 425.69 | loss  4.33 | ppl    76.127
| epoch  74 step    31850 |     22 batches | lr 0.000297 | ms/batch 417.05 | loss  4.38 | ppl    79.898
| epoch  74 step    31900 |     72 batches | lr 0.000297 | ms/batch 424.93 | loss  4.29 | ppl    73.029
| epoch  74 step    31950 |    122 batches | lr 0.000297 | ms/batch 425.28 | loss  4.33 | ppl    76.320
| epoch  74 step    32000 |    172 batches | lr 0.000296 | ms/batch 425.47 | loss  4.31 | ppl    74.722
----------------------------------------------------------------------------------------------------
| Eval  80 at step    32000 | time: 176.38s | valid loss  4.32 | valid ppl    75.060
----------------------------------------------------------------------------------------------------
| epoch  74 step    32050 |    222 batches | lr 0.000296 | ms/batch 552.98 | loss  4.36 | ppl    78.390
| epoch  74 step    32100 |    272 batches | lr 0.000296 | ms/batch 424.74 | loss  4.35 | ppl    77.370
| epoch  74 step    32150 |    322 batches | lr 0.000296 | ms/batch 425.38 | loss  4.31 | ppl    74.243
| epoch  74 step    32200 |    372 batches | lr 0.000296 | ms/batch 424.23 | loss  4.31 | ppl    74.589
| epoch  74 step    32250 |    422 batches | lr 0.000296 | ms/batch 424.03 | loss  4.32 | ppl    75.145
| epoch  75 step    32300 |     36 batches | lr 0.000295 | ms/batch 417.27 | loss  4.35 | ppl    77.194
| epoch  75 step    32350 |     86 batches | lr 0.000295 | ms/batch 425.25 | loss  4.25 | ppl    70.242
| epoch  75 step    32400 |    136 batches | lr 0.000295 | ms/batch 423.66 | loss  4.33 | ppl    75.722
----------------------------------------------------------------------------------------------------
| Eval  81 at step    32400 | time: 175.86s | valid loss  4.31 | valid ppl    74.691
----------------------------------------------------------------------------------------------------
| epoch  75 step    32450 |    186 batches | lr 0.000295 | ms/batch 552.86 | loss  4.33 | ppl    76.276
| epoch  75 step    32500 |    236 batches | lr 0.000295 | ms/batch 426.35 | loss  4.33 | ppl    75.971
| epoch  75 step    32550 |    286 batches | lr 0.000295 | ms/batch 425.48 | loss  4.35 | ppl    77.572
| epoch  75 step    32600 |    336 batches | lr 0.000294 | ms/batch 423.77 | loss  4.25 | ppl    69.782
| epoch  75 step    32650 |    386 batches | lr 0.000294 | ms/batch 423.78 | loss  4.34 | ppl    76.871
| epoch  75 step    32700 |    436 batches | lr 0.000294 | ms/batch 417.81 | loss  4.34 | ppl    76.712
| epoch  76 step    32750 |     50 batches | lr 0.000294 | ms/batch 420.72 | loss  4.29 | ppl    73.096
| epoch  76 step    32800 |    100 batches | lr 0.000294 | ms/batch 426.14 | loss  4.29 | ppl    73.146
----------------------------------------------------------------------------------------------------
| Eval  82 at step    32800 | time: 175.84s | valid loss  4.32 | valid ppl    75.215
----------------------------------------------------------------------------------------------------
| epoch  76 step    32850 |    150 batches | lr 0.000294 | ms/batch 553.05 | loss  4.33 | ppl    76.183
| epoch  76 step    32900 |    200 batches | lr 0.000294 | ms/batch 424.53 | loss  4.33 | ppl    76.152
| epoch  76 step    32950 |    250 batches | lr 0.000293 | ms/batch 424.91 | loss  4.31 | ppl    74.795
| epoch  76 step    33000 |    300 batches | lr 0.000293 | ms/batch 426.02 | loss  4.37 | ppl    79.009
| epoch  76 step    33050 |    350 batches | lr 0.000293 | ms/batch 425.36 | loss  4.23 | ppl    68.829
| epoch  76 step    33100 |    400 batches | lr 0.000293 | ms/batch 425.91 | loss  4.30 | ppl    73.789
| epoch  77 step    33150 |     14 batches | lr 0.000293 | ms/batch 417.23 | loss  4.35 | ppl    77.435
| epoch  77 step    33200 |     64 batches | lr 0.000293 | ms/batch 426.55 | loss  4.26 | ppl    70.557
----------------------------------------------------------------------------------------------------
| Eval  83 at step    33200 | time: 176.22s | valid loss  4.30 | valid ppl    73.373
----------------------------------------------------------------------------------------------------
| epoch  77 step    33250 |    114 batches | lr 0.000292 | ms/batch 604.58 | loss  4.31 | ppl    74.468
| epoch  77 step    33300 |    164 batches | lr 0.000292 | ms/batch 423.83 | loss  4.31 | ppl    74.543
| epoch  77 step    33350 |    214 batches | lr 0.000292 | ms/batch 423.44 | loss  4.30 | ppl    73.711
| epoch  77 step    33400 |    264 batches | lr 0.000292 | ms/batch 423.95 | loss  4.33 | ppl    75.937
| epoch  77 step    33450 |    314 batches | lr 0.000292 | ms/batch 424.69 | loss  4.34 | ppl    76.390
| epoch  77 step    33500 |    364 batches | lr 0.000292 | ms/batch 426.03 | loss  4.26 | ppl    70.885
| epoch  77 step    33550 |    414 batches | lr 0.000291 | ms/batch 424.38 | loss  4.30 | ppl    73.930
| epoch  78 step    33600 |     28 batches | lr 0.000291 | ms/batch 415.28 | loss  4.34 | ppl    76.666
----------------------------------------------------------------------------------------------------
| Eval  84 at step    33600 | time: 175.65s | valid loss  4.30 | valid ppl    73.516
----------------------------------------------------------------------------------------------------
| epoch  78 step    33650 |     78 batches | lr 0.000291 | ms/batch 552.22 | loss  4.26 | ppl    70.736
| epoch  78 step    33700 |    128 batches | lr 0.000291 | ms/batch 425.72 | loss  4.29 | ppl    73.170
| epoch  78 step    33750 |    178 batches | lr 0.000291 | ms/batch 425.31 | loss  4.32 | ppl    75.106
| epoch  78 step    33800 |    228 batches | lr 0.000291 | ms/batch 425.92 | loss  4.31 | ppl    74.471
| epoch  78 step    33850 |    278 batches | lr 0.00029 | ms/batch 425.64 | loss  4.32 | ppl    75.504
| epoch  78 step    33900 |    328 batches | lr 0.00029 | ms/batch 425.37 | loss  4.27 | ppl    71.211
| epoch  78 step    33950 |    378 batches | lr 0.00029 | ms/batch 424.44 | loss  4.30 | ppl    73.426
| epoch  78 step    34000 |    428 batches | lr 0.00029 | ms/batch 423.95 | loss  4.33 | ppl    75.754
----------------------------------------------------------------------------------------------------
| Eval  85 at step    34000 | time: 176.42s | valid loss  4.32 | valid ppl    74.894
----------------------------------------------------------------------------------------------------
| epoch  79 step    34050 |     42 batches | lr 0.00029 | ms/batch 541.86 | loss  4.29 | ppl    73.321
| epoch  79 step    34100 |     92 batches | lr 0.00029 | ms/batch 423.51 | loss  4.26 | ppl    70.944
| epoch  79 step    34150 |    142 batches | lr 0.000289 | ms/batch 424.25 | loss  4.30 | ppl    73.926
| epoch  79 step    34200 |    192 batches | lr 0.000289 | ms/batch 423.71 | loss  4.33 | ppl    75.940
| epoch  79 step    34250 |    242 batches | lr 0.000289 | ms/batch 425.70 | loss  4.33 | ppl    75.782
| epoch  79 step    34300 |    292 batches | lr 0.000289 | ms/batch 426.24 | loss  4.35 | ppl    77.129
| epoch  79 step    34350 |    342 batches | lr 0.000289 | ms/batch 425.00 | loss  4.21 | ppl    67.247
| epoch  79 step    34400 |    392 batches | lr 0.000289 | ms/batch 424.34 | loss  4.30 | ppl    73.888
----------------------------------------------------------------------------------------------------
| Eval  86 at step    34400 | time: 175.75s | valid loss  4.30 | valid ppl    73.777
----------------------------------------------------------------------------------------------------
| epoch  80 step    34450 |      6 batches | lr 0.000288 | ms/batch 543.16 | loss  4.34 | ppl    77.071
| epoch  80 step    34500 |     56 batches | lr 0.000288 | ms/batch 423.64 | loss  4.26 | ppl    70.722
| epoch  80 step    34550 |    106 batches | lr 0.000288 | ms/batch 423.73 | loss  4.27 | ppl    71.600
| epoch  80 step    34600 |    156 batches | lr 0.000288 | ms/batch 423.89 | loss  4.28 | ppl    72.475
| epoch  80 step    34650 |    206 batches | lr 0.000288 | ms/batch 424.44 | loss  4.29 | ppl    72.831
| epoch  80 step    34700 |    256 batches | lr 0.000288 | ms/batch 424.40 | loss  4.34 | ppl    76.881
| epoch  80 step    34750 |    306 batches | lr 0.000287 | ms/batch 424.84 | loss  4.33 | ppl    75.786
| epoch  80 step    34800 |    356 batches | lr 0.000287 | ms/batch 425.07 | loss  4.22 | ppl    68.216
----------------------------------------------------------------------------------------------------
| Eval  87 at step    34800 | time: 175.80s | valid loss  4.30 | valid ppl    73.405
----------------------------------------------------------------------------------------------------
| epoch  80 step    34850 |    406 batches | lr 0.000287 | ms/batch 550.46 | loss  4.30 | ppl    74.048
| epoch  81 step    34900 |     20 batches | lr 0.000287 | ms/batch 416.50 | loss  4.33 | ppl    76.121
| epoch  81 step    34950 |     70 batches | lr 0.000287 | ms/batch 424.02 | loss  4.24 | ppl    69.584
| epoch  81 step    35000 |    120 batches | lr 0.000287 | ms/batch 423.37 | loss  4.28 | ppl    72.599
| epoch  81 step    35050 |    170 batches | lr 0.000286 | ms/batch 424.23 | loss  4.30 | ppl    73.409
| epoch  81 step    35100 |    220 batches | lr 0.000286 | ms/batch 424.33 | loss  4.32 | ppl    75.152
| epoch  81 step    35150 |    270 batches | lr 0.000286 | ms/batch 424.84 | loss  4.32 | ppl    74.981
| epoch  81 step    35200 |    320 batches | lr 0.000286 | ms/batch 423.89 | loss  4.29 | ppl    72.738
----------------------------------------------------------------------------------------------------
| Eval  88 at step    35200 | time: 175.60s | valid loss  4.31 | valid ppl    74.383
----------------------------------------------------------------------------------------------------
| epoch  81 step    35250 |    370 batches | lr 0.000286 | ms/batch 552.73 | loss  4.25 | ppl    69.988
| epoch  81 step    35300 |    420 batches | lr 0.000286 | ms/batch 424.32 | loss  4.29 | ppl    72.656
| epoch  82 step    35350 |     34 batches | lr 0.000285 | ms/batch 418.05 | loss  4.31 | ppl    74.286
| epoch  82 step    35400 |     84 batches | lr 0.000285 | ms/batch 425.38 | loss  4.22 | ppl    68.232
| epoch  82 step    35450 |    134 batches | lr 0.000285 | ms/batch 425.60 | loss  4.30 | ppl    73.596
| epoch  82 step    35500 |    184 batches | lr 0.000285 | ms/batch 425.48 | loss  4.28 | ppl    72.590
| epoch  82 step    35550 |    234 batches | lr 0.000285 | ms/batch 423.99 | loss  4.31 | ppl    74.443
| epoch  82 step    35600 |    284 batches | lr 0.000285 | ms/batch 426.80 | loss  4.33 | ppl    76.039
----------------------------------------------------------------------------------------------------
| Eval  89 at step    35600 | time: 176.12s | valid loss  4.30 | valid ppl    74.000
----------------------------------------------------------------------------------------------------
| epoch  82 step    35650 |    334 batches | lr 0.000284 | ms/batch 551.95 | loss  4.21 | ppl    67.275
| epoch  82 step    35700 |    384 batches | lr 0.000284 | ms/batch 424.37 | loss  4.29 | ppl    72.733
| epoch  82 step    35750 |    434 batches | lr 0.000284 | ms/batch 425.40 | loss  4.33 | ppl    75.699
| epoch  83 step    35800 |     48 batches | lr 0.000284 | ms/batch 418.29 | loss  4.25 | ppl    70.435
| epoch  83 step    35850 |     98 batches | lr 0.000284 | ms/batch 425.86 | loss  4.25 | ppl    69.795
| epoch  83 step    35900 |    148 batches | lr 0.000283 | ms/batch 424.60 | loss  4.27 | ppl    71.276
| epoch  83 step    35950 |    198 batches | lr 0.000283 | ms/batch 425.20 | loss  4.32 | ppl    75.075
| epoch  83 step    36000 |    248 batches | lr 0.000283 | ms/batch 425.05 | loss  4.30 | ppl    74.062
----------------------------------------------------------------------------------------------------
| Eval  90 at step    36000 | time: 176.05s | valid loss  4.29 | valid ppl    73.217
----------------------------------------------------------------------------------------------------
| epoch  83 step    36050 |    298 batches | lr 0.000283 | ms/batch 594.30 | loss  4.33 | ppl    76.132
| epoch  83 step    36100 |    348 batches | lr 0.000283 | ms/batch 427.61 | loss  4.21 | ppl    67.436
| epoch  83 step    36150 |    398 batches | lr 0.000283 | ms/batch 427.41 | loss  4.28 | ppl    71.923
| epoch  84 step    36200 |     12 batches | lr 0.000282 | ms/batch 418.15 | loss  4.33 | ppl    75.653
| epoch  84 step    36250 |     62 batches | lr 0.000282 | ms/batch 427.80 | loss  4.24 | ppl    69.712
| epoch  84 step    36300 |    112 batches | lr 0.000282 | ms/batch 426.80 | loss  4.25 | ppl    69.860
| epoch  84 step    36350 |    162 batches | lr 0.000282 | ms/batch 424.82 | loss  4.26 | ppl    71.003
| epoch  84 step    36400 |    212 batches | lr 0.000282 | ms/batch 424.06 | loss  4.30 | ppl    74.024
----------------------------------------------------------------------------------------------------
| Eval  91 at step    36400 | time: 176.46s | valid loss  4.30 | valid ppl    73.847
----------------------------------------------------------------------------------------------------
| epoch  84 step    36450 |    262 batches | lr 0.000282 | ms/batch 551.69 | loss  4.29 | ppl    73.094
| epoch  84 step    36500 |    312 batches | lr 0.000281 | ms/batch 424.70 | loss  4.29 | ppl    72.872
| epoch  84 step    36550 |    362 batches | lr 0.000281 | ms/batch 425.67 | loss  4.22 | ppl    68.199
| epoch  84 step    36600 |    412 batches | lr 0.000281 | ms/batch 425.35 | loss  4.26 | ppl    70.781
| epoch  85 step    36650 |     26 batches | lr 0.000281 | ms/batch 417.11 | loss  4.34 | ppl    76.336
| epoch  85 step    36700 |     76 batches | lr 0.000281 | ms/batch 425.62 | loss  4.25 | ppl    70.116
| epoch  85 step    36750 |    126 batches | lr 0.000281 | ms/batch 425.25 | loss  4.27 | ppl    71.260
| epoch  85 step    36800 |    176 batches | lr 0.00028 | ms/batch 425.03 | loss  4.30 | ppl    73.386
----------------------------------------------------------------------------------------------------
| Eval  92 at step    36800 | time: 176.04s | valid loss  4.29 | valid ppl    72.650
----------------------------------------------------------------------------------------------------
| epoch  85 step    36850 |    226 batches | lr 0.00028 | ms/batch 582.05 | loss  4.30 | ppl    73.892
| epoch  85 step    36900 |    276 batches | lr 0.00028 | ms/batch 425.48 | loss  4.30 | ppl    73.848
| epoch  85 step    36950 |    326 batches | lr 0.00028 | ms/batch 425.05 | loss  4.24 | ppl    69.358
| epoch  85 step    37000 |    376 batches | lr 0.00028 | ms/batch 424.47 | loss  4.25 | ppl    70.093
| epoch  85 step    37050 |    426 batches | lr 0.000279 | ms/batch 424.98 | loss  4.29 | ppl    72.881
| epoch  86 step    37100 |     40 batches | lr 0.000279 | ms/batch 415.46 | loss  4.29 | ppl    72.873
| epoch  86 step    37150 |     90 batches | lr 0.000279 | ms/batch 425.22 | loss  4.21 | ppl    67.201
| epoch  86 step    37200 |    140 batches | lr 0.000279 | ms/batch 426.65 | loss  4.26 | ppl    70.571
----------------------------------------------------------------------------------------------------
| Eval  93 at step    37200 | time: 176.03s | valid loss  4.29 | valid ppl    73.086
----------------------------------------------------------------------------------------------------
| epoch  86 step    37250 |    190 batches | lr 0.000279 | ms/batch 551.90 | loss  4.29 | ppl    72.975
| epoch  86 step    37300 |    240 batches | lr 0.000279 | ms/batch 425.21 | loss  4.30 | ppl    73.508
| epoch  86 step    37350 |    290 batches | lr 0.000278 | ms/batch 424.89 | loss  4.33 | ppl    76.138
| epoch  86 step    37400 |    340 batches | lr 0.000278 | ms/batch 426.01 | loss  4.18 | ppl    65.538
| epoch  86 step    37450 |    390 batches | lr 0.000278 | ms/batch 424.48 | loss  4.29 | ppl    72.891
| epoch  87 step    37500 |      4 batches | lr 0.000278 | ms/batch 417.11 | loss  4.30 | ppl    73.894
| epoch  87 step    37550 |     54 batches | lr 0.000278 | ms/batch 424.60 | loss  4.22 | ppl    67.861
| epoch  87 step    37600 |    104 batches | lr 0.000278 | ms/batch 424.99 | loss  4.25 | ppl    69.962
----------------------------------------------------------------------------------------------------
| Eval  94 at step    37600 | time: 175.94s | valid loss  4.29 | valid ppl    73.035
----------------------------------------------------------------------------------------------------
| epoch  87 step    37650 |    154 batches | lr 0.000277 | ms/batch 551.40 | loss  4.26 | ppl    70.947
| epoch  87 step    37700 |    204 batches | lr 0.000277 | ms/batch 424.91 | loss  4.29 | ppl    73.041
| epoch  87 step    37750 |    254 batches | lr 0.000277 | ms/batch 424.23 | loss  4.29 | ppl    72.926
| epoch  87 step    37800 |    304 batches | lr 0.000277 | ms/batch 425.50 | loss  4.31 | ppl    74.167
| epoch  87 step    37850 |    354 batches | lr 0.000277 | ms/batch 424.86 | loss  4.20 | ppl    66.827
| epoch  87 step    37900 |    404 batches | lr 0.000276 | ms/batch 425.26 | loss  4.27 | ppl    71.282
| epoch  88 step    37950 |     18 batches | lr 0.000276 | ms/batch 416.92 | loss  4.30 | ppl    73.747
| epoch  88 step    38000 |     68 batches | lr 0.000276 | ms/batch 423.86 | loss  4.20 | ppl    66.383
----------------------------------------------------------------------------------------------------
| Eval  95 at step    38000 | time: 175.84s | valid loss  4.29 | valid ppl    72.848
----------------------------------------------------------------------------------------------------
| epoch  88 step    38050 |    118 batches | lr 0.000276 | ms/batch 550.12 | loss  4.24 | ppl    69.650
| epoch  88 step    38100 |    168 batches | lr 0.000276 | ms/batch 425.93 | loss  4.26 | ppl    70.843
| epoch  88 step    38150 |    218 batches | lr 0.000276 | ms/batch 423.36 | loss  4.29 | ppl    72.665
| epoch  88 step    38200 |    268 batches | lr 0.000275 | ms/batch 423.81 | loss  4.27 | ppl    71.800
| epoch  88 step    38250 |    318 batches | lr 0.000275 | ms/batch 425.15 | loss  4.24 | ppl    69.712
| epoch  88 step    38300 |    368 batches | lr 0.000275 | ms/batch 425.88 | loss  4.21 | ppl    67.572
| epoch  88 step    38350 |    418 batches | lr 0.000275 | ms/batch 425.33 | loss  4.26 | ppl    70.968
| epoch  89 step    38400 |     32 batches | lr 0.000275 | ms/batch 417.76 | loss  4.29 | ppl    73.128
----------------------------------------------------------------------------------------------------
| Eval  96 at step    38400 | time: 175.87s | valid loss  4.29 | valid ppl    72.775
----------------------------------------------------------------------------------------------------
| epoch  89 step    38450 |     82 batches | lr 0.000274 | ms/batch 549.89 | loss  4.19 | ppl    66.042
| epoch  89 step    38500 |    132 batches | lr 0.000274 | ms/batch 423.68 | loss  4.25 | ppl    70.209
| epoch  89 step    38550 |    182 batches | lr 0.000274 | ms/batch 423.54 | loss  4.25 | ppl    69.904
| epoch  89 step    38600 |    232 batches | lr 0.000274 | ms/batch 423.88 | loss  4.25 | ppl    70.120
| epoch  89 step    38650 |    282 batches | lr 0.000274 | ms/batch 425.07 | loss  4.29 | ppl    73.004
| epoch  89 step    38700 |    332 batches | lr 0.000274 | ms/batch 423.95 | loss  4.21 | ppl    67.253
| epoch  89 step    38750 |    382 batches | lr 0.000273 | ms/batch 424.84 | loss  4.25 | ppl    69.946
| epoch  89 step    38800 |    432 batches | lr 0.000273 | ms/batch 425.00 | loss  4.27 | ppl    71.559
----------------------------------------------------------------------------------------------------
| Eval  97 at step    38800 | time: 175.98s | valid loss  4.29 | valid ppl    73.097
----------------------------------------------------------------------------------------------------
| epoch  90 step    38850 |     46 batches | lr 0.000273 | ms/batch 541.99 | loss  4.25 | ppl    70.208
| epoch  90 step    38900 |     96 batches | lr 0.000273 | ms/batch 424.67 | loss  4.20 | ppl    66.811
| epoch  90 step    38950 |    146 batches | lr 0.000273 | ms/batch 423.92 | loss  4.25 | ppl    70.339
| epoch  90 step    39000 |    196 batches | lr 0.000272 | ms/batch 423.89 | loss  4.28 | ppl    72.094
| epoch  90 step    39050 |    246 batches | lr 0.000272 | ms/batch 424.03 | loss  4.29 | ppl    72.998
| epoch  90 step    39100 |    296 batches | lr 0.000272 | ms/batch 423.46 | loss  4.29 | ppl    72.950
| epoch  90 step    39150 |    346 batches | lr 0.000272 | ms/batch 423.39 | loss  4.17 | ppl    64.590
| epoch  90 step    39200 |    396 batches | lr 0.000272 | ms/batch 423.35 | loss  4.26 | ppl    71.148
----------------------------------------------------------------------------------------------------
| Eval  98 at step    39200 | time: 175.46s | valid loss  4.28 | valid ppl    72.351
----------------------------------------------------------------------------------------------------
| epoch  91 step    39250 |     10 batches | lr 0.000272 | ms/batch 583.18 | loss  4.29 | ppl    72.782
| epoch  91 step    39300 |     60 batches | lr 0.000271 | ms/batch 424.40 | loss  4.21 | ppl    67.249
| epoch  91 step    39350 |    110 batches | lr 0.000271 | ms/batch 422.98 | loss  4.23 | ppl    68.642
| epoch  91 step    39400 |    160 batches | lr 0.000271 | ms/batch 425.50 | loss  4.21 | ppl    67.634
| epoch  91 step    39450 |    210 batches | lr 0.000271 | ms/batch 427.46 | loss  4.28 | ppl    72.014
| epoch  91 step    39500 |    260 batches | lr 0.000271 | ms/batch 426.54 | loss  4.27 | ppl    71.222
| epoch  91 step    39550 |    310 batches | lr 0.00027 | ms/batch 427.25 | loss  4.26 | ppl    70.783
| epoch  91 step    39600 |    360 batches | lr 0.00027 | ms/batch 427.26 | loss  4.20 | ppl    67.021
----------------------------------------------------------------------------------------------------
| Eval  99 at step    39600 | time: 176.21s | valid loss  4.28 | valid ppl    71.933
----------------------------------------------------------------------------------------------------
| epoch  91 step    39650 |    410 batches | lr 0.00027 | ms/batch 582.34 | loss  4.24 | ppl    69.234
| epoch  92 step    39700 |     24 batches | lr 0.00027 | ms/batch 417.25 | loss  4.28 | ppl    72.293
| epoch  92 step    39750 |     74 batches | lr 0.00027 | ms/batch 424.52 | loss  4.19 | ppl    65.998
| epoch  92 step    39800 |    124 batches | lr 0.00027 | ms/batch 424.52 | loss  4.21 | ppl    67.351
| epoch  92 step    39850 |    174 batches | lr 0.000269 | ms/batch 424.82 | loss  4.25 | ppl    69.769
| epoch  92 step    39900 |    224 batches | lr 0.000269 | ms/batch 425.56 | loss  4.28 | ppl    72.510
| epoch  92 step    39950 |    274 batches | lr 0.000269 | ms/batch 426.49 | loss  4.28 | ppl    72.131
| epoch  92 step    40000 |    324 batches | lr 0.000269 | ms/batch 424.23 | loss  4.22 | ppl    67.961
----------------------------------------------------------------------------------------------------
| Eval 100 at step    40000 | time: 175.99s | valid loss  4.28 | valid ppl    72.580
----------------------------------------------------------------------------------------------------
| epoch  92 step    40050 |    374 batches | lr 0.000269 | ms/batch 550.77 | loss  4.23 | ppl    68.780
| epoch  92 step    40100 |    424 batches | lr 0.000268 | ms/batch 425.20 | loss  4.24 | ppl    69.168
| epoch  93 step    40150 |     38 batches | lr 0.000268 | ms/batch 417.93 | loss  4.26 | ppl    70.989
| epoch  93 step    40200 |     88 batches | lr 0.000268 | ms/batch 423.91 | loss  4.20 | ppl    66.524
| epoch  93 step    40250 |    138 batches | lr 0.000268 | ms/batch 425.81 | loss  4.25 | ppl    69.894
| epoch  93 step    40300 |    188 batches | lr 0.000268 | ms/batch 423.38 | loss  4.24 | ppl    69.247
| epoch  93 step    40350 |    238 batches | lr 0.000267 | ms/batch 423.87 | loss  4.28 | ppl    72.071
| epoch  93 step    40400 |    288 batches | lr 0.000267 | ms/batch 423.97 | loss  4.29 | ppl    73.267
----------------------------------------------------------------------------------------------------
| Eval 101 at step    40400 | time: 175.77s | valid loss  4.27 | valid ppl    71.779
----------------------------------------------------------------------------------------------------
| epoch  93 step    40450 |    338 batches | lr 0.000267 | ms/batch 583.11 | loss  4.15 | ppl    63.334
| epoch  93 step    40500 |    388 batches | lr 0.000267 | ms/batch 423.10 | loss  4.23 | ppl    68.537
| epoch  94 step    40550 |      2 batches | lr 0.000267 | ms/batch 419.19 | loss  4.28 | ppl    72.230
| epoch  94 step    40600 |     52 batches | lr 0.000267 | ms/batch 426.46 | loss  4.20 | ppl    66.872
| epoch  94 step    40650 |    102 batches | lr 0.000266 | ms/batch 427.39 | loss  4.18 | ppl    65.527
| epoch  94 step    40700 |    152 batches | lr 0.000266 | ms/batch 425.63 | loss  4.22 | ppl    68.153
| epoch  94 step    40750 |    202 batches | lr 0.000266 | ms/batch 427.10 | loss  4.25 | ppl    70.166
| epoch  94 step    40800 |    252 batches | lr 0.000266 | ms/batch 426.21 | loss  4.26 | ppl    70.659
----------------------------------------------------------------------------------------------------
| Eval 102 at step    40800 | time: 176.45s | valid loss  4.28 | valid ppl    72.570
----------------------------------------------------------------------------------------------------
| epoch  94 step    40850 |    302 batches | lr 0.000266 | ms/batch 552.46 | loss  4.30 | ppl    73.372
| epoch  94 step    40900 |    352 batches | lr 0.000265 | ms/batch 425.76 | loss  4.15 | ppl    63.578
| epoch  94 step    40950 |    402 batches | lr 0.000265 | ms/batch 424.85 | loss  4.23 | ppl    68.576
| epoch  95 step    41000 |     16 batches | lr 0.000265 | ms/batch 418.63 | loss  4.28 | ppl    72.411
| epoch  95 step    41050 |     66 batches | lr 0.000265 | ms/batch 427.08 | loss  4.19 | ppl    65.843
| epoch  95 step    41100 |    116 batches | lr 0.000265 | ms/batch 426.81 | loss  4.22 | ppl    67.956
| epoch  95 step    41150 |    166 batches | lr 0.000264 | ms/batch 425.32 | loss  4.23 | ppl    68.526
| epoch  95 step    41200 |    216 batches | lr 0.000264 | ms/batch 425.21 | loss  4.25 | ppl    70.138
----------------------------------------------------------------------------------------------------
| Eval 103 at step    41200 | time: 176.30s | valid loss  4.28 | valid ppl    72.573
----------------------------------------------------------------------------------------------------
| epoch  95 step    41250 |    266 batches | lr 0.000264 | ms/batch 551.94 | loss  4.24 | ppl    69.387
| epoch  95 step    41300 |    316 batches | lr 0.000264 | ms/batch 425.03 | loss  4.22 | ppl    67.952
| epoch  95 step    41350 |    366 batches | lr 0.000264 | ms/batch 426.48 | loss  4.17 | ppl    64.926
| epoch  95 step    41400 |    416 batches | lr 0.000264 | ms/batch 426.27 | loss  4.21 | ppl    67.511
| epoch  96 step    41450 |     30 batches | lr 0.000263 | ms/batch 418.47 | loss  4.25 | ppl    70.011
| epoch  96 step    41500 |     80 batches | lr 0.000263 | ms/batch 425.04 | loss  4.16 | ppl    64.153
| epoch  96 step    41550 |    130 batches | lr 0.000263 | ms/batch 424.37 | loss  4.21 | ppl    67.166
| epoch  96 step    41600 |    180 batches | lr 0.000263 | ms/batch 424.56 | loss  4.22 | ppl    67.963
----------------------------------------------------------------------------------------------------
| Eval 104 at step    41600 | time: 176.08s | valid loss  4.26 | valid ppl    70.973
----------------------------------------------------------------------------------------------------
| epoch  96 step    41650 |    230 batches | lr 0.000263 | ms/batch 577.89 | loss  4.24 | ppl    69.296
| epoch  96 step    41700 |    280 batches | lr 0.000262 | ms/batch 424.38 | loss  4.25 | ppl    69.840
| epoch  96 step    41750 |    330 batches | lr 0.000262 | ms/batch 425.03 | loss  4.19 | ppl    66.094
| epoch  96 step    41800 |    380 batches | lr 0.000262 | ms/batch 424.69 | loss  4.20 | ppl    66.373
| epoch  96 step    41850 |    430 batches | lr 0.000262 | ms/batch 424.53 | loss  4.24 | ppl    69.420
| epoch  97 step    41900 |     44 batches | lr 0.000262 | ms/batch 417.98 | loss  4.20 | ppl    66.380
| epoch  97 step    41950 |     94 batches | lr 0.000261 | ms/batch 427.26 | loss  4.18 | ppl    65.137
| epoch  97 step    42000 |    144 batches | lr 0.000261 | ms/batch 426.70 | loss  4.21 | ppl    67.243
----------------------------------------------------------------------------------------------------
| Eval 105 at step    42000 | time: 176.07s | valid loss  4.27 | valid ppl    71.775
----------------------------------------------------------------------------------------------------
| epoch  97 step    42050 |    194 batches | lr 0.000261 | ms/batch 553.32 | loss  4.23 | ppl    68.821
| epoch  97 step    42100 |    244 batches | lr 0.000261 | ms/batch 425.72 | loss  4.26 | ppl    70.626
| epoch  97 step    42150 |    294 batches | lr 0.000261 | ms/batch 424.18 | loss  4.26 | ppl    70.966
| epoch  97 step    42200 |    344 batches | lr 0.00026 | ms/batch 423.45 | loss  4.13 | ppl    62.406
| epoch  97 step    42250 |    394 batches | lr 0.00026 | ms/batch 424.74 | loss  4.21 | ppl    67.323
| epoch  98 step    42300 |      8 batches | lr 0.00026 | ms/batch 416.73 | loss  4.26 | ppl    70.498
| epoch  98 step    42350 |     58 batches | lr 0.00026 | ms/batch 424.71 | loss  4.18 | ppl    65.567
| epoch  98 step    42400 |    108 batches | lr 0.00026 | ms/batch 427.08 | loss  4.20 | ppl    66.934
----------------------------------------------------------------------------------------------------
| Eval 106 at step    42400 | time: 176.00s | valid loss  4.27 | valid ppl    71.687
----------------------------------------------------------------------------------------------------
| epoch  98 step    42450 |    158 batches | lr 0.000259 | ms/batch 553.31 | loss  4.19 | ppl    66.193
| epoch  98 step    42500 |    208 batches | lr 0.000259 | ms/batch 427.13 | loss  4.21 | ppl    67.221
| epoch  98 step    42550 |    258 batches | lr 0.000259 | ms/batch 425.11 | loss  4.22 | ppl    68.292
| epoch  98 step    42600 |    308 batches | lr 0.000259 | ms/batch 425.06 | loss  4.24 | ppl    69.305
| epoch  98 step    42650 |    358 batches | lr 0.000259 | ms/batch 426.34 | loss  4.16 | ppl    63.831
| epoch  98 step    42700 |    408 batches | lr 0.000259 | ms/batch 427.30 | loss  4.22 | ppl    67.703
| epoch  99 step    42750 |     22 batches | lr 0.000258 | ms/batch 414.97 | loss  4.27 | ppl    71.840
| epoch  99 step    42800 |     72 batches | lr 0.000258 | ms/batch 426.24 | loss  4.15 | ppl    63.355
----------------------------------------------------------------------------------------------------
| Eval 107 at step    42800 | time: 176.28s | valid loss  4.28 | valid ppl    71.966
----------------------------------------------------------------------------------------------------
| epoch  99 step    42850 |    122 batches | lr 0.000258 | ms/batch 551.32 | loss  4.21 | ppl    67.343
| epoch  99 step    42900 |    172 batches | lr 0.000258 | ms/batch 426.34 | loss  4.21 | ppl    67.575
| epoch  99 step    42950 |    222 batches | lr 0.000258 | ms/batch 426.80 | loss  4.24 | ppl    69.321
| epoch  99 step    43000 |    272 batches | lr 0.000257 | ms/batch 423.56 | loss  4.24 | ppl    69.442
| epoch  99 step    43050 |    322 batches | lr 0.000257 | ms/batch 424.04 | loss  4.20 | ppl    66.377
| epoch  99 step    43100 |    372 batches | lr 0.000257 | ms/batch 426.13 | loss  4.19 | ppl    65.906
| epoch  99 step    43150 |    422 batches | lr 0.000257 | ms/batch 427.06 | loss  4.21 | ppl    67.672
| epoch 100 step    43200 |     36 batches | lr 0.000257 | ms/batch 418.49 | loss  4.23 | ppl    68.796
----------------------------------------------------------------------------------------------------
| Eval 108 at step    43200 | time: 176.19s | valid loss  4.27 | valid ppl    71.498
----------------------------------------------------------------------------------------------------
| epoch 100 step    43250 |     86 batches | lr 0.000256 | ms/batch 553.04 | loss  4.15 | ppl    63.358
| epoch 100 step    43300 |    136 batches | lr 0.000256 | ms/batch 425.72 | loss  4.21 | ppl    67.056
| epoch 100 step    43350 |    186 batches | lr 0.000256 | ms/batch 424.67 | loss  4.24 | ppl    69.333
| epoch 100 step    43400 |    236 batches | lr 0.000256 | ms/batch 424.68 | loss  4.22 | ppl    67.803
| epoch 100 step    43450 |    286 batches | lr 0.000256 | ms/batch 424.19 | loss  4.26 | ppl    70.570
| epoch 100 step    43500 |    336 batches | lr 0.000255 | ms/batch 425.29 | loss  4.12 | ppl    61.693
| epoch 100 step    43550 |    386 batches | lr 0.000255 | ms/batch 425.66 | loss  4.20 | ppl    66.757
| epoch 100 step    43600 |    436 batches | lr 0.000255 | ms/batch 418.57 | loss  4.23 | ppl    68.515
----------------------------------------------------------------------------------------------------
| Eval 109 at step    43600 | time: 176.06s | valid loss  4.26 | valid ppl    71.140
----------------------------------------------------------------------------------------------------
| epoch 101 step    43650 |     50 batches | lr 0.000255 | ms/batch 547.43 | loss  4.16 | ppl    63.761
| epoch 101 step    43700 |    100 batches | lr 0.000255 | ms/batch 427.80 | loss  4.18 | ppl    65.399
| epoch 101 step    43750 |    150 batches | lr 0.000254 | ms/batch 424.69 | loss  4.19 | ppl    66.338
| epoch 101 step    43800 |    200 batches | lr 0.000254 | ms/batch 423.98 | loss  4.22 | ppl    68.255
| epoch 101 step    43850 |    250 batches | lr 0.000254 | ms/batch 424.30 | loss  4.25 | ppl    69.858
| epoch 101 step    43900 |    300 batches | lr 0.000254 | ms/batch 427.05 | loss  4.24 | ppl    69.694
| epoch 101 step    43950 |    350 batches | lr 0.000254 | ms/batch 425.70 | loss  4.14 | ppl    62.750
| epoch 101 step    44000 |    400 batches | lr 0.000253 | ms/batch 427.78 | loss  4.22 | ppl    67.767
----------------------------------------------------------------------------------------------------
| Eval 110 at step    44000 | time: 176.49s | valid loss  4.28 | valid ppl    72.011
----------------------------------------------------------------------------------------------------
| epoch 102 step    44050 |     14 batches | lr 0.000253 | ms/batch 546.43 | loss  4.25 | ppl    70.003
| epoch 102 step    44100 |     64 batches | lr 0.000253 | ms/batch 427.66 | loss  4.15 | ppl    63.299
| epoch 102 step    44150 |    114 batches | lr 0.000253 | ms/batch 427.14 | loss  4.18 | ppl    65.291
| epoch 102 step    44200 |    164 batches | lr 0.000253 | ms/batch 426.44 | loss  4.19 | ppl    66.066
| epoch 102 step    44250 |    214 batches | lr 0.000252 | ms/batch 426.09 | loss  4.21 | ppl    67.435
| epoch 102 step    44300 |    264 batches | lr 0.000252 | ms/batch 425.91 | loss  4.19 | ppl    65.991
| epoch 102 step    44350 |    314 batches | lr 0.000252 | ms/batch 425.76 | loss  4.23 | ppl    68.602
| epoch 102 step    44400 |    364 batches | lr 0.000252 | ms/batch 425.66 | loss  4.14 | ppl    62.815
----------------------------------------------------------------------------------------------------
| Eval 111 at step    44400 | time: 176.52s | valid loss  4.25 | valid ppl    70.411
----------------------------------------------------------------------------------------------------
| epoch 102 step    44450 |    414 batches | lr 0.000252 | ms/batch 579.43 | loss  4.20 | ppl    66.968
| epoch 103 step    44500 |     28 batches | lr 0.000251 | ms/batch 417.36 | loss  4.21 | ppl    67.607
| epoch 103 step    44550 |     78 batches | lr 0.000251 | ms/batch 425.81 | loss  4.14 | ppl    62.869
| epoch 103 step    44600 |    128 batches | lr 0.000251 | ms/batch 425.70 | loss  4.17 | ppl    64.851
| epoch 103 step    44650 |    178 batches | lr 0.000251 | ms/batch 425.61 | loss  4.18 | ppl    65.460
| epoch 103 step    44700 |    228 batches | lr 0.000251 | ms/batch 425.61 | loss  4.21 | ppl    67.628
| epoch 103 step    44750 |    278 batches | lr 0.000251 | ms/batch 424.33 | loss  4.23 | ppl    68.477
| epoch 103 step    44800 |    328 batches | lr 0.00025 | ms/batch 424.11 | loss  4.15 | ppl    63.623
----------------------------------------------------------------------------------------------------
| Eval 112 at step    44800 | time: 176.00s | valid loss  4.27 | valid ppl    71.530
----------------------------------------------------------------------------------------------------
| epoch 103 step    44850 |    378 batches | lr 0.00025 | ms/batch 549.80 | loss  4.17 | ppl    64.739
| epoch 103 step    44900 |    428 batches | lr 0.00025 | ms/batch 424.18 | loss  4.21 | ppl    67.314
| epoch 104 step    44950 |     42 batches | lr 0.00025 | ms/batch 417.41 | loss  4.19 | ppl    65.953
| epoch 104 step    45000 |     92 batches | lr 0.00025 | ms/batch 425.78 | loss  4.15 | ppl    63.393
| epoch 104 step    45050 |    142 batches | lr 0.000249 | ms/batch 426.30 | loss  4.17 | ppl    64.873
| epoch 104 step    45100 |    192 batches | lr 0.000249 | ms/batch 426.06 | loss  4.20 | ppl    66.431
| epoch 104 step    45150 |    242 batches | lr 0.000249 | ms/batch 426.17 | loss  4.20 | ppl    66.438
| epoch 104 step    45200 |    292 batches | lr 0.000249 | ms/batch 424.54 | loss  4.24 | ppl    69.243
----------------------------------------------------------------------------------------------------
| Eval 113 at step    45200 | time: 176.01s | valid loss  4.26 | valid ppl    70.740
----------------------------------------------------------------------------------------------------
| epoch 104 step    45250 |    342 batches | lr 0.000249 | ms/batch 550.86 | loss  4.08 | ppl    59.022
| epoch 104 step    45300 |    392 batches | lr 0.000248 | ms/batch 423.95 | loss  4.20 | ppl    66.777
| epoch 105 step    45350 |      6 batches | lr 0.000248 | ms/batch 416.21 | loss  4.24 | ppl    69.264
| epoch 105 step    45400 |     56 batches | lr 0.000248 | ms/batch 424.77 | loss  4.18 | ppl    65.181
| epoch 105 step    45450 |    106 batches | lr 0.000248 | ms/batch 424.66 | loss  4.16 | ppl    64.361
| epoch 105 step    45500 |    156 batches | lr 0.000248 | ms/batch 423.47 | loss  4.19 | ppl    66.082
| epoch 105 step    45550 |    206 batches | lr 0.000247 | ms/batch 425.00 | loss  4.18 | ppl    65.377
| epoch 105 step    45600 |    256 batches | lr 0.000247 | ms/batch 425.63 | loss  4.20 | ppl    66.422
----------------------------------------------------------------------------------------------------
| Eval 114 at step    45600 | time: 175.73s | valid loss  4.28 | valid ppl    72.350
----------------------------------------------------------------------------------------------------
| epoch 105 step    45650 |    306 batches | lr 0.000247 | ms/batch 551.16 | loss  4.22 | ppl    67.778
| epoch 105 step    45700 |    356 batches | lr 0.000247 | ms/batch 425.13 | loss  4.11 | ppl    60.869
| epoch 105 step    45750 |    406 batches | lr 0.000247 | ms/batch 426.11 | loss  4.18 | ppl    65.389
| epoch 106 step    45800 |     20 batches | lr 0.000246 | ms/batch 416.56 | loss  4.22 | ppl    68.098
| epoch 106 step    45850 |     70 batches | lr 0.000246 | ms/batch 425.19 | loss  4.14 | ppl    62.617
| epoch 106 step    45900 |    120 batches | lr 0.000246 | ms/batch 425.24 | loss  4.18 | ppl    65.241
| epoch 106 step    45950 |    170 batches | lr 0.000246 | ms/batch 426.48 | loss  4.20 | ppl    66.595
| epoch 106 step    46000 |    220 batches | lr 0.000246 | ms/batch 424.43 | loss  4.20 | ppl    66.946
----------------------------------------------------------------------------------------------------
| Eval 115 at step    46000 | time: 176.04s | valid loss  4.26 | valid ppl    70.577
----------------------------------------------------------------------------------------------------
| epoch 106 step    46050 |    270 batches | lr 0.000245 | ms/batch 553.89 | loss  4.18 | ppl    65.506
| epoch 106 step    46100 |    320 batches | lr 0.000245 | ms/batch 424.01 | loss  4.17 | ppl    64.590
| epoch 106 step    46150 |    370 batches | lr 0.000245 | ms/batch 424.58 | loss  4.15 | ppl    63.392
| epoch 106 step    46200 |    420 batches | lr 0.000245 | ms/batch 424.50 | loss  4.17 | ppl    64.686
| epoch 107 step    46250 |     34 batches | lr 0.000245 | ms/batch 414.88 | loss  4.21 | ppl    67.655
| epoch 107 step    46300 |     84 batches | lr 0.000244 | ms/batch 424.95 | loss  4.12 | ppl    61.464
| epoch 107 step    46350 |    134 batches | lr 0.000244 | ms/batch 424.73 | loss  4.18 | ppl    65.267
| epoch 107 step    46400 |    184 batches | lr 0.000244 | ms/batch 424.13 | loss  4.18 | ppl    65.153
----------------------------------------------------------------------------------------------------
| Eval 116 at step    46400 | time: 175.74s | valid loss  4.26 | valid ppl    70.502
----------------------------------------------------------------------------------------------------
| epoch 107 step    46450 |    234 batches | lr 0.000244 | ms/batch 552.27 | loss  4.19 | ppl    66.158
| epoch 107 step    46500 |    284 batches | lr 0.000243 | ms/batch 424.04 | loss  4.21 | ppl    67.035
| epoch 107 step    46550 |    334 batches | lr 0.000243 | ms/batch 423.82 | loss  4.10 | ppl    60.375
| epoch 107 step    46600 |    384 batches | lr 0.000243 | ms/batch 423.66 | loss  4.18 | ppl    65.334
| epoch 107 step    46650 |    434 batches | lr 0.000243 | ms/batch 423.34 | loss  4.22 | ppl    68.279
| epoch 108 step    46700 |     48 batches | lr 0.000243 | ms/batch 418.09 | loss  4.15 | ppl    63.661
| epoch 108 step    46750 |     98 batches | lr 0.000242 | ms/batch 426.40 | loss  4.12 | ppl    61.861
| epoch 108 step    46800 |    148 batches | lr 0.000242 | ms/batch 425.02 | loss  4.17 | ppl    64.657
----------------------------------------------------------------------------------------------------
| Eval 117 at step    46800 | time: 175.86s | valid loss  4.25 | valid ppl    70.093
----------------------------------------------------------------------------------------------------
| epoch 108 step    46850 |    198 batches | lr 0.000242 | ms/batch 578.92 | loss  4.19 | ppl    66.087
| epoch 108 step    46900 |    248 batches | lr 0.000242 | ms/batch 425.95 | loss  4.18 | ppl    65.501
| epoch 108 step    46950 |    298 batches | lr 0.000242 | ms/batch 425.11 | loss  4.20 | ppl    66.953
| epoch 108 step    47000 |    348 batches | lr 0.000241 | ms/batch 424.02 | loss  4.10 | ppl    60.408
| epoch 108 step    47050 |    398 batches | lr 0.000241 | ms/batch 426.25 | loss  4.17 | ppl    64.571
| epoch 109 step    47100 |     12 batches | lr 0.000241 | ms/batch 417.75 | loss  4.20 | ppl    66.714
| epoch 109 step    47150 |     62 batches | lr 0.000241 | ms/batch 426.60 | loss  4.12 | ppl    61.261
| epoch 109 step    47200 |    112 batches | lr 0.000241 | ms/batch 425.34 | loss  4.13 | ppl    62.251
----------------------------------------------------------------------------------------------------
| Eval 118 at step    47200 | time: 176.11s | valid loss  4.26 | valid ppl    71.004
----------------------------------------------------------------------------------------------------
| epoch 109 step    47250 |    162 batches | lr 0.00024 | ms/batch 551.74 | loss  4.16 | ppl    64.376
| epoch 109 step    47300 |    212 batches | lr 0.00024 | ms/batch 423.16 | loss  4.17 | ppl    64.590
| epoch 109 step    47350 |    262 batches | lr 0.00024 | ms/batch 427.84 | loss  4.17 | ppl    64.883
| epoch 109 step    47400 |    312 batches | lr 0.00024 | ms/batch 425.55 | loss  4.18 | ppl    65.115
| epoch 109 step    47450 |    362 batches | lr 0.00024 | ms/batch 423.32 | loss  4.14 | ppl    62.524
| epoch 109 step    47500 |    412 batches | lr 0.000239 | ms/batch 428.60 | loss  4.14 | ppl    63.030
| epoch 110 step    47550 |     26 batches | lr 0.000239 | ms/batch 418.92 | loss  4.21 | ppl    67.670
| epoch 110 step    47600 |     76 batches | lr 0.000239 | ms/batch 427.84 | loss  4.10 | ppl    60.126
----------------------------------------------------------------------------------------------------
| Eval 119 at step    47600 | time: 176.36s | valid loss  4.27 | valid ppl    71.315
----------------------------------------------------------------------------------------------------
| epoch 110 step    47650 |    126 batches | lr 0.000239 | ms/batch 552.88 | loss  4.15 | ppl    63.687
| epoch 110 step    47700 |    176 batches | lr 0.000239 | ms/batch 424.21 | loss  4.17 | ppl    64.662
| epoch 110 step    47750 |    226 batches | lr 0.000238 | ms/batch 426.28 | loss  4.17 | ppl    64.659
| epoch 110 step    47800 |    276 batches | lr 0.000238 | ms/batch 425.54 | loss  4.19 | ppl    66.131
| epoch 110 step    47850 |    326 batches | lr 0.000238 | ms/batch 424.64 | loss  4.15 | ppl    63.228
| epoch 110 step    47900 |    376 batches | lr 0.000238 | ms/batch 425.81 | loss  4.14 | ppl    62.991
| epoch 110 step    47950 |    426 batches | lr 0.000238 | ms/batch 423.99 | loss  4.18 | ppl    65.268
| epoch 111 step    48000 |     40 batches | lr 0.000237 | ms/batch 416.90 | loss  4.16 | ppl    63.828
----------------------------------------------------------------------------------------------------
| Eval 120 at step    48000 | time: 175.99s | valid loss  4.25 | valid ppl    70.396
----------------------------------------------------------------------------------------------------
| epoch 111 step    48050 |     90 batches | lr 0.000237 | ms/batch 550.64 | loss  4.09 | ppl    60.003
| epoch 111 step    48100 |    140 batches | lr 0.000237 | ms/batch 424.61 | loss  4.17 | ppl    64.889
| epoch 111 step    48150 |    190 batches | lr 0.000237 | ms/batch 424.44 | loss  4.16 | ppl    64.304
| epoch 111 step    48200 |    240 batches | lr 0.000237 | ms/batch 424.55 | loss  4.18 | ppl    65.116
| epoch 111 step    48250 |    290 batches | lr 0.000236 | ms/batch 425.52 | loss  4.19 | ppl    66.039
| epoch 111 step    48300 |    340 batches | lr 0.000236 | ms/batch 425.49 | loss  4.06 | ppl    58.212
| epoch 111 step    48350 |    390 batches | lr 0.000236 | ms/batch 424.09 | loss  4.15 | ppl    63.456
| epoch 112 step    48400 |      4 batches | lr 0.000236 | ms/batch 416.44 | loss  4.19 | ppl    66.205
----------------------------------------------------------------------------------------------------
| Eval 121 at step    48400 | time: 175.80s | valid loss  4.25 | valid ppl    69.936
----------------------------------------------------------------------------------------------------
| epoch 112 step    48450 |     54 batches | lr 0.000236 | ms/batch 580.96 | loss  4.11 | ppl    60.834
| epoch 112 step    48500 |    104 batches | lr 0.000235 | ms/batch 425.40 | loss  4.11 | ppl    61.207
| epoch 112 step    48550 |    154 batches | lr 0.000235 | ms/batch 424.37 | loss  4.14 | ppl    62.835
| epoch 112 step    48600 |    204 batches | lr 0.000235 | ms/batch 427.55 | loss  4.17 | ppl    64.501
| epoch 112 step    48650 |    254 batches | lr 0.000235 | ms/batch 423.69 | loss  4.15 | ppl    63.224
| epoch 112 step    48700 |    304 batches | lr 0.000234 | ms/batch 424.61 | loss  4.18 | ppl    65.294
| epoch 112 step    48750 |    354 batches | lr 0.000234 | ms/batch 426.58 | loss  4.08 | ppl    59.122
| epoch 112 step    48800 |    404 batches | lr 0.000234 | ms/batch 426.15 | loss  4.15 | ppl    63.420
----------------------------------------------------------------------------------------------------
| Eval 122 at step    48800 | time: 176.50s | valid loss  4.25 | valid ppl    69.984
----------------------------------------------------------------------------------------------------
| epoch 113 step    48850 |     18 batches | lr 0.000234 | ms/batch 541.91 | loss  4.21 | ppl    67.590
| epoch 113 step    48900 |     68 batches | lr 0.000234 | ms/batch 425.41 | loss  4.12 | ppl    61.468
| epoch 113 step    48950 |    118 batches | lr 0.000233 | ms/batch 424.70 | loss  4.15 | ppl    63.591
| epoch 113 step    49000 |    168 batches | lr 0.000233 | ms/batch 425.60 | loss  4.15 | ppl    63.268
| epoch 113 step    49050 |    218 batches | lr 0.000233 | ms/batch 423.71 | loss  4.19 | ppl    66.024
| epoch 113 step    49100 |    268 batches | lr 0.000233 | ms/batch 424.99 | loss  4.17 | ppl    64.864
| epoch 113 step    49150 |    318 batches | lr 0.000233 | ms/batch 425.66 | loss  4.16 | ppl    63.866
| epoch 113 step    49200 |    368 batches | lr 0.000232 | ms/batch 424.58 | loss  4.11 | ppl    60.751
----------------------------------------------------------------------------------------------------
| Eval 123 at step    49200 | time: 175.82s | valid loss  4.24 | valid ppl    69.752
----------------------------------------------------------------------------------------------------
| epoch 113 step    49250 |    418 batches | lr 0.000232 | ms/batch 580.15 | loss  4.14 | ppl    62.921
| epoch 114 step    49300 |     32 batches | lr 0.000232 | ms/batch 416.53 | loss  4.16 | ppl    64.118
| epoch 114 step    49350 |     82 batches | lr 0.000232 | ms/batch 425.68 | loss  4.11 | ppl    61.150
| epoch 114 step    49400 |    132 batches | lr 0.000232 | ms/batch 425.14 | loss  4.16 | ppl    63.769
| epoch 114 step    49450 |    182 batches | lr 0.000231 | ms/batch 424.81 | loss  4.15 | ppl    63.642
| epoch 114 step    49500 |    232 batches | lr 0.000231 | ms/batch 424.31 | loss  4.16 | ppl    64.070
| epoch 114 step    49550 |    282 batches | lr 0.000231 | ms/batch 423.60 | loss  4.21 | ppl    67.278
| epoch 114 step    49600 |    332 batches | lr 0.000231 | ms/batch 425.14 | loss  4.11 | ppl    61.067
----------------------------------------------------------------------------------------------------
| Eval 124 at step    49600 | time: 175.82s | valid loss  4.27 | valid ppl    71.403
----------------------------------------------------------------------------------------------------
| epoch 114 step    49650 |    382 batches | lr 0.000231 | ms/batch 549.53 | loss  4.14 | ppl    62.580
| epoch 114 step    49700 |    432 batches | lr 0.00023 | ms/batch 423.87 | loss  4.17 | ppl    64.900
| epoch 115 step    49750 |     46 batches | lr 0.00023 | ms/batch 415.64 | loss  4.13 | ppl    62.172
| epoch 115 step    49800 |     96 batches | lr 0.00023 | ms/batch 424.94 | loss  4.11 | ppl    60.826
| epoch 115 step    49850 |    146 batches | lr 0.00023 | ms/batch 426.06 | loss  4.13 | ppl    62.338
| epoch 115 step    49900 |    196 batches | lr 0.000229 | ms/batch 424.57 | loss  4.18 | ppl    65.112
| epoch 115 step    49950 |    246 batches | lr 0.000229 | ms/batch 423.75 | loss  4.16 | ppl    63.884
| epoch 115 step    50000 |    296 batches | lr 0.000229 | ms/batch 423.80 | loss  4.20 | ppl    66.540
----------------------------------------------------------------------------------------------------
| Eval 125 at step    50000 | time: 175.59s | valid loss  4.24 | valid ppl    69.437
----------------------------------------------------------------------------------------------------
| epoch 115 step    50050 |    346 batches | lr 0.000229 | ms/batch 579.24 | loss  4.05 | ppl    57.583
| epoch 115 step    50100 |    396 batches | lr 0.000229 | ms/batch 427.00 | loss  4.14 | ppl    62.850
| epoch 116 step    50150 |     10 batches | lr 0.000228 | ms/batch 417.76 | loss  4.17 | ppl    64.548
| epoch 116 step    50200 |     60 batches | lr 0.000228 | ms/batch 425.36 | loss  4.09 | ppl    59.600
| epoch 116 step    50250 |    110 batches | lr 0.000228 | ms/batch 425.39 | loss  4.09 | ppl    60.033
| epoch 116 step    50300 |    160 batches | lr 0.000228 | ms/batch 424.09 | loss  4.12 | ppl    61.262
| epoch 116 step    50350 |    210 batches | lr 0.000228 | ms/batch 425.36 | loss  4.16 | ppl    64.301
| epoch 116 step    50400 |    260 batches | lr 0.000227 | ms/batch 425.29 | loss  4.16 | ppl    64.073
----------------------------------------------------------------------------------------------------
| Eval 126 at step    50400 | time: 176.14s | valid loss  4.26 | valid ppl    70.742
----------------------------------------------------------------------------------------------------
| epoch 116 step    50450 |    310 batches | lr 0.000227 | ms/batch 553.19 | loss  4.17 | ppl    64.556
| epoch 116 step    50500 |    360 batches | lr 0.000227 | ms/batch 427.33 | loss  4.07 | ppl    58.439
| epoch 116 step    50550 |    410 batches | lr 0.000227 | ms/batch 427.35 | loss  4.15 | ppl    63.135
| epoch 117 step    50600 |     24 batches | lr 0.000227 | ms/batch 418.52 | loss  4.19 | ppl    65.700
| epoch 117 step    50650 |     74 batches | lr 0.000226 | ms/batch 426.87 | loss  4.08 | ppl    58.954
| epoch 117 step    50700 |    124 batches | lr 0.000226 | ms/batch 426.26 | loss  4.15 | ppl    63.449
| epoch 117 step    50750 |    174 batches | lr 0.000226 | ms/batch 427.27 | loss  4.15 | ppl    63.417
| epoch 117 step    50800 |    224 batches | lr 0.000226 | ms/batch 425.52 | loss  4.18 | ppl    65.310
----------------------------------------------------------------------------------------------------
| Eval 127 at step    50800 | time: 176.61s | valid loss  4.25 | valid ppl    70.297
----------------------------------------------------------------------------------------------------
| epoch 117 step    50850 |    274 batches | lr 0.000226 | ms/batch 553.11 | loss  4.16 | ppl    63.918
| epoch 117 step    50900 |    324 batches | lr 0.000225 | ms/batch 424.90 | loss  4.11 | ppl    61.027
| epoch 117 step    50950 |    374 batches | lr 0.000225 | ms/batch 427.05 | loss  4.13 | ppl    62.271
| epoch 117 step    51000 |    424 batches | lr 0.000225 | ms/batch 424.67 | loss  4.15 | ppl    63.302
| epoch 118 step    51050 |     38 batches | lr 0.000225 | ms/batch 417.30 | loss  4.13 | ppl    62.149
| epoch 118 step    51100 |     88 batches | lr 0.000224 | ms/batch 424.47 | loss  4.09 | ppl    59.863
| epoch 118 step    51150 |    138 batches | lr 0.000224 | ms/batch 424.65 | loss  4.13 | ppl    62.469
| epoch 118 step    51200 |    188 batches | lr 0.000224 | ms/batch 424.37 | loss  4.14 | ppl    62.918
----------------------------------------------------------------------------------------------------
| Eval 128 at step    51200 | time: 176.01s | valid loss  4.25 | valid ppl    70.134
----------------------------------------------------------------------------------------------------
| epoch 118 step    51250 |    238 batches | lr 0.000224 | ms/batch 551.51 | loss  4.15 | ppl    63.331
| epoch 118 step    51300 |    288 batches | lr 0.000224 | ms/batch 424.56 | loss  4.19 | ppl    66.195
| epoch 118 step    51350 |    338 batches | lr 0.000223 | ms/batch 426.01 | loss  4.04 | ppl    56.764
| epoch 118 step    51400 |    388 batches | lr 0.000223 | ms/batch 424.50 | loss  4.16 | ppl    64.092
| epoch 119 step    51450 |      2 batches | lr 0.000223 | ms/batch 416.32 | loss  4.17 | ppl    64.931
| epoch 119 step    51500 |     52 batches | lr 0.000223 | ms/batch 424.89 | loss  4.10 | ppl    60.602
| epoch 119 step    51550 |    102 batches | lr 0.000223 | ms/batch 425.51 | loss  4.07 | ppl    58.464
| epoch 119 step    51600 |    152 batches | lr 0.000222 | ms/batch 425.93 | loss  4.14 | ppl    63.064
----------------------------------------------------------------------------------------------------
| Eval 129 at step    51600 | time: 175.99s | valid loss  4.25 | valid ppl    70.186
----------------------------------------------------------------------------------------------------
| epoch 119 step    51650 |    202 batches | lr 0.000222 | ms/batch 554.23 | loss  4.14 | ppl    62.588
| epoch 119 step    51700 |    252 batches | lr 0.000222 | ms/batch 426.47 | loss  4.16 | ppl    64.221
| epoch 119 step    51750 |    302 batches | lr 0.000222 | ms/batch 425.78 | loss  4.17 | ppl    64.436
| epoch 119 step    51800 |    352 batches | lr 0.000221 | ms/batch 424.20 | loss  4.06 | ppl    57.779
| epoch 119 step    51850 |    402 batches | lr 0.000221 | ms/batch 424.62 | loss  4.13 | ppl    62.047
| epoch 120 step    51900 |     16 batches | lr 0.000221 | ms/batch 417.33 | loss  4.17 | ppl    64.748
| epoch 120 step    51950 |     66 batches | lr 0.000221 | ms/batch 426.99 | loss  4.09 | ppl    59.849
| epoch 120 step    52000 |    116 batches | lr 0.000221 | ms/batch 425.60 | loss  4.12 | ppl    61.511
----------------------------------------------------------------------------------------------------
| Eval 130 at step    52000 | time: 176.22s | valid loss  4.25 | valid ppl    69.806
----------------------------------------------------------------------------------------------------
| epoch 120 step    52050 |    166 batches | lr 0.00022 | ms/batch 550.41 | loss  4.11 | ppl    61.094
| epoch 120 step    52100 |    216 batches | lr 0.00022 | ms/batch 424.44 | loss  4.15 | ppl    63.387
| epoch 120 step    52150 |    266 batches | lr 0.00022 | ms/batch 424.01 | loss  4.17 | ppl    64.454
| epoch 120 step    52200 |    316 batches | lr 0.00022 | ms/batch 424.57 | loss  4.12 | ppl    61.841
| epoch 120 step    52250 |    366 batches | lr 0.00022 | ms/batch 425.59 | loss  4.07 | ppl    58.525
| epoch 120 step    52300 |    416 batches | lr 0.000219 | ms/batch 424.06 | loss  4.14 | ppl    62.544
| epoch 121 step    52350 |     30 batches | lr 0.000219 | ms/batch 416.84 | loss  4.15 | ppl    63.516
| epoch 121 step    52400 |     80 batches | lr 0.000219 | ms/batch 424.46 | loss  4.10 | ppl    60.404
----------------------------------------------------------------------------------------------------
| Eval 131 at step    52400 | time: 175.73s | valid loss  4.25 | valid ppl    70.251
----------------------------------------------------------------------------------------------------
| epoch 121 step    52450 |    130 batches | lr 0.000219 | ms/batch 552.76 | loss  4.13 | ppl    62.242
| epoch 121 step    52500 |    180 batches | lr 0.000219 | ms/batch 425.83 | loss  4.13 | ppl    62.327
| epoch 121 step    52550 |    230 batches | lr 0.000218 | ms/batch 424.11 | loss  4.14 | ppl    62.833
| epoch 121 step    52600 |    280 batches | lr 0.000218 | ms/batch 424.08 | loss  4.15 | ppl    63.137
| epoch 121 step    52650 |    330 batches | lr 0.000218 | ms/batch 424.53 | loss  4.09 | ppl    59.930
| epoch 121 step    52700 |    380 batches | lr 0.000218 | ms/batch 425.93 | loss  4.07 | ppl    58.832
| epoch 121 step    52750 |    430 batches | lr 0.000217 | ms/batch 424.02 | loss  4.13 | ppl    62.368
| epoch 122 step    52800 |     44 batches | lr 0.000217 | ms/batch 416.05 | loss  4.11 | ppl    61.252
----------------------------------------------------------------------------------------------------
| Eval 132 at step    52800 | time: 175.86s | valid loss  4.25 | valid ppl    69.805
----------------------------------------------------------------------------------------------------
| epoch 122 step    52850 |     94 batches | lr 0.000217 | ms/batch 551.35 | loss  4.06 | ppl    58.022
| epoch 122 step    52900 |    144 batches | lr 0.000217 | ms/batch 427.63 | loss  4.12 | ppl    61.589
| epoch 122 step    52950 |    194 batches | lr 0.000217 | ms/batch 426.17 | loss  4.12 | ppl    61.279
| epoch 122 step    53000 |    244 batches | lr 0.000216 | ms/batch 426.47 | loss  4.15 | ppl    63.503
| epoch 122 step    53050 |    294 batches | lr 0.000216 | ms/batch 425.48 | loss  4.16 | ppl    64.239
| epoch 122 step    53100 |    344 batches | lr 0.000216 | ms/batch 423.90 | loss  4.02 | ppl    55.431
| epoch 122 step    53150 |    394 batches | lr 0.000216 | ms/batch 423.66 | loss  4.11 | ppl    61.188
| epoch 123 step    53200 |      8 batches | lr 0.000216 | ms/batch 417.24 | loss  4.15 | ppl    63.243
----------------------------------------------------------------------------------------------------
| Eval 133 at step    53200 | time: 176.12s | valid loss  4.24 | valid ppl    69.445
----------------------------------------------------------------------------------------------------
| epoch 123 step    53250 |     58 batches | lr 0.000215 | ms/batch 551.27 | loss  4.07 | ppl    58.837
| epoch 123 step    53300 |    108 batches | lr 0.000215 | ms/batch 426.87 | loss  4.09 | ppl    59.494
| epoch 123 step    53350 |    158 batches | lr 0.000215 | ms/batch 424.05 | loss  4.12 | ppl    61.832
| epoch 123 step    53400 |    208 batches | lr 0.000215 | ms/batch 426.26 | loss  4.13 | ppl    62.348
| epoch 123 step    53450 |    258 batches | lr 0.000214 | ms/batch 424.00 | loss  4.15 | ppl    63.301
| epoch 123 step    53500 |    308 batches | lr 0.000214 | ms/batch 425.83 | loss  4.13 | ppl    62.316
| epoch 123 step    53550 |    358 batches | lr 0.000214 | ms/batch 423.56 | loss  4.06 | ppl    57.791
| epoch 123 step    53600 |    408 batches | lr 0.000214 | ms/batch 425.13 | loss  4.10 | ppl    60.359
----------------------------------------------------------------------------------------------------
| Eval 134 at step    53600 | time: 176.35s | valid loss  4.24 | valid ppl    69.486
----------------------------------------------------------------------------------------------------
| epoch 124 step    53650 |     22 batches | lr 0.000214 | ms/batch 542.33 | loss  4.15 | ppl    63.201
| epoch 124 step    53700 |     72 batches | lr 0.000213 | ms/batch 427.11 | loss  4.05 | ppl    57.548
| epoch 124 step    53750 |    122 batches | lr 0.000213 | ms/batch 424.96 | loss  4.10 | ppl    60.138
| epoch 124 step    53800 |    172 batches | lr 0.000213 | ms/batch 424.82 | loss  4.12 | ppl    61.536
| epoch 124 step    53850 |    222 batches | lr 0.000213 | ms/batch 425.08 | loss  4.14 | ppl    63.104
| epoch 124 step    53900 |    272 batches | lr 0.000213 | ms/batch 426.08 | loss  4.13 | ppl    62.292
| epoch 124 step    53950 |    322 batches | lr 0.000212 | ms/batch 426.82 | loss  4.12 | ppl    61.303
| epoch 124 step    54000 |    372 batches | lr 0.000212 | ms/batch 426.27 | loss  4.08 | ppl    59.294
----------------------------------------------------------------------------------------------------
| Eval 135 at step    54000 | time: 176.18s | valid loss  4.23 | valid ppl    68.745
----------------------------------------------------------------------------------------------------
| epoch 124 step    54050 |    422 batches | lr 0.000212 | ms/batch 608.91 | loss  4.08 | ppl    59.417
| epoch 125 step    54100 |     36 batches | lr 0.000212 | ms/batch 416.02 | loss  4.12 | ppl    61.335
| epoch 125 step    54150 |     86 batches | lr 0.000211 | ms/batch 426.64 | loss  4.03 | ppl    56.224
| epoch 125 step    54200 |    136 batches | lr 0.000211 | ms/batch 426.48 | loss  4.08 | ppl    59.110
| epoch 125 step    54250 |    186 batches | lr 0.000211 | ms/batch 423.73 | loss  4.09 | ppl    59.741
| epoch 125 step    54300 |    236 batches | lr 0.000211 | ms/batch 423.03 | loss  4.12 | ppl    61.812
| epoch 125 step    54350 |    286 batches | lr 0.000211 | ms/batch 425.54 | loss  4.15 | ppl    63.366
| epoch 125 step    54400 |    336 batches | lr 0.00021 | ms/batch 424.57 | loss  4.04 | ppl    56.700
----------------------------------------------------------------------------------------------------
| Eval 136 at step    54400 | time: 175.97s | valid loss  4.25 | valid ppl    69.990
----------------------------------------------------------------------------------------------------
| epoch 125 step    54450 |    386 batches | lr 0.00021 | ms/batch 552.29 | loss  4.09 | ppl    59.777
| epoch 125 step    54500 |    436 batches | lr 0.00021 | ms/batch 418.76 | loss  4.12 | ppl    61.473
| epoch 126 step    54550 |     50 batches | lr 0.00021 | ms/batch 422.83 | loss  4.06 | ppl    57.738
| epoch 126 step    54600 |    100 batches | lr 0.00021 | ms/batch 425.38 | loss  4.08 | ppl    59.048
| epoch 126 step    54650 |    150 batches | lr 0.000209 | ms/batch 425.94 | loss  4.11 | ppl    60.653
| epoch 126 step    54700 |    200 batches | lr 0.000209 | ms/batch 426.10 | loss  4.13 | ppl    62.355
| epoch 126 step    54750 |    250 batches | lr 0.000209 | ms/batch 425.79 | loss  4.14 | ppl    63.036
| epoch 126 step    54800 |    300 batches | lr 0.000209 | ms/batch 426.11 | loss  4.16 | ppl    63.981
----------------------------------------------------------------------------------------------------
| Eval 137 at step    54800 | time: 176.16s | valid loss  4.24 | valid ppl    69.703
----------------------------------------------------------------------------------------------------
| epoch 126 step    54850 |    350 batches | lr 0.000208 | ms/batch 551.98 | loss  4.04 | ppl    56.928
| epoch 126 step    54900 |    400 batches | lr 0.000208 | ms/batch 425.66 | loss  4.10 | ppl    60.321
| epoch 127 step    54950 |     14 batches | lr 0.000208 | ms/batch 416.64 | loss  4.13 | ppl    62.419
| epoch 127 step    55000 |     64 batches | lr 0.000208 | ms/batch 424.62 | loss  4.03 | ppl    56.484
| epoch 127 step    55050 |    114 batches | lr 0.000208 | ms/batch 424.63 | loss  4.06 | ppl    58.209
| epoch 127 step    55100 |    164 batches | lr 0.000207 | ms/batch 424.68 | loss  4.10 | ppl    60.479
| epoch 127 step    55150 |    214 batches | lr 0.000207 | ms/batch 423.77 | loss  4.12 | ppl    61.464
| epoch 127 step    55200 |    264 batches | lr 0.000207 | ms/batch 423.56 | loss  4.11 | ppl    61.226
----------------------------------------------------------------------------------------------------
| Eval 138 at step    55200 | time: 175.75s | valid loss  4.24 | valid ppl    69.643
----------------------------------------------------------------------------------------------------
| epoch 127 step    55250 |    314 batches | lr 0.000207 | ms/batch 551.19 | loss  4.11 | ppl    61.055
| epoch 127 step    55300 |    364 batches | lr 0.000206 | ms/batch 423.75 | loss  4.03 | ppl    56.495
| epoch 127 step    55350 |    414 batches | lr 0.000206 | ms/batch 426.47 | loss  4.09 | ppl    59.643
| epoch 128 step    55400 |     28 batches | lr 0.000206 | ms/batch 417.34 | loss  4.13 | ppl    62.038
| epoch 128 step    55450 |     78 batches | lr 0.000206 | ms/batch 425.45 | loss  4.06 | ppl    58.062
| epoch 128 step    55500 |    128 batches | lr 0.000206 | ms/batch 424.39 | loss  4.08 | ppl    59.302
| epoch 128 step    55550 |    178 batches | lr 0.000205 | ms/batch 426.16 | loss  4.12 | ppl    61.324
| epoch 128 step    55600 |    228 batches | lr 0.000205 | ms/batch 423.75 | loss  4.11 | ppl    60.644
----------------------------------------------------------------------------------------------------
| Eval 139 at step    55600 | time: 175.94s | valid loss  4.24 | valid ppl    69.318
----------------------------------------------------------------------------------------------------
| epoch 128 step    55650 |    278 batches | lr 0.000205 | ms/batch 550.13 | loss  4.11 | ppl    60.791
| epoch 128 step    55700 |    328 batches | lr 0.000205 | ms/batch 423.68 | loss  4.05 | ppl    57.448
| epoch 128 step    55750 |    378 batches | lr 0.000205 | ms/batch 423.88 | loss  4.07 | ppl    58.518
| epoch 128 step    55800 |    428 batches | lr 0.000204 | ms/batch 423.10 | loss  4.11 | ppl    60.811
| epoch 129 step    55850 |     42 batches | lr 0.000204 | ms/batch 417.44 | loss  4.08 | ppl    58.858
| epoch 129 step    55900 |     92 batches | lr 0.000204 | ms/batch 424.81 | loss  4.04 | ppl    56.905
| epoch 129 step    55950 |    142 batches | lr 0.000204 | ms/batch 423.84 | loss  4.08 | ppl    59.127
| epoch 129 step    56000 |    192 batches | lr 0.000203 | ms/batch 425.57 | loss  4.08 | ppl    59.330
----------------------------------------------------------------------------------------------------
| Eval 140 at step    56000 | time: 175.63s | valid loss  4.25 | valid ppl    69.863
----------------------------------------------------------------------------------------------------
| epoch 129 step    56050 |    242 batches | lr 0.000203 | ms/batch 550.73 | loss  4.11 | ppl    61.048
| epoch 129 step    56100 |    292 batches | lr 0.000203 | ms/batch 424.10 | loss  4.14 | ppl    62.621
| epoch 129 step    56150 |    342 batches | lr 0.000203 | ms/batch 424.26 | loss  4.00 | ppl    54.473
| epoch 129 step    56200 |    392 batches | lr 0.000203 | ms/batch 425.60 | loss  4.09 | ppl    59.875
| epoch 130 step    56250 |      6 batches | lr 0.000202 | ms/batch 419.15 | loss  4.10 | ppl    60.077
| epoch 130 step    56300 |     56 batches | lr 0.000202 | ms/batch 425.31 | loss  4.07 | ppl    58.399
| epoch 130 step    56350 |    106 batches | lr 0.000202 | ms/batch 425.05 | loss  4.05 | ppl    57.456
| epoch 130 step    56400 |    156 batches | lr 0.000202 | ms/batch 427.91 | loss  4.08 | ppl    59.186
----------------------------------------------------------------------------------------------------
| Eval 141 at step    56400 | time: 176.10s | valid loss  4.23 | valid ppl    68.534
----------------------------------------------------------------------------------------------------
| epoch 130 step    56450 |    206 batches | lr 0.000202 | ms/batch 580.92 | loss  4.09 | ppl    59.823
| epoch 130 step    56500 |    256 batches | lr 0.000201 | ms/batch 425.96 | loss  4.11 | ppl    61.118
| epoch 130 step    56550 |    306 batches | lr 0.000201 | ms/batch 425.73 | loss  4.14 | ppl    63.015
| epoch 130 step    56600 |    356 batches | lr 0.000201 | ms/batch 425.57 | loss  4.02 | ppl    55.751
| epoch 130 step    56650 |    406 batches | lr 0.000201 | ms/batch 425.31 | loss  4.11 | ppl    60.824
| epoch 131 step    56700 |     20 batches | lr 0.0002 | ms/batch 417.78 | loss  4.12 | ppl    61.518
| epoch 131 step    56750 |     70 batches | lr 0.0002 | ms/batch 425.91 | loss  4.05 | ppl    57.185
| epoch 131 step    56800 |    120 batches | lr 0.0002 | ms/batch 425.45 | loss  4.08 | ppl    59.359
----------------------------------------------------------------------------------------------------
| Eval 142 at step    56800 | time: 176.82s | valid loss  4.24 | valid ppl    69.203
----------------------------------------------------------------------------------------------------
| epoch 131 step    56850 |    170 batches | lr 0.0002 | ms/batch 562.92 | loss  4.06 | ppl    57.719
| epoch 131 step    56900 |    220 batches | lr 0.0002 | ms/batch 423.72 | loss  4.12 | ppl    61.537
| epoch 131 step    56950 |    270 batches | lr 0.000199 | ms/batch 425.26 | loss  4.10 | ppl    60.287
| epoch 131 step    57000 |    320 batches | lr 0.000199 | ms/batch 424.69 | loss  4.07 | ppl    58.757
| epoch 131 step    57050 |    370 batches | lr 0.000199 | ms/batch 425.60 | loss  4.05 | ppl    57.453
| epoch 131 step    57100 |    420 batches | lr 0.000199 | ms/batch 426.84 | loss  4.06 | ppl    57.921
| epoch 132 step    57150 |     34 batches | lr 0.000198 | ms/batch 416.33 | loss  4.11 | ppl    61.137
| epoch 132 step    57200 |     84 batches | lr 0.000198 | ms/batch 423.61 | loss  4.01 | ppl    55.208
----------------------------------------------------------------------------------------------------
| Eval 143 at step    57200 | time: 175.85s | valid loss  4.25 | valid ppl    70.243
----------------------------------------------------------------------------------------------------
| epoch 132 step    57250 |    134 batches | lr 0.000198 | ms/batch 549.59 | loss  4.06 | ppl    58.082
| epoch 132 step    57300 |    184 batches | lr 0.000198 | ms/batch 423.59 | loss  4.09 | ppl    59.675
| epoch 132 step    57350 |    234 batches | lr 0.000198 | ms/batch 423.64 | loss  4.09 | ppl    59.597
| epoch 132 step    57400 |    284 batches | lr 0.000197 | ms/batch 423.80 | loss  4.11 | ppl    61.105
| epoch 132 step    57450 |    334 batches | lr 0.000197 | ms/batch 424.31 | loss  4.04 | ppl    56.623
| epoch 132 step    57500 |    384 batches | lr 0.000197 | ms/batch 425.35 | loss  4.10 | ppl    60.147
| epoch 132 step    57550 |    434 batches | lr 0.000197 | ms/batch 423.26 | loss  4.11 | ppl    61.178
| epoch 133 step    57600 |     48 batches | lr 0.000196 | ms/batch 417.44 | loss  4.03 | ppl    56.424
----------------------------------------------------------------------------------------------------
| Eval 144 at step    57600 | time: 175.58s | valid loss  4.24 | valid ppl    69.582
----------------------------------------------------------------------------------------------------
| epoch 133 step    57650 |     98 batches | lr 0.000196 | ms/batch 551.66 | loss  4.00 | ppl    54.697
| epoch 133 step    57700 |    148 batches | lr 0.000196 | ms/batch 426.66 | loss  4.05 | ppl    57.645
| epoch 133 step    57750 |    198 batches | lr 0.000196 | ms/batch 423.77 | loss  4.07 | ppl    58.711
| epoch 133 step    57800 |    248 batches | lr 0.000196 | ms/batch 425.73 | loss  4.10 | ppl    60.535
| epoch 133 step    57850 |    298 batches | lr 0.000195 | ms/batch 422.92 | loss  4.15 | ppl    63.617
| epoch 133 step    57900 |    348 batches | lr 0.000195 | ms/batch 424.24 | loss  3.98 | ppl    53.281
| epoch 133 step    57950 |    398 batches | lr 0.000195 | ms/batch 423.37 | loss  4.08 | ppl    58.939
| epoch 134 step    58000 |     12 batches | lr 0.000195 | ms/batch 415.83 | loss  4.11 | ppl    61.023
----------------------------------------------------------------------------------------------------
| Eval 145 at step    58000 | time: 175.73s | valid loss  4.23 | valid ppl    68.820
----------------------------------------------------------------------------------------------------
| epoch 134 step    58050 |     62 batches | lr 0.000195 | ms/batch 550.94 | loss  4.03 | ppl    56.383
| epoch 134 step    58100 |    112 batches | lr 0.000194 | ms/batch 426.28 | loss  4.06 | ppl    58.223
| epoch 134 step    58150 |    162 batches | lr 0.000194 | ms/batch 424.23 | loss  4.09 | ppl    59.939
| epoch 134 step    58200 |    212 batches | lr 0.000194 | ms/batch 426.05 | loss  4.09 | ppl    59.505
| epoch 134 step    58250 |    262 batches | lr 0.000194 | ms/batch 426.17 | loss  4.07 | ppl    58.818
| epoch 134 step    58300 |    312 batches | lr 0.000193 | ms/batch 428.11 | loss  4.11 | ppl    60.671
| epoch 134 step    58350 |    362 batches | lr 0.000193 | ms/batch 426.41 | loss  4.01 | ppl    55.371
| epoch 134 step    58400 |    412 batches | lr 0.000193 | ms/batch 425.26 | loss  4.06 | ppl    58.098
----------------------------------------------------------------------------------------------------
| Eval 146 at step    58400 | time: 176.66s | valid loss  4.24 | valid ppl    69.120
----------------------------------------------------------------------------------------------------
| epoch 135 step    58450 |     26 batches | lr 0.000193 | ms/batch 542.05 | loss  4.10 | ppl    60.473
| epoch 135 step    58500 |     76 batches | lr 0.000193 | ms/batch 424.79 | loss  4.02 | ppl    55.494
| epoch 135 step    58550 |    126 batches | lr 0.000192 | ms/batch 424.39 | loss  4.08 | ppl    59.021
| epoch 135 step    58600 |    176 batches | lr 0.000192 | ms/batch 425.34 | loss  4.07 | ppl    58.349
| epoch 135 step    58650 |    226 batches | lr 0.000192 | ms/batch 426.62 | loss  4.10 | ppl    60.571
| epoch 135 step    58700 |    276 batches | lr 0.000192 | ms/batch 427.15 | loss  4.11 | ppl    61.090
| epoch 135 step    58750 |    326 batches | lr 0.000191 | ms/batch 425.58 | loss  4.04 | ppl    56.741
| epoch 135 step    58800 |    376 batches | lr 0.000191 | ms/batch 424.74 | loss  4.05 | ppl    57.644
----------------------------------------------------------------------------------------------------
| Eval 147 at step    58800 | time: 176.01s | valid loss  4.23 | valid ppl    68.550
----------------------------------------------------------------------------------------------------
| epoch 135 step    58850 |    426 batches | lr 0.000191 | ms/batch 549.83 | loss  4.09 | ppl    59.480
| epoch 136 step    58900 |     40 batches | lr 0.000191 | ms/batch 416.69 | loss  4.08 | ppl    59.188
| epoch 136 step    58950 |     90 batches | lr 0.000191 | ms/batch 424.71 | loss  4.00 | ppl    54.473
| epoch 136 step    59000 |    140 batches | lr 0.00019 | ms/batch 423.30 | loss  4.07 | ppl    58.626
| epoch 136 step    59050 |    190 batches | lr 0.00019 | ms/batch 423.81 | loss  4.08 | ppl    59.096
| epoch 136 step    59100 |    240 batches | lr 0.00019 | ms/batch 424.41 | loss  4.09 | ppl    60.037
| epoch 136 step    59150 |    290 batches | lr 0.00019 | ms/batch 426.15 | loss  4.15 | ppl    63.188
| epoch 136 step    59200 |    340 batches | lr 0.000189 | ms/batch 426.90 | loss  3.99 | ppl    53.869
----------------------------------------------------------------------------------------------------
| Eval 148 at step    59200 | time: 175.82s | valid loss  4.24 | valid ppl    69.484
----------------------------------------------------------------------------------------------------
| epoch 136 step    59250 |    390 batches | lr 0.000189 | ms/batch 553.34 | loss  4.08 | ppl    58.998
| epoch 137 step    59300 |      4 batches | lr 0.000189 | ms/batch 419.25 | loss  4.09 | ppl    59.905
| epoch 137 step    59350 |     54 batches | lr 0.000189 | ms/batch 425.83 | loss  4.03 | ppl    56.162
| epoch 137 step    59400 |    104 batches | lr 0.000189 | ms/batch 425.77 | loss  4.00 | ppl    54.627
| epoch 137 step    59450 |    154 batches | lr 0.000188 | ms/batch 425.88 | loss  4.08 | ppl    59.302
| epoch 137 step    59500 |    204 batches | lr 0.000188 | ms/batch 425.03 | loss  4.07 | ppl    58.634
| epoch 137 step    59550 |    254 batches | lr 0.000188 | ms/batch 426.91 | loss  4.10 | ppl    60.363
| epoch 137 step    59600 |    304 batches | lr 0.000188 | ms/batch 423.55 | loss  4.12 | ppl    61.625
----------------------------------------------------------------------------------------------------
| Eval 149 at step    59600 | time: 176.24s | valid loss  4.23 | valid ppl    68.410
----------------------------------------------------------------------------------------------------
| epoch 137 step    59650 |    354 batches | lr 0.000188 | ms/batch 578.29 | loss  3.99 | ppl    54.014
| epoch 137 step    59700 |    404 batches | lr 0.000187 | ms/batch 424.26 | loss  4.05 | ppl    57.290
| epoch 138 step    59750 |     18 batches | lr 0.000187 | ms/batch 414.97 | loss  4.07 | ppl    58.368
| epoch 138 step    59800 |     68 batches | lr 0.000187 | ms/batch 423.29 | loss  4.01 | ppl    55.170
| epoch 138 step    59850 |    118 batches | lr 0.000187 | ms/batch 425.19 | loss  4.07 | ppl    58.349
| epoch 138 step    59900 |    168 batches | lr 0.000186 | ms/batch 424.30 | loss  4.06 | ppl    58.105
| epoch 138 step    59950 |    218 batches | lr 0.000186 | ms/batch 425.77 | loss  4.09 | ppl    59.473
| epoch 138 step    60000 |    268 batches | lr 0.000186 | ms/batch 425.91 | loss  4.07 | ppl    58.646
----------------------------------------------------------------------------------------------------
| Eval 150 at step    60000 | time: 175.71s | valid loss  4.23 | valid ppl    69.061
----------------------------------------------------------------------------------------------------
| epoch 138 step    60050 |    318 batches | lr 0.000186 | ms/batch 552.80 | loss  4.05 | ppl    57.423
| epoch 138 step    60100 |    368 batches | lr 0.000186 | ms/batch 424.86 | loss  4.02 | ppl    55.771
| epoch 138 step    60150 |    418 batches | lr 0.000185 | ms/batch 424.16 | loss  4.04 | ppl    56.704
| epoch 139 step    60200 |     32 batches | lr 0.000185 | ms/batch 415.99 | loss  4.10 | ppl    60.173
| epoch 139 step    60250 |     82 batches | lr 0.000185 | ms/batch 426.73 | loss  4.01 | ppl    55.370
| epoch 139 step    60300 |    132 batches | lr 0.000185 | ms/batch 424.82 | loss  4.06 | ppl    57.890
| epoch 139 step    60350 |    182 batches | lr 0.000184 | ms/batch 424.37 | loss  4.07 | ppl    58.816
| epoch 139 step    60400 |    232 batches | lr 0.000184 | ms/batch 424.52 | loss  4.05 | ppl    57.549
----------------------------------------------------------------------------------------------------
| Eval 151 at step    60400 | time: 175.90s | valid loss  4.24 | valid ppl    69.204
----------------------------------------------------------------------------------------------------
| epoch 139 step    60450 |    282 batches | lr 0.000184 | ms/batch 550.31 | loss  4.10 | ppl    60.324
| epoch 139 step    60500 |    332 batches | lr 0.000184 | ms/batch 424.82 | loss  4.03 | ppl    56.033
| epoch 139 step    60550 |    382 batches | lr 0.000184 | ms/batch 423.64 | loss  4.05 | ppl    57.355
| epoch 139 step    60600 |    432 batches | lr 0.000183 | ms/batch 423.19 | loss  4.07 | ppl    58.634
| epoch 140 step    60650 |     46 batches | lr 0.000183 | ms/batch 415.32 | loss  4.06 | ppl    57.860
| epoch 140 step    60700 |     96 batches | lr 0.000183 | ms/batch 423.35 | loss  3.99 | ppl    54.268
| epoch 140 step    60750 |    146 batches | lr 0.000183 | ms/batch 424.58 | loss  4.05 | ppl    57.451
| epoch 140 step    60800 |    196 batches | lr 0.000182 | ms/batch 424.44 | loss  4.07 | ppl    58.270
----------------------------------------------------------------------------------------------------
| Eval 152 at step    60800 | time: 175.48s | valid loss  4.22 | valid ppl    68.297
----------------------------------------------------------------------------------------------------
| epoch 140 step    60850 |    246 batches | lr 0.000182 | ms/batch 579.22 | loss  4.09 | ppl    59.879
| epoch 140 step    60900 |    296 batches | lr 0.000182 | ms/batch 423.70 | loss  4.09 | ppl    59.875
| epoch 140 step    60950 |    346 batches | lr 0.000182 | ms/batch 424.28 | loss  3.97 | ppl    53.061
| epoch 140 step    61000 |    396 batches | lr 0.000182 | ms/batch 425.32 | loss  4.05 | ppl    57.219
| epoch 141 step    61050 |     10 batches | lr 0.000181 | ms/batch 417.95 | loss  4.09 | ppl    59.764
| epoch 141 step    61100 |     60 batches | lr 0.000181 | ms/batch 424.74 | loss  3.99 | ppl    54.283
| epoch 141 step    61150 |    110 batches | lr 0.000181 | ms/batch 426.48 | loss  4.01 | ppl    55.289
| epoch 141 step    61200 |    160 batches | lr 0.000181 | ms/batch 425.21 | loss  4.04 | ppl    56.889
----------------------------------------------------------------------------------------------------
| Eval 153 at step    61200 | time: 175.98s | valid loss  4.21 | valid ppl    67.434
----------------------------------------------------------------------------------------------------
| epoch 141 step    61250 |    210 batches | lr 0.00018 | ms/batch 581.90 | loss  4.03 | ppl    56.520
| epoch 141 step    61300 |    260 batches | lr 0.00018 | ms/batch 425.59 | loss  4.09 | ppl    59.879
| epoch 141 step    61350 |    310 batches | lr 0.00018 | ms/batch 426.04 | loss  4.06 | ppl    57.803
| epoch 141 step    61400 |    360 batches | lr 0.00018 | ms/batch 425.31 | loss  4.00 | ppl    54.774
| epoch 141 step    61450 |    410 batches | lr 0.00018 | ms/batch 424.33 | loss  4.05 | ppl    57.194
| epoch 142 step    61500 |     24 batches | lr 0.000179 | ms/batch 416.44 | loss  4.06 | ppl    58.139
| epoch 142 step    61550 |     74 batches | lr 0.000179 | ms/batch 424.95 | loss  4.00 | ppl    54.369
| epoch 142 step    61600 |    124 batches | lr 0.000179 | ms/batch 426.03 | loss  4.02 | ppl    55.633
----------------------------------------------------------------------------------------------------
| Eval 154 at step    61600 | time: 176.01s | valid loss  4.22 | valid ppl    68.082
----------------------------------------------------------------------------------------------------
| epoch 142 step    61650 |    174 batches | lr 0.000179 | ms/batch 550.99 | loss  4.05 | ppl    57.257
| epoch 142 step    61700 |    224 batches | lr 0.000179 | ms/batch 424.42 | loss  4.07 | ppl    58.713
| epoch 142 step    61750 |    274 batches | lr 0.000178 | ms/batch 426.43 | loss  4.08 | ppl    59.099
| epoch 142 step    61800 |    324 batches | lr 0.000178 | ms/batch 427.20 | loss  4.01 | ppl    54.906
| epoch 142 step    61850 |    374 batches | lr 0.000178 | ms/batch 427.11 | loss  4.00 | ppl    54.743
| epoch 142 step    61900 |    424 batches | lr 0.000178 | ms/batch 427.53 | loss  4.00 | ppl    54.768
| epoch 143 step    61950 |     38 batches | lr 0.000177 | ms/batch 417.77 | loss  4.01 | ppl    55.161
| epoch 143 step    62000 |     88 batches | lr 0.000177 | ms/batch 427.07 | loss  3.99 | ppl    54.032
----------------------------------------------------------------------------------------------------
| Eval 155 at step    62000 | time: 176.41s | valid loss  4.24 | valid ppl    69.258
----------------------------------------------------------------------------------------------------
| epoch 143 step    62050 |    138 batches | lr 0.000177 | ms/batch 551.44 | loss  4.03 | ppl    56.071
| epoch 143 step    62100 |    188 batches | lr 0.000177 | ms/batch 425.26 | loss  4.05 | ppl    57.222
| epoch 143 step    62150 |    238 batches | lr 0.000177 | ms/batch 424.60 | loss  4.05 | ppl    57.608
| epoch 143 step    62200 |    288 batches | lr 0.000176 | ms/batch 424.80 | loss  4.08 | ppl    59.270
| epoch 143 step    62250 |    338 batches | lr 0.000176 | ms/batch 425.54 | loss  3.96 | ppl    52.557
| epoch 143 step    62300 |    388 batches | lr 0.000176 | ms/batch 426.29 | loss  4.04 | ppl    57.075
| epoch 144 step    62350 |      2 batches | lr 0.000176 | ms/batch 416.65 | loss  4.05 | ppl    57.480
| epoch 144 step    62400 |     52 batches | lr 0.000175 | ms/batch 424.93 | loss  4.00 | ppl    54.863
----------------------------------------------------------------------------------------------------
| Eval 156 at step    62400 | time: 175.99s | valid loss  4.24 | valid ppl    69.599
----------------------------------------------------------------------------------------------------
| epoch 144 step    62450 |    102 batches | lr 0.000175 | ms/batch 550.84 | loss  3.98 | ppl    53.677
| epoch 144 step    62500 |    152 batches | lr 0.000175 | ms/batch 424.73 | loss  4.01 | ppl    54.951
| epoch 144 step    62550 |    202 batches | lr 0.000175 | ms/batch 423.97 | loss  4.05 | ppl    57.481
| epoch 144 step    62600 |    252 batches | lr 0.000175 | ms/batch 424.42 | loss  4.07 | ppl    58.440
| epoch 144 step    62650 |    302 batches | lr 0.000174 | ms/batch 423.38 | loss  4.09 | ppl    60.034
| epoch 144 step    62700 |    352 batches | lr 0.000174 | ms/batch 423.88 | loss  3.94 | ppl    51.625
| epoch 144 step    62750 |    402 batches | lr 0.000174 | ms/batch 425.97 | loss  4.03 | ppl    56.302
| epoch 145 step    62800 |     16 batches | lr 0.000174 | ms/batch 416.89 | loss  4.06 | ppl    57.781
----------------------------------------------------------------------------------------------------
| Eval 157 at step    62800 | time: 175.71s | valid loss  4.23 | valid ppl    68.724
----------------------------------------------------------------------------------------------------
| epoch 145 step    62850 |     66 batches | lr 0.000173 | ms/batch 552.80 | loss  3.99 | ppl    53.961
| epoch 145 step    62900 |    116 batches | lr 0.000173 | ms/batch 425.66 | loss  4.00 | ppl    54.781
| epoch 145 step    62950 |    166 batches | lr 0.000173 | ms/batch 424.44 | loss  4.04 | ppl    56.616
| epoch 145 step    63000 |    216 batches | lr 0.000173 | ms/batch 424.94 | loss  4.05 | ppl    57.275
| epoch 145 step    63050 |    266 batches | lr 0.000173 | ms/batch 425.45 | loss  4.07 | ppl    58.690
| epoch 145 step    63100 |    316 batches | lr 0.000172 | ms/batch 426.61 | loss  4.03 | ppl    56.486
| epoch 145 step    63150 |    366 batches | lr 0.000172 | ms/batch 426.22 | loss  3.99 | ppl    54.127
| epoch 145 step    63200 |    416 batches | lr 0.000172 | ms/batch 427.69 | loss  4.02 | ppl    55.497
----------------------------------------------------------------------------------------------------
| Eval 158 at step    63200 | time: 176.71s | valid loss  4.23 | valid ppl    68.718
----------------------------------------------------------------------------------------------------
| epoch 146 step    63250 |     30 batches | lr 0.000172 | ms/batch 545.32 | loss  4.04 | ppl    56.965
| epoch 146 step    63300 |     80 batches | lr 0.000171 | ms/batch 427.33 | loss  3.98 | ppl    53.386
| epoch 146 step    63350 |    130 batches | lr 0.000171 | ms/batch 425.12 | loss  4.01 | ppl    55.196
| epoch 146 step    63400 |    180 batches | lr 0.000171 | ms/batch 424.86 | loss  4.02 | ppl    55.826
| epoch 146 step    63450 |    230 batches | lr 0.000171 | ms/batch 424.00 | loss  4.07 | ppl    58.266
| epoch 146 step    63500 |    280 batches | lr 0.000171 | ms/batch 424.41 | loss  4.08 | ppl    58.954
| epoch 146 step    63550 |    330 batches | lr 0.00017 | ms/batch 424.07 | loss  3.97 | ppl    52.943
| epoch 146 step    63600 |    380 batches | lr 0.00017 | ms/batch 424.19 | loss  4.01 | ppl    55.251
----------------------------------------------------------------------------------------------------
| Eval 159 at step    63600 | time: 175.93s | valid loss  4.23 | valid ppl    68.751
----------------------------------------------------------------------------------------------------
| epoch 146 step    63650 |    430 batches | lr 0.00017 | ms/batch 550.52 | loss  4.03 | ppl    56.400
| epoch 147 step    63700 |     44 batches | lr 0.00017 | ms/batch 415.88 | loss  4.01 | ppl    55.008
| epoch 147 step    63750 |     94 batches | lr 0.00017 | ms/batch 425.26 | loss  3.97 | ppl    53.144
| epoch 147 step    63800 |    144 batches | lr 0.000169 | ms/batch 425.31 | loss  4.03 | ppl    56.159
| epoch 147 step    63850 |    194 batches | lr 0.000169 | ms/batch 426.81 | loss  4.02 | ppl    55.582
| epoch 147 step    63900 |    244 batches | lr 0.000169 | ms/batch 427.07 | loss  4.04 | ppl    56.763
| epoch 147 step    63950 |    294 batches | lr 0.000169 | ms/batch 424.55 | loss  4.08 | ppl    59.042
| epoch 147 step    64000 |    344 batches | lr 0.000168 | ms/batch 424.50 | loss  3.95 | ppl    52.137
----------------------------------------------------------------------------------------------------
| Eval 160 at step    64000 | time: 176.01s | valid loss  4.24 | valid ppl    69.385
----------------------------------------------------------------------------------------------------
| epoch 147 step    64050 |    394 batches | lr 0.000168 | ms/batch 550.81 | loss  4.03 | ppl    56.071
| epoch 148 step    64100 |      8 batches | lr 0.000168 | ms/batch 418.15 | loss  4.04 | ppl    57.086
| epoch 148 step    64150 |     58 batches | lr 0.000168 | ms/batch 427.18 | loss  3.96 | ppl    52.414
| epoch 148 step    64200 |    108 batches | lr 0.000168 | ms/batch 426.40 | loss  3.99 | ppl    53.873
| epoch 148 step    64250 |    158 batches | lr 0.000167 | ms/batch 426.34 | loss  4.00 | ppl    54.833
| epoch 148 step    64300 |    208 batches | lr 0.000167 | ms/batch 425.28 | loss  4.03 | ppl    56.207
| epoch 148 step    64350 |    258 batches | lr 0.000167 | ms/batch 424.49 | loss  4.05 | ppl    57.401
| epoch 148 step    64400 |    308 batches | lr 0.000167 | ms/batch 425.79 | loss  4.06 | ppl    58.154
----------------------------------------------------------------------------------------------------
| Eval 161 at step    64400 | time: 176.24s | valid loss  4.22 | valid ppl    67.904
----------------------------------------------------------------------------------------------------
| epoch 148 step    64450 |    358 batches | lr 0.000166 | ms/batch 553.32 | loss  3.97 | ppl    52.976
| epoch 148 step    64500 |    408 batches | lr 0.000166 | ms/batch 428.21 | loss  4.02 | ppl    55.427
| epoch 149 step    64550 |     22 batches | lr 0.000166 | ms/batch 418.42 | loss  4.06 | ppl    57.725
| epoch 149 step    64600 |     72 batches | lr 0.000166 | ms/batch 426.42 | loss  3.96 | ppl    52.602
| epoch 149 step    64650 |    122 batches | lr 0.000166 | ms/batch 425.06 | loss  4.00 | ppl    54.590
| epoch 149 step    64700 |    172 batches | lr 0.000165 | ms/batch 424.96 | loss  4.02 | ppl    55.641
| epoch 149 step    64750 |    222 batches | lr 0.000165 | ms/batch 424.62 | loss  4.04 | ppl    57.047
| epoch 149 step    64800 |    272 batches | lr 0.000165 | ms/batch 424.55 | loss  4.05 | ppl    57.322
----------------------------------------------------------------------------------------------------
| Eval 162 at step    64800 | time: 176.25s | valid loss  4.23 | valid ppl    68.694
----------------------------------------------------------------------------------------------------
| epoch 149 step    64850 |    322 batches | lr 0.000165 | ms/batch 550.58 | loss  4.01 | ppl    54.953
| epoch 149 step    64900 |    372 batches | lr 0.000164 | ms/batch 423.88 | loss  3.99 | ppl    53.811
| epoch 149 step    64950 |    422 batches | lr 0.000164 | ms/batch 423.99 | loss  4.00 | ppl    54.445
| epoch 150 step    65000 |     36 batches | lr 0.000164 | ms/batch 415.71 | loss  4.03 | ppl    56.441
| epoch 150 step    65050 |     86 batches | lr 0.000164 | ms/batch 425.21 | loss  3.95 | ppl    52.118
| epoch 150 step    65100 |    136 batches | lr 0.000164 | ms/batch 424.42 | loss  4.01 | ppl    54.973
| epoch 150 step    65150 |    186 batches | lr 0.000163 | ms/batch 426.55 | loss  4.00 | ppl    54.771
| epoch 150 step    65200 |    236 batches | lr 0.000163 | ms/batch 425.98 | loss  4.04 | ppl    57.033
----------------------------------------------------------------------------------------------------
| Eval 163 at step    65200 | time: 175.83s | valid loss  4.23 | valid ppl    68.817
----------------------------------------------------------------------------------------------------
| epoch 150 step    65250 |    286 batches | lr 0.000163 | ms/batch 550.21 | loss  4.08 | ppl    59.358
| epoch 150 step    65300 |    336 batches | lr 0.000163 | ms/batch 424.34 | loss  3.95 | ppl    52.062
| epoch 150 step    65350 |    386 batches | lr 0.000162 | ms/batch 427.19 | loss  4.01 | ppl    55.052
| epoch 150 step    65400 |    436 batches | lr 0.000162 | ms/batch 420.47 | loss  4.02 | ppl    55.906
| epoch 151 step    65450 |     50 batches | lr 0.000162 | ms/batch 422.17 | loss  3.99 | ppl    54.267
| epoch 151 step    65500 |    100 batches | lr 0.000162 | ms/batch 425.32 | loss  3.97 | ppl    52.998
| epoch 151 step    65550 |    150 batches | lr 0.000162 | ms/batch 423.62 | loss  4.02 | ppl    55.809
| epoch 151 step    65600 |    200 batches | lr 0.000161 | ms/batch 424.63 | loss  4.03 | ppl    56.526
----------------------------------------------------------------------------------------------------
| Eval 164 at step    65600 | time: 175.89s | valid loss  4.22 | valid ppl    68.110
----------------------------------------------------------------------------------------------------
| epoch 151 step    65650 |    250 batches | lr 0.000161 | ms/batch 550.79 | loss  4.07 | ppl    58.381
| epoch 151 step    65700 |    300 batches | lr 0.000161 | ms/batch 424.90 | loss  4.06 | ppl    57.800
| epoch 151 step    65750 |    350 batches | lr 0.000161 | ms/batch 424.57 | loss  3.92 | ppl    50.587
| epoch 151 step    65800 |    400 batches | lr 0.000161 | ms/batch 425.02 | loss  4.00 | ppl    54.471
| epoch 152 step    65850 |     14 batches | lr 0.00016 | ms/batch 415.51 | loss  4.03 | ppl    56.259
| epoch 152 step    65900 |     64 batches | lr 0.00016 | ms/batch 425.74 | loss  3.95 | ppl    51.885
| epoch 152 step    65950 |    114 batches | lr 0.00016 | ms/batch 423.81 | loss  4.02 | ppl    55.713
| epoch 152 step    66000 |    164 batches | lr 0.00016 | ms/batch 425.25 | loss  4.00 | ppl    54.518
----------------------------------------------------------------------------------------------------
| Eval 165 at step    66000 | time: 175.79s | valid loss  4.22 | valid ppl    67.706
----------------------------------------------------------------------------------------------------
| epoch 152 step    66050 |    214 batches | lr 0.000159 | ms/batch 551.66 | loss  4.02 | ppl    55.428
| epoch 152 step    66100 |    264 batches | lr 0.000159 | ms/batch 425.32 | loss  4.03 | ppl    56.371
| epoch 152 step    66150 |    314 batches | lr 0.000159 | ms/batch 426.64 | loss  4.02 | ppl    55.465
| epoch 152 step    66200 |    364 batches | lr 0.000159 | ms/batch 423.75 | loss  3.96 | ppl    52.242
| epoch 152 step    66250 |    414 batches | lr 0.000159 | ms/batch 424.03 | loss  3.99 | ppl    53.963
| epoch 153 step    66300 |     28 batches | lr 0.000158 | ms/batch 417.41 | loss  4.04 | ppl    56.610
| epoch 153 step    66350 |     78 batches | lr 0.000158 | ms/batch 424.86 | loss  3.95 | ppl    52.097
| epoch 153 step    66400 |    128 batches | lr 0.000158 | ms/batch 424.25 | loss  3.99 | ppl    54.111
----------------------------------------------------------------------------------------------------
| Eval 166 at step    66400 | time: 175.90s | valid loss  4.22 | valid ppl    67.931
----------------------------------------------------------------------------------------------------
| epoch 153 step    66450 |    178 batches | lr 0.000158 | ms/batch 551.10 | loss  4.01 | ppl    55.247
| epoch 153 step    66500 |    228 batches | lr 0.000157 | ms/batch 424.56 | loss  4.01 | ppl    55.085
| epoch 153 step    66550 |    278 batches | lr 0.000157 | ms/batch 423.75 | loss  4.04 | ppl    56.962
| epoch 153 step    66600 |    328 batches | lr 0.000157 | ms/batch 423.20 | loss  3.97 | ppl    52.843
| epoch 153 step    66650 |    378 batches | lr 0.000157 | ms/batch 425.10 | loss  3.99 | ppl    54.081
| epoch 153 step    66700 |    428 batches | lr 0.000157 | ms/batch 423.90 | loss  4.02 | ppl    55.862
| epoch 154 step    66750 |     42 batches | lr 0.000156 | ms/batch 414.96 | loss  4.00 | ppl    54.770
| epoch 154 step    66800 |     92 batches | lr 0.000156 | ms/batch 423.99 | loss  3.95 | ppl    51.712
----------------------------------------------------------------------------------------------------
| Eval 167 at step    66800 | time: 175.54s | valid loss  4.24 | valid ppl    69.339
----------------------------------------------------------------------------------------------------
| epoch 154 step    66850 |    142 batches | lr 0.000156 | ms/batch 549.58 | loss  4.00 | ppl    54.425
| epoch 154 step    66900 |    192 batches | lr 0.000156 | ms/batch 423.59 | loss  4.02 | ppl    55.561
| epoch 154 step    66950 |    242 batches | lr 0.000155 | ms/batch 422.99 | loss  4.03 | ppl    56.344
| epoch 154 step    67000 |    292 batches | lr 0.000155 | ms/batch 424.49 | loss  4.06 | ppl    57.879
| epoch 154 step    67050 |    342 batches | lr 0.000155 | ms/batch 425.39 | loss  3.92 | ppl    50.401
| epoch 154 step    67100 |    392 batches | lr 0.000155 | ms/batch 426.15 | loss  3.99 | ppl    54.200
| epoch 155 step    67150 |      6 batches | lr 0.000155 | ms/batch 416.58 | loss  4.03 | ppl    56.531
| epoch 155 step    67200 |     56 batches | lr 0.000154 | ms/batch 425.66 | loss  3.94 | ppl    51.548
----------------------------------------------------------------------------------------------------
| Eval 168 at step    67200 | time: 175.74s | valid loss  4.22 | valid ppl    68.363
----------------------------------------------------------------------------------------------------
| epoch 155 step    67250 |    106 batches | lr 0.000154 | ms/batch 551.61 | loss  4.00 | ppl    54.460
| epoch 155 step    67300 |    156 batches | lr 0.000154 | ms/batch 425.94 | loss  3.99 | ppl    54.254
| epoch 155 step    67350 |    206 batches | lr 0.000154 | ms/batch 425.44 | loss  4.02 | ppl    55.934
| epoch 155 step    67400 |    256 batches | lr 0.000154 | ms/batch 425.81 | loss  4.04 | ppl    57.006
| epoch 155 step    67450 |    306 batches | lr 0.000153 | ms/batch 424.11 | loss  4.05 | ppl    57.305
| epoch 155 step    67500 |    356 batches | lr 0.000153 | ms/batch 423.40 | loss  3.93 | ppl    50.678
| epoch 155 step    67550 |    406 batches | lr 0.000153 | ms/batch 423.29 | loss  3.99 | ppl    54.249
| epoch 156 step    67600 |     20 batches | lr 0.000153 | ms/batch 415.26 | loss  4.04 | ppl    56.787
----------------------------------------------------------------------------------------------------
| Eval 169 at step    67600 | time: 175.70s | valid loss  4.22 | valid ppl    68.298
----------------------------------------------------------------------------------------------------
| epoch 156 step    67650 |     70 batches | lr 0.000152 | ms/batch 549.99 | loss  3.94 | ppl    51.329
| epoch 156 step    67700 |    120 batches | lr 0.000152 | ms/batch 423.62 | loss  3.99 | ppl    54.248
| epoch 156 step    67750 |    170 batches | lr 0.000152 | ms/batch 424.82 | loss  4.01 | ppl    54.955
| epoch 156 step    67800 |    220 batches | lr 0.000152 | ms/batch 426.17 | loss  4.01 | ppl    54.883
| epoch 156 step    67850 |    270 batches | lr 0.000152 | ms/batch 426.60 | loss  4.02 | ppl    55.852
| epoch 156 step    67900 |    320 batches | lr 0.000151 | ms/batch 426.72 | loss  3.99 | ppl    53.854
| epoch 156 step    67950 |    370 batches | lr 0.000151 | ms/batch 426.61 | loss  3.95 | ppl    52.058
| epoch 156 step    68000 |    420 batches | lr 0.000151 | ms/batch 424.31 | loss  4.00 | ppl    54.767
----------------------------------------------------------------------------------------------------
| Eval 170 at step    68000 | time: 176.48s | valid loss  4.22 | valid ppl    68.336
----------------------------------------------------------------------------------------------------
| epoch 157 step    68050 |     34 batches | lr 0.000151 | ms/batch 542.15 | loss  4.02 | ppl    55.727
| epoch 157 step    68100 |     84 batches | lr 0.00015 | ms/batch 424.04 | loss  3.92 | ppl    50.466
| epoch 157 step    68150 |    134 batches | lr 0.00015 | ms/batch 424.33 | loss  4.00 | ppl    54.537
| epoch 157 step    68200 |    184 batches | lr 0.00015 | ms/batch 424.20 | loss  4.01 | ppl    54.942
| epoch 157 step    68250 |    234 batches | lr 0.00015 | ms/batch 424.95 | loss  4.01 | ppl    54.974
| epoch 157 step    68300 |    284 batches | lr 0.00015 | ms/batch 423.97 | loss  4.02 | ppl    55.561
| epoch 157 step    68350 |    334 batches | lr 0.000149 | ms/batch 424.84 | loss  3.96 | ppl    52.232
| epoch 157 step    68400 |    384 batches | lr 0.000149 | ms/batch 426.01 | loss  3.98 | ppl    53.341
----------------------------------------------------------------------------------------------------
| Eval 171 at step    68400 | time: 175.72s | valid loss  4.22 | valid ppl    67.978
----------------------------------------------------------------------------------------------------
| epoch 157 step    68450 |    434 batches | lr 0.000149 | ms/batch 550.06 | loss  4.00 | ppl    54.607
| epoch 158 step    68500 |     48 batches | lr 0.000149 | ms/batch 416.10 | loss  3.95 | ppl    51.932
| epoch 158 step    68550 |     98 batches | lr 0.000148 | ms/batch 424.96 | loss  3.93 | ppl    50.964
| epoch 158 step    68600 |    148 batches | lr 0.000148 | ms/batch 424.23 | loss  3.98 | ppl    53.414
| epoch 158 step    68650 |    198 batches | lr 0.000148 | ms/batch 426.15 | loss  4.01 | ppl    54.904
| epoch 158 step    68700 |    248 batches | lr 0.000148 | ms/batch 426.42 | loss  4.00 | ppl    54.386
| epoch 158 step    68750 |    298 batches | lr 0.000148 | ms/batch 425.05 | loss  4.04 | ppl    56.827
| epoch 158 step    68800 |    348 batches | lr 0.000147 | ms/batch 426.94 | loss  3.92 | ppl    50.592
----------------------------------------------------------------------------------------------------
| Eval 172 at step    68800 | time: 176.02s | valid loss  4.22 | valid ppl    68.207
----------------------------------------------------------------------------------------------------
| epoch 158 step    68850 |    398 batches | lr 0.000147 | ms/batch 553.27 | loss  3.99 | ppl    54.171
| epoch 159 step    68900 |     12 batches | lr 0.000147 | ms/batch 418.06 | loss  4.03 | ppl    56.145
| epoch 159 step    68950 |     62 batches | lr 0.000147 | ms/batch 426.46 | loss  3.92 | ppl    50.195
| epoch 159 step    69000 |    112 batches | lr 0.000147 | ms/batch 426.43 | loss  3.94 | ppl    51.530
| epoch 159 step    69050 |    162 batches | lr 0.000146 | ms/batch 426.89 | loss  3.97 | ppl    53.232
| epoch 159 step    69100 |    212 batches | lr 0.000146 | ms/batch 423.52 | loss  4.00 | ppl    54.380
| epoch 159 step    69150 |    262 batches | lr 0.000146 | ms/batch 425.97 | loss  3.99 | ppl    54.058
| epoch 159 step    69200 |    312 batches | lr 0.000146 | ms/batch 424.36 | loss  4.02 | ppl    55.656
----------------------------------------------------------------------------------------------------
| Eval 173 at step    69200 | time: 176.24s | valid loss  4.23 | valid ppl    68.397
----------------------------------------------------------------------------------------------------
| epoch 159 step    69250 |    362 batches | lr 0.000145 | ms/batch 549.90 | loss  3.94 | ppl    51.576
| epoch 159 step    69300 |    412 batches | lr 0.000145 | ms/batch 424.57 | loss  3.99 | ppl    54.177
| epoch 160 step    69350 |     26 batches | lr 0.000145 | ms/batch 414.94 | loss  4.01 | ppl    55.247
| epoch 160 step    69400 |     76 batches | lr 0.000145 | ms/batch 425.28 | loss  3.92 | ppl    50.459
| epoch 160 step    69450 |    126 batches | lr 0.000145 | ms/batch 426.37 | loss  3.98 | ppl    53.384
| epoch 160 step    69500 |    176 batches | lr 0.000144 | ms/batch 424.47 | loss  3.98 | ppl    53.529
| epoch 160 step    69550 |    226 batches | lr 0.000144 | ms/batch 426.68 | loss  4.01 | ppl    55.370
| epoch 160 step    69600 |    276 batches | lr 0.000144 | ms/batch 425.47 | loss  4.02 | ppl    55.561
----------------------------------------------------------------------------------------------------
| Eval 174 at step    69600 | time: 175.89s | valid loss  4.22 | valid ppl    68.065
----------------------------------------------------------------------------------------------------
| epoch 160 step    69650 |    326 batches | lr 0.000144 | ms/batch 551.52 | loss  3.95 | ppl    51.808
| epoch 160 step    69700 |    376 batches | lr 0.000144 | ms/batch 424.69 | loss  3.98 | ppl    53.321
| epoch 160 step    69750 |    426 batches | lr 0.000143 | ms/batch 425.29 | loss  3.98 | ppl    53.596
| epoch 161 step    69800 |     40 batches | lr 0.000143 | ms/batch 415.85 | loss  3.97 | ppl    53.071
| epoch 161 step    69850 |     90 batches | lr 0.000143 | ms/batch 425.68 | loss  3.94 | ppl    51.213
| epoch 161 step    69900 |    140 batches | lr 0.000143 | ms/batch 423.39 | loss  3.99 | ppl    54.057
| epoch 161 step    69950 |    190 batches | lr 0.000142 | ms/batch 423.62 | loss  4.01 | ppl    54.906
| epoch 161 step    70000 |    240 batches | lr 0.000142 | ms/batch 425.81 | loss  4.02 | ppl    55.756
----------------------------------------------------------------------------------------------------
| Eval 175 at step    70000 | time: 175.79s | valid loss  4.22 | valid ppl    68.320
----------------------------------------------------------------------------------------------------
| epoch 161 step    70050 |    290 batches | lr 0.000142 | ms/batch 551.17 | loss  4.04 | ppl    56.851
| epoch 161 step    70100 |    340 batches | lr 0.000142 | ms/batch 425.86 | loss  3.90 | ppl    49.373
| epoch 161 step    70150 |    390 batches | lr 0.000142 | ms/batch 425.34 | loss  4.00 | ppl    54.814
| epoch 162 step    70200 |      4 batches | lr 0.000141 | ms/batch 416.87 | loss  4.01 | ppl    54.959
| epoch 162 step    70250 |     54 batches | lr 0.000141 | ms/batch 425.99 | loss  3.93 | ppl    51.003
| epoch 162 step    70300 |    104 batches | lr 0.000141 | ms/batch 426.49 | loss  3.95 | ppl    51.777
| epoch 162 step    70350 |    154 batches | lr 0.000141 | ms/batch 426.72 | loss  3.99 | ppl    54.170
| epoch 162 step    70400 |    204 batches | lr 0.00014 | ms/batch 424.60 | loss  3.98 | ppl    53.386
----------------------------------------------------------------------------------------------------
| Eval 176 at step    70400 | time: 176.15s | valid loss  4.22 | valid ppl    67.733
----------------------------------------------------------------------------------------------------
| epoch 162 step    70450 |    254 batches | lr 0.00014 | ms/batch 550.67 | loss  4.00 | ppl    54.780
| epoch 162 step    70500 |    304 batches | lr 0.00014 | ms/batch 425.49 | loss  4.00 | ppl    54.800
| epoch 162 step    70550 |    354 batches | lr 0.00014 | ms/batch 425.17 | loss  3.90 | ppl    49.448
| epoch 162 step    70600 |    404 batches | lr 0.00014 | ms/batch 424.11 | loss  3.97 | ppl    52.846
| epoch 163 step    70650 |     18 batches | lr 0.000139 | ms/batch 415.22 | loss  4.01 | ppl    55.109
| epoch 163 step    70700 |     68 batches | lr 0.000139 | ms/batch 424.67 | loss  3.91 | ppl    49.701
| epoch 163 step    70750 |    118 batches | lr 0.000139 | ms/batch 423.92 | loss  3.96 | ppl    52.596
| epoch 163 step    70800 |    168 batches | lr 0.000139 | ms/batch 425.03 | loss  3.97 | ppl    52.764
----------------------------------------------------------------------------------------------------
| Eval 177 at step    70800 | time: 175.72s | valid loss  4.21 | valid ppl    67.383
----------------------------------------------------------------------------------------------------
| epoch 163 step    70850 |    218 batches | lr 0.000139 | ms/batch 582.09 | loss  3.99 | ppl    53.940
| epoch 163 step    70900 |    268 batches | lr 0.000138 | ms/batch 424.88 | loss  4.02 | ppl    55.630
| epoch 163 step    70950 |    318 batches | lr 0.000138 | ms/batch 425.16 | loss  3.98 | ppl    53.595
| epoch 163 step    71000 |    368 batches | lr 0.000138 | ms/batch 423.84 | loss  3.92 | ppl    50.480
| epoch 163 step    71050 |    418 batches | lr 0.000138 | ms/batch 424.02 | loss  3.97 | ppl    52.968
| epoch 164 step    71100 |     32 batches | lr 0.000137 | ms/batch 416.45 | loss  3.97 | ppl    53.210
| epoch 164 step    71150 |     82 batches | lr 0.000137 | ms/batch 425.30 | loss  3.92 | ppl    50.544
| epoch 164 step    71200 |    132 batches | lr 0.000137 | ms/batch 426.68 | loss  3.96 | ppl    52.435
----------------------------------------------------------------------------------------------------
| Eval 178 at step    71200 | time: 175.99s | valid loss  4.21 | valid ppl    67.350
----------------------------------------------------------------------------------------------------
| epoch 164 step    71250 |    182 batches | lr 0.000137 | ms/batch 583.18 | loss  3.96 | ppl    52.516
| epoch 164 step    71300 |    232 batches | lr 0.000137 | ms/batch 424.86 | loss  3.99 | ppl    54.264
| epoch 164 step    71350 |    282 batches | lr 0.000136 | ms/batch 423.48 | loss  3.97 | ppl    53.157
| epoch 164 step    71400 |    332 batches | lr 0.000136 | ms/batch 424.21 | loss  3.91 | ppl    49.853
| epoch 164 step    71450 |    382 batches | lr 0.000136 | ms/batch 424.31 | loss  3.98 | ppl    53.769
| epoch 164 step    71500 |    432 batches | lr 0.000136 | ms/batch 423.88 | loss  3.98 | ppl    53.595
| epoch 165 step    71550 |     46 batches | lr 0.000136 | ms/batch 417.64 | loss  3.93 | ppl    50.719
| epoch 165 step    71600 |     96 batches | lr 0.000135 | ms/batch 425.44 | loss  3.90 | ppl    49.451
----------------------------------------------------------------------------------------------------
| Eval 179 at step    71600 | time: 175.77s | valid loss  4.22 | valid ppl    68.081
----------------------------------------------------------------------------------------------------
| epoch 165 step    71650 |    146 batches | lr 0.000135 | ms/batch 550.57 | loss  3.98 | ppl    53.519
| epoch 165 step    71700 |    196 batches | lr 0.000135 | ms/batch 425.75 | loss  3.99 | ppl    54.152
| epoch 165 step    71750 |    246 batches | lr 0.000135 | ms/batch 425.32 | loss  4.00 | ppl    54.797
| epoch 165 step    71800 |    296 batches | lr 0.000134 | ms/batch 424.05 | loss  4.04 | ppl    56.662
| epoch 165 step    71850 |    346 batches | lr 0.000134 | ms/batch 423.47 | loss  3.88 | ppl    48.423
| epoch 165 step    71900 |    396 batches | lr 0.000134 | ms/batch 425.10 | loss  3.96 | ppl    52.408
| epoch 166 step    71950 |     10 batches | lr 0.000134 | ms/batch 416.08 | loss  3.99 | ppl    54.253
| epoch 166 step    72000 |     60 batches | lr 0.000134 | ms/batch 426.20 | loss  3.91 | ppl    50.111
----------------------------------------------------------------------------------------------------
| Eval 180 at step    72000 | time: 175.85s | valid loss  4.22 | valid ppl    67.853
----------------------------------------------------------------------------------------------------
| epoch 166 step    72050 |    110 batches | lr 0.000133 | ms/batch 551.34 | loss  3.94 | ppl    51.610
| epoch 166 step    72100 |    160 batches | lr 0.000133 | ms/batch 423.24 | loss  3.97 | ppl    52.995
| epoch 166 step    72150 |    210 batches | lr 0.000133 | ms/batch 423.74 | loss  3.97 | ppl    53.090
| epoch 166 step    72200 |    260 batches | lr 0.000133 | ms/batch 424.07 | loss  3.99 | ppl    53.934
| epoch 166 step    72250 |    310 batches | lr 0.000133 | ms/batch 423.89 | loss  3.99 | ppl    53.888
| epoch 166 step    72300 |    360 batches | lr 0.000132 | ms/batch 425.34 | loss  3.90 | ppl    49.628
| epoch 166 step    72350 |    410 batches | lr 0.000132 | ms/batch 423.41 | loss  3.96 | ppl    52.292
| epoch 167 step    72400 |     24 batches | lr 0.000132 | ms/batch 414.86 | loss  3.99 | ppl    54.082
----------------------------------------------------------------------------------------------------
| Eval 181 at step    72400 | time: 175.49s | valid loss  4.22 | valid ppl    67.840
----------------------------------------------------------------------------------------------------
| epoch 167 step    72450 |     74 batches | lr 0.000132 | ms/batch 549.36 | loss  3.91 | ppl    49.975
| epoch 167 step    72500 |    124 batches | lr 0.000131 | ms/batch 424.16 | loss  3.95 | ppl    52.023
| epoch 167 step    72550 |    174 batches | lr 0.000131 | ms/batch 424.23 | loss  3.97 | ppl    52.749
| epoch 167 step    72600 |    224 batches | lr 0.000131 | ms/batch 423.84 | loss  4.01 | ppl    54.995
| epoch 167 step    72650 |    274 batches | lr 0.000131 | ms/batch 424.23 | loss  3.98 | ppl    53.773
| epoch 167 step    72700 |    324 batches | lr 0.000131 | ms/batch 423.99 | loss  3.93 | ppl    50.726
| epoch 167 step    72750 |    374 batches | lr 0.00013 | ms/batch 424.63 | loss  3.93 | ppl    50.937
| epoch 167 step    72800 |    424 batches | lr 0.00013 | ms/batch 424.02 | loss  3.96 | ppl    52.591
----------------------------------------------------------------------------------------------------
| Eval 182 at step    72800 | time: 175.93s | valid loss  4.22 | valid ppl    67.854
----------------------------------------------------------------------------------------------------
| epoch 168 step    72850 |     38 batches | lr 0.00013 | ms/batch 543.80 | loss  3.97 | ppl    52.859
| epoch 168 step    72900 |     88 batches | lr 0.00013 | ms/batch 425.07 | loss  3.91 | ppl    49.981
| epoch 168 step    72950 |    138 batches | lr 0.00013 | ms/batch 423.34 | loss  3.95 | ppl    51.758
| epoch 168 step    73000 |    188 batches | lr 0.000129 | ms/batch 426.72 | loss  3.95 | ppl    51.819
| epoch 168 step    73050 |    238 batches | lr 0.000129 | ms/batch 424.43 | loss  3.97 | ppl    52.808
| epoch 168 step    73100 |    288 batches | lr 0.000129 | ms/batch 423.71 | loss  4.00 | ppl    54.510
| epoch 168 step    73150 |    338 batches | lr 0.000129 | ms/batch 425.96 | loss  3.87 | ppl    48.029
| epoch 168 step    73200 |    388 batches | lr 0.000129 | ms/batch 426.45 | loss  3.96 | ppl    52.641
----------------------------------------------------------------------------------------------------
| Eval 183 at step    73200 | time: 176.01s | valid loss  4.21 | valid ppl    67.506
----------------------------------------------------------------------------------------------------
| epoch 169 step    73250 |      2 batches | lr 0.000128 | ms/batch 545.03 | loss  3.99 | ppl    54.193
| epoch 169 step    73300 |     52 batches | lr 0.000128 | ms/batch 424.06 | loss  3.92 | ppl    50.228
| epoch 169 step    73350 |    102 batches | lr 0.000128 | ms/batch 423.64 | loss  3.91 | ppl    50.144
| epoch 169 step    73400 |    152 batches | lr 0.000128 | ms/batch 425.18 | loss  3.94 | ppl    51.586
| epoch 169 step    73450 |    202 batches | lr 0.000127 | ms/batch 423.60 | loss  3.98 | ppl    53.273
| epoch 169 step    73500 |    252 batches | lr 0.000127 | ms/batch 424.37 | loss  3.97 | ppl    52.724
| epoch 169 step    73550 |    302 batches | lr 0.000127 | ms/batch 424.50 | loss  4.02 | ppl    55.430
| epoch 169 step    73600 |    352 batches | lr 0.000127 | ms/batch 422.45 | loss  3.88 | ppl    48.665
----------------------------------------------------------------------------------------------------
| Eval 184 at step    73600 | time: 175.60s | valid loss  4.21 | valid ppl    67.663
----------------------------------------------------------------------------------------------------
| epoch 169 step    73650 |    402 batches | lr 0.000127 | ms/batch 549.82 | loss  3.95 | ppl    51.911
| epoch 170 step    73700 |     16 batches | lr 0.000126 | ms/batch 417.09 | loss  3.99 | ppl    54.076
| epoch 170 step    73750 |     66 batches | lr 0.000126 | ms/batch 426.15 | loss  3.89 | ppl    48.929
| epoch 170 step    73800 |    116 batches | lr 0.000126 | ms/batch 423.90 | loss  3.94 | ppl    51.267
| epoch 170 step    73850 |    166 batches | lr 0.000126 | ms/batch 425.46 | loss  3.96 | ppl    52.253
| epoch 170 step    73900 |    216 batches | lr 0.000126 | ms/batch 424.18 | loss  3.96 | ppl    52.468
| epoch 170 step    73950 |    266 batches | lr 0.000125 | ms/batch 425.22 | loss  3.96 | ppl    52.612
| epoch 170 step    74000 |    316 batches | lr 0.000125 | ms/batch 424.77 | loss  3.97 | ppl    52.791
----------------------------------------------------------------------------------------------------
| Eval 185 at step    74000 | time: 175.84s | valid loss  4.22 | valid ppl    67.823
----------------------------------------------------------------------------------------------------
| epoch 170 step    74050 |    366 batches | lr 0.000125 | ms/batch 552.00 | loss  3.90 | ppl    49.208
| epoch 170 step    74100 |    416 batches | lr 0.000125 | ms/batch 424.29 | loss  3.95 | ppl    51.766
| epoch 171 step    74150 |     30 batches | lr 0.000124 | ms/batch 418.73 | loss  3.97 | ppl    53.090
| epoch 171 step    74200 |     80 batches | lr 0.000124 | ms/batch 425.64 | loss  3.88 | ppl    48.642
| epoch 171 step    74250 |    130 batches | lr 0.000124 | ms/batch 423.41 | loss  3.94 | ppl    51.557
| epoch 171 step    74300 |    180 batches | lr 0.000124 | ms/batch 422.90 | loss  3.97 | ppl    52.806
| epoch 171 step    74350 |    230 batches | lr 0.000124 | ms/batch 423.40 | loss  3.99 | ppl    54.005
| epoch 171 step    74400 |    280 batches | lr 0.000123 | ms/batch 423.75 | loss  4.00 | ppl    54.549
----------------------------------------------------------------------------------------------------
| Eval 186 at step    74400 | time: 175.69s | valid loss  4.22 | valid ppl    67.722
----------------------------------------------------------------------------------------------------
| epoch 171 step    74450 |    330 batches | lr 0.000123 | ms/batch 551.81 | loss  3.92 | ppl    50.355
| epoch 171 step    74500 |    380 batches | lr 0.000123 | ms/batch 426.07 | loss  3.94 | ppl    51.668
| epoch 171 step    74550 |    430 batches | lr 0.000123 | ms/batch 426.94 | loss  3.96 | ppl    52.655
| epoch 172 step    74600 |     44 batches | lr 0.000123 | ms/batch 419.16 | loss  3.93 | ppl    50.988
| epoch 172 step    74650 |     94 batches | lr 0.000122 | ms/batch 428.21 | loss  3.88 | ppl    48.427
| epoch 172 step    74700 |    144 batches | lr 0.000122 | ms/batch 426.41 | loss  3.92 | ppl    50.301
| epoch 172 step    74750 |    194 batches | lr 0.000122 | ms/batch 427.53 | loss  3.96 | ppl    52.283
| epoch 172 step    74800 |    244 batches | lr 0.000122 | ms/batch 425.56 | loss  3.98 | ppl    53.664
----------------------------------------------------------------------------------------------------
| Eval 187 at step    74800 | time: 176.61s | valid loss  4.21 | valid ppl    67.613
----------------------------------------------------------------------------------------------------
| epoch 172 step    74850 |    294 batches | lr 0.000122 | ms/batch 552.60 | loss  4.00 | ppl    54.687
| epoch 172 step    74900 |    344 batches | lr 0.000121 | ms/batch 425.71 | loss  3.86 | ppl    47.433
| epoch 172 step    74950 |    394 batches | lr 0.000121 | ms/batch 425.40 | loss  3.95 | ppl    51.811
| epoch 173 step    75000 |      8 batches | lr 0.000121 | ms/batch 415.30 | loss  3.99 | ppl    53.888
| epoch 173 step    75050 |     58 batches | lr 0.000121 | ms/batch 424.21 | loss  3.89 | ppl    48.992
| epoch 173 step    75100 |    108 batches | lr 0.000121 | ms/batch 424.56 | loss  3.90 | ppl    49.329
| epoch 173 step    75150 |    158 batches | lr 0.00012 | ms/batch 424.32 | loss  3.93 | ppl    51.054
| epoch 173 step    75200 |    208 batches | lr 0.00012 | ms/batch 423.93 | loss  3.94 | ppl    51.278
----------------------------------------------------------------------------------------------------
| Eval 188 at step    75200 | time: 175.79s | valid loss  4.20 | valid ppl    66.976
----------------------------------------------------------------------------------------------------
| epoch 173 step    75250 |    258 batches | lr 0.00012 | ms/batch 580.49 | loss  3.97 | ppl    52.736
| epoch 173 step    75300 |    308 batches | lr 0.00012 | ms/batch 424.38 | loss  3.98 | ppl    53.333
| epoch 173 step    75350 |    358 batches | lr 0.000119 | ms/batch 425.43 | loss  3.89 | ppl    49.033
| epoch 173 step    75400 |    408 batches | lr 0.000119 | ms/batch 424.67 | loss  3.91 | ppl    50.067
| epoch 174 step    75450 |     22 batches | lr 0.000119 | ms/batch 415.51 | loss  3.97 | ppl    52.793
| epoch 174 step    75500 |     72 batches | lr 0.000119 | ms/batch 425.49 | loss  3.89 | ppl    48.887
| epoch 174 step    75550 |    122 batches | lr 0.000119 | ms/batch 424.56 | loss  3.92 | ppl    50.458
| epoch 174 step    75600 |    172 batches | lr 0.000118 | ms/batch 424.36 | loss  3.94 | ppl    51.385
----------------------------------------------------------------------------------------------------
| Eval 189 at step    75600 | time: 175.78s | valid loss  4.20 | valid ppl    66.694
----------------------------------------------------------------------------------------------------
| epoch 174 step    75650 |    222 batches | lr 0.000118 | ms/batch 580.07 | loss  3.95 | ppl    51.989
| epoch 174 step    75700 |    272 batches | lr 0.000118 | ms/batch 427.06 | loss  3.97 | ppl    52.961
| epoch 174 step    75750 |    322 batches | lr 0.000118 | ms/batch 424.24 | loss  3.91 | ppl    50.046
| epoch 174 step    75800 |    372 batches | lr 0.000118 | ms/batch 424.16 | loss  3.92 | ppl    50.506
| epoch 174 step    75850 |    422 batches | lr 0.000117 | ms/batch 424.00 | loss  3.93 | ppl    51.105
| epoch 175 step    75900 |     36 batches | lr 0.000117 | ms/batch 416.18 | loss  3.96 | ppl    52.230
| epoch 175 step    75950 |     86 batches | lr 0.000117 | ms/batch 424.87 | loss  3.87 | ppl    48.174
| epoch 175 step    76000 |    136 batches | lr 0.000117 | ms/batch 423.80 | loss  3.94 | ppl    51.461
----------------------------------------------------------------------------------------------------
| Eval 190 at step    76000 | time: 175.73s | valid loss  4.21 | valid ppl    67.579
----------------------------------------------------------------------------------------------------
| epoch 175 step    76050 |    186 batches | lr 0.000117 | ms/batch 549.04 | loss  3.94 | ppl    51.493
| epoch 175 step    76100 |    236 batches | lr 0.000116 | ms/batch 424.19 | loss  3.94 | ppl    51.212
| epoch 175 step    76150 |    286 batches | lr 0.000116 | ms/batch 423.54 | loss  3.97 | ppl    53.147
| epoch 175 step    76200 |    336 batches | lr 0.000116 | ms/batch 424.19 | loss  3.85 | ppl    46.814
| epoch 175 step    76250 |    386 batches | lr 0.000116 | ms/batch 424.24 | loss  3.92 | ppl    50.641
| epoch 175 step    76300 |    436 batches | lr 0.000116 | ms/batch 418.09 | loss  3.93 | ppl    51.077
| epoch 176 step    76350 |     50 batches | lr 0.000115 | ms/batch 422.31 | loss  3.91 | ppl    49.732
| epoch 176 step    76400 |    100 batches | lr 0.000115 | ms/batch 425.07 | loss  3.89 | ppl    49.034
----------------------------------------------------------------------------------------------------
| Eval 191 at step    76400 | time: 175.57s | valid loss  4.21 | valid ppl    67.689
----------------------------------------------------------------------------------------------------
| epoch 176 step    76450 |    150 batches | lr 0.000115 | ms/batch 551.39 | loss  3.92 | ppl    50.448
| epoch 176 step    76500 |    200 batches | lr 0.000115 | ms/batch 425.91 | loss  3.94 | ppl    51.209
| epoch 176 step    76550 |    250 batches | lr 0.000114 | ms/batch 423.24 | loss  3.97 | ppl    52.728
| epoch 176 step    76600 |    300 batches | lr 0.000114 | ms/batch 423.32 | loss  4.00 | ppl    54.574
| epoch 176 step    76650 |    350 batches | lr 0.000114 | ms/batch 422.23 | loss  3.87 | ppl    47.719
| epoch 176 step    76700 |    400 batches | lr 0.000114 | ms/batch 424.79 | loss  3.94 | ppl    51.389
| epoch 177 step    76750 |     14 batches | lr 0.000114 | ms/batch 415.68 | loss  3.95 | ppl    52.158
| epoch 177 step    76800 |     64 batches | lr 0.000113 | ms/batch 424.88 | loss  3.90 | ppl    49.244
----------------------------------------------------------------------------------------------------
| Eval 192 at step    76800 | time: 175.58s | valid loss  4.21 | valid ppl    67.575
----------------------------------------------------------------------------------------------------
| epoch 177 step    76850 |    114 batches | lr 0.000113 | ms/batch 549.01 | loss  3.93 | ppl    51.035
| epoch 177 step    76900 |    164 batches | lr 0.000113 | ms/batch 424.73 | loss  3.95 | ppl    51.742
| epoch 177 step    76950 |    214 batches | lr 0.000113 | ms/batch 423.50 | loss  3.94 | ppl    51.416
| epoch 177 step    77000 |    264 batches | lr 0.000113 | ms/batch 424.53 | loss  3.94 | ppl    51.256
| epoch 177 step    77050 |    314 batches | lr 0.000112 | ms/batch 423.70 | loss  3.93 | ppl    50.655
| epoch 177 step    77100 |    364 batches | lr 0.000112 | ms/batch 425.20 | loss  3.87 | ppl    47.742
| epoch 177 step    77150 |    414 batches | lr 0.000112 | ms/batch 424.15 | loss  3.92 | ppl    50.423
| epoch 178 step    77200 |     28 batches | lr 0.000112 | ms/batch 416.54 | loss  3.96 | ppl    52.377
----------------------------------------------------------------------------------------------------
| Eval 193 at step    77200 | time: 175.56s | valid loss  4.21 | valid ppl    67.258
----------------------------------------------------------------------------------------------------
| epoch 178 step    77250 |     78 batches | lr 0.000112 | ms/batch 550.15 | loss  3.88 | ppl    48.468
| epoch 178 step    77300 |    128 batches | lr 0.000111 | ms/batch 425.13 | loss  3.90 | ppl    49.610
| epoch 178 step    77350 |    178 batches | lr 0.000111 | ms/batch 423.98 | loss  3.93 | ppl    51.009
| epoch 178 step    77400 |    228 batches | lr 0.000111 | ms/batch 424.09 | loss  3.94 | ppl    51.193
| epoch 178 step    77450 |    278 batches | lr 0.000111 | ms/batch 424.94 | loss  3.95 | ppl    52.160
| epoch 178 step    77500 |    328 batches | lr 0.000111 | ms/batch 424.47 | loss  3.91 | ppl    49.776
| epoch 178 step    77550 |    378 batches | lr 0.00011 | ms/batch 423.83 | loss  3.92 | ppl    50.495
| epoch 178 step    77600 |    428 batches | lr 0.00011 | ms/batch 423.39 | loss  3.95 | ppl    52.151
----------------------------------------------------------------------------------------------------
| Eval 194 at step    77600 | time: 175.97s | valid loss  4.21 | valid ppl    67.290
----------------------------------------------------------------------------------------------------
| epoch 179 step    77650 |     42 batches | lr 0.00011 | ms/batch 540.56 | loss  3.92 | ppl    50.354
| epoch 179 step    77700 |     92 batches | lr 0.00011 | ms/batch 423.60 | loss  3.86 | ppl    47.479
| epoch 179 step    77750 |    142 batches | lr 0.00011 | ms/batch 426.28 | loss  3.91 | ppl    49.685
| epoch 179 step    77800 |    192 batches | lr 0.000109 | ms/batch 426.55 | loss  3.92 | ppl    50.305
| epoch 179 step    77850 |    242 batches | lr 0.000109 | ms/batch 424.62 | loss  3.97 | ppl    52.953
| epoch 179 step    77900 |    292 batches | lr 0.000109 | ms/batch 423.01 | loss  3.96 | ppl    52.542
| epoch 179 step    77950 |    342 batches | lr 0.000109 | ms/batch 423.44 | loss  3.82 | ppl    45.676
| epoch 179 step    78000 |    392 batches | lr 0.000109 | ms/batch 425.65 | loss  3.94 | ppl    51.177
----------------------------------------------------------------------------------------------------
| Eval 195 at step    78000 | time: 175.70s | valid loss  4.21 | valid ppl    67.342
----------------------------------------------------------------------------------------------------
| epoch 180 step    78050 |      6 batches | lr 0.000108 | ms/batch 540.14 | loss  3.95 | ppl    51.681
| epoch 180 step    78100 |     56 batches | lr 0.000108 | ms/batch 423.95 | loss  3.88 | ppl    48.227
| epoch 180 step    78150 |    106 batches | lr 0.000108 | ms/batch 424.26 | loss  3.89 | ppl    48.886
| epoch 180 step    78200 |    156 batches | lr 0.000108 | ms/batch 423.29 | loss  3.93 | ppl    50.758
| epoch 180 step    78250 |    206 batches | lr 0.000108 | ms/batch 424.12 | loss  3.93 | ppl    51.060
| epoch 180 step    78300 |    256 batches | lr 0.000107 | ms/batch 423.89 | loss  3.94 | ppl    51.646
| epoch 180 step    78350 |    306 batches | lr 0.000107 | ms/batch 423.16 | loss  3.96 | ppl    52.411
| epoch 180 step    78400 |    356 batches | lr 0.000107 | ms/batch 422.87 | loss  3.87 | ppl    47.850
----------------------------------------------------------------------------------------------------
| Eval 196 at step    78400 | time: 175.29s | valid loss  4.21 | valid ppl    67.025
----------------------------------------------------------------------------------------------------
| epoch 180 step    78450 |    406 batches | lr 0.000107 | ms/batch 549.31 | loss  3.91 | ppl    49.653
| epoch 181 step    78500 |     20 batches | lr 0.000107 | ms/batch 414.78 | loss  3.95 | ppl    51.732
| epoch 181 step    78550 |     70 batches | lr 0.000106 | ms/batch 423.49 | loss  3.87 | ppl    48.103
| epoch 181 step    78600 |    120 batches | lr 0.000106 | ms/batch 426.19 | loss  3.91 | ppl    50.119
| epoch 181 step    78650 |    170 batches | lr 0.000106 | ms/batch 427.32 | loss  3.92 | ppl    50.151
| epoch 181 step    78700 |    220 batches | lr 0.000106 | ms/batch 422.63 | loss  3.96 | ppl    52.366
| epoch 181 step    78750 |    270 batches | lr 0.000105 | ms/batch 423.80 | loss  3.93 | ppl    50.760
| epoch 181 step    78800 |    320 batches | lr 0.000105 | ms/batch 423.34 | loss  3.90 | ppl    49.326
----------------------------------------------------------------------------------------------------
| Eval 197 at step    78800 | time: 175.57s | valid loss  4.21 | valid ppl    67.265
----------------------------------------------------------------------------------------------------
| epoch 181 step    78850 |    370 batches | lr 0.000105 | ms/batch 550.40 | loss  3.88 | ppl    48.421
| epoch 181 step    78900 |    420 batches | lr 0.000105 | ms/batch 424.46 | loss  3.90 | ppl    49.181
| epoch 182 step    78950 |     34 batches | lr 0.000105 | ms/batch 417.63 | loss  3.93 | ppl    51.002
| epoch 182 step    79000 |     84 batches | lr 0.000104 | ms/batch 422.66 | loss  3.87 | ppl    47.779
| epoch 182 step    79050 |    134 batches | lr 0.000104 | ms/batch 423.75 | loss  3.92 | ppl    50.278
| epoch 182 step    79100 |    184 batches | lr 0.000104 | ms/batch 423.69 | loss  3.91 | ppl    49.657
| epoch 182 step    79150 |    234 batches | lr 0.000104 | ms/batch 423.54 | loss  3.95 | ppl    51.938
| epoch 182 step    79200 |    284 batches | lr 0.000104 | ms/batch 423.65 | loss  3.97 | ppl    52.899
----------------------------------------------------------------------------------------------------
| Eval 198 at step    79200 | time: 175.47s | valid loss  4.21 | valid ppl    67.360
----------------------------------------------------------------------------------------------------
| epoch 182 step    79250 |    334 batches | lr 0.000103 | ms/batch 550.70 | loss  3.86 | ppl    47.354
| epoch 182 step    79300 |    384 batches | lr 0.000103 | ms/batch 423.12 | loss  3.93 | ppl    50.697
| epoch 182 step    79350 |    434 batches | lr 0.000103 | ms/batch 426.55 | loss  3.93 | ppl    51.064
| epoch 183 step    79400 |     48 batches | lr 0.000103 | ms/batch 417.34 | loss  3.87 | ppl    47.771
| epoch 183 step    79450 |     98 batches | lr 0.000103 | ms/batch 426.11 | loss  3.85 | ppl    46.985
| epoch 183 step    79500 |    148 batches | lr 0.000102 | ms/batch 425.84 | loss  3.90 | ppl    49.375
| epoch 183 step    79550 |    198 batches | lr 0.000102 | ms/batch 423.79 | loss  3.94 | ppl    51.322
| epoch 183 step    79600 |    248 batches | lr 0.000102 | ms/batch 424.12 | loss  3.92 | ppl    50.469
----------------------------------------------------------------------------------------------------
| Eval 199 at step    79600 | time: 175.90s | valid loss  4.21 | valid ppl    67.387
----------------------------------------------------------------------------------------------------
| epoch 183 step    79650 |    298 batches | lr 0.000102 | ms/batch 551.36 | loss  3.95 | ppl    52.085
| epoch 183 step    79700 |    348 batches | lr 0.000102 | ms/batch 423.91 | loss  3.83 | ppl    46.229
| epoch 183 step    79750 |    398 batches | lr 0.000101 | ms/batch 425.29 | loss  3.90 | ppl    49.494
| epoch 184 step    79800 |     12 batches | lr 0.000101 | ms/batch 415.81 | loss  3.95 | ppl    51.679
| epoch 184 step    79850 |     62 batches | lr 0.000101 | ms/batch 425.00 | loss  3.87 | ppl    47.962
| epoch 184 step    79900 |    112 batches | lr 0.000101 | ms/batch 425.23 | loss  3.90 | ppl    49.357
| epoch 184 step    79950 |    162 batches | lr 0.000101 | ms/batch 426.20 | loss  3.93 | ppl    50.934
| epoch 184 step    80000 |    212 batches | lr 0.0001 | ms/batch 426.94 | loss  3.94 | ppl    51.556
----------------------------------------------------------------------------------------------------
| Eval 200 at step    80000 | time: 175.98s | valid loss  4.21 | valid ppl    67.508
----------------------------------------------------------------------------------------------------
| epoch 184 step    80050 |    262 batches | lr 0.0001 | ms/batch 552.99 | loss  3.94 | ppl    51.419
| epoch 184 step    80100 |    312 batches | lr 0.0001 | ms/batch 425.14 | loss  3.92 | ppl    50.249
| epoch 184 step    80150 |    362 batches | lr 9.99e-05 | ms/batch 425.22 | loss  3.88 | ppl    48.216
| epoch 184 step    80200 |    412 batches | lr 9.97e-05 | ms/batch 425.67 | loss  3.91 | ppl    49.895
| epoch 185 step    80250 |     26 batches | lr 9.95e-05 | ms/batch 416.27 | loss  3.95 | ppl    51.680
| epoch 185 step    80300 |     76 batches | lr 9.93e-05 | ms/batch 423.58 | loss  3.85 | ppl    47.074
| epoch 185 step    80350 |    126 batches | lr 9.91e-05 | ms/batch 423.19 | loss  3.92 | ppl    50.218
| epoch 185 step    80400 |    176 batches | lr 9.89e-05 | ms/batch 423.45 | loss  3.90 | ppl    49.177
----------------------------------------------------------------------------------------------------
| Eval 201 at step    80400 | time: 175.76s | valid loss  4.20 | valid ppl    66.968
----------------------------------------------------------------------------------------------------
| epoch 185 step    80450 |    226 batches | lr 9.87e-05 | ms/batch 551.37 | loss  3.90 | ppl    49.621
| epoch 185 step    80500 |    276 batches | lr 9.85e-05 | ms/batch 424.82 | loss  3.92 | ppl    50.195
| epoch 185 step    80550 |    326 batches | lr 9.83e-05 | ms/batch 425.56 | loss  3.88 | ppl    48.414
| epoch 185 step    80600 |    376 batches | lr 9.81e-05 | ms/batch 426.97 | loss  3.88 | ppl    48.471
| epoch 185 step    80650 |    426 batches | lr 9.79e-05 | ms/batch 425.63 | loss  3.91 | ppl    49.938
| epoch 186 step    80700 |     40 batches | lr 9.77e-05 | ms/batch 417.24 | loss  3.89 | ppl    49.033
| epoch 186 step    80750 |     90 batches | lr 9.75e-05 | ms/batch 426.80 | loss  3.83 | ppl    45.940
| epoch 186 step    80800 |    140 batches | lr 9.73e-05 | ms/batch 425.68 | loss  3.91 | ppl    49.904
----------------------------------------------------------------------------------------------------
| Eval 202 at step    80800 | time: 176.23s | valid loss  4.21 | valid ppl    67.059
----------------------------------------------------------------------------------------------------
| epoch 186 step    80850 |    190 batches | lr 9.71e-05 | ms/batch 552.49 | loss  3.92 | ppl    50.461
| epoch 186 step    80900 |    240 batches | lr 9.69e-05 | ms/batch 425.96 | loss  3.93 | ppl    50.759
| epoch 186 step    80950 |    290 batches | lr 9.67e-05 | ms/batch 424.29 | loss  3.94 | ppl    51.566
| epoch 186 step    81000 |    340 batches | lr 9.65e-05 | ms/batch 424.73 | loss  3.83 | ppl    46.037
| epoch 186 step    81050 |    390 batches | lr 9.63e-05 | ms/batch 425.29 | loss  3.90 | ppl    49.477
| epoch 187 step    81100 |      4 batches | lr 9.61e-05 | ms/batch 417.86 | loss  3.95 | ppl    52.047
| epoch 187 step    81150 |     54 batches | lr 9.59e-05 | ms/batch 425.96 | loss  3.90 | ppl    49.566
| epoch 187 step    81200 |    104 batches | lr 9.57e-05 | ms/batch 427.04 | loss  3.87 | ppl    47.889
----------------------------------------------------------------------------------------------------
| Eval 203 at step    81200 | time: 176.18s | valid loss  4.22 | valid ppl    67.816
----------------------------------------------------------------------------------------------------
| epoch 187 step    81250 |    154 batches | lr 9.56e-05 | ms/batch 550.95 | loss  3.94 | ppl    51.380
| epoch 187 step    81300 |    204 batches | lr 9.54e-05 | ms/batch 425.91 | loss  3.92 | ppl    50.456
| epoch 187 step    81350 |    254 batches | lr 9.52e-05 | ms/batch 426.23 | loss  3.93 | ppl    50.886
| epoch 187 step    81400 |    304 batches | lr 9.5e-05 | ms/batch 425.93 | loss  3.93 | ppl    50.863
| epoch 187 step    81450 |    354 batches | lr 9.48e-05 | ms/batch 425.58 | loss  3.81 | ppl    45.332
| epoch 187 step    81500 |    404 batches | lr 9.46e-05 | ms/batch 425.26 | loss  3.91 | ppl    49.721
| epoch 188 step    81550 |     18 batches | lr 9.44e-05 | ms/batch 415.72 | loss  3.91 | ppl    49.775
| epoch 188 step    81600 |     68 batches | lr 9.42e-05 | ms/batch 423.67 | loss  3.85 | ppl    46.860
----------------------------------------------------------------------------------------------------
| Eval 204 at step    81600 | time: 175.96s | valid loss  4.20 | valid ppl    66.986
----------------------------------------------------------------------------------------------------
| epoch 188 step    81650 |    118 batches | lr 9.4e-05 | ms/batch 551.25 | loss  3.91 | ppl    49.855
| epoch 188 step    81700 |    168 batches | lr 9.38e-05 | ms/batch 424.08 | loss  3.90 | ppl    49.316
| epoch 188 step    81750 |    218 batches | lr 9.36e-05 | ms/batch 423.21 | loss  3.91 | ppl    49.766
| epoch 188 step    81800 |    268 batches | lr 9.34e-05 | ms/batch 423.58 | loss  3.93 | ppl    50.867
| epoch 188 step    81850 |    318 batches | lr 9.32e-05 | ms/batch 423.06 | loss  3.91 | ppl    49.849
| epoch 188 step    81900 |    368 batches | lr 9.3e-05 | ms/batch 424.58 | loss  3.90 | ppl    49.412
| epoch 188 step    81950 |    418 batches | lr 9.28e-05 | ms/batch 422.92 | loss  3.91 | ppl    49.776
| epoch 189 step    82000 |     32 batches | lr 9.26e-05 | ms/batch 414.85 | loss  3.95 | ppl    52.092
----------------------------------------------------------------------------------------------------
| Eval 205 at step    82000 | time: 175.35s | valid loss  4.21 | valid ppl    67.515
----------------------------------------------------------------------------------------------------
| epoch 189 step    82050 |     82 batches | lr 9.24e-05 | ms/batch 549.06 | loss  3.83 | ppl    46.209
| epoch 189 step    82100 |    132 batches | lr 9.22e-05 | ms/batch 424.93 | loss  3.90 | ppl    49.559
| epoch 189 step    82150 |    182 batches | lr 9.2e-05 | ms/batch 426.07 | loss  3.88 | ppl    48.362
| epoch 189 step    82200 |    232 batches | lr 9.19e-05 | ms/batch 425.98 | loss  3.92 | ppl    50.250
| epoch 189 step    82250 |    282 batches | lr 9.17e-05 | ms/batch 424.29 | loss  3.92 | ppl    50.236
| epoch 189 step    82300 |    332 batches | lr 9.15e-05 | ms/batch 424.74 | loss  3.83 | ppl    46.112
| epoch 189 step    82350 |    382 batches | lr 9.13e-05 | ms/batch 425.00 | loss  3.91 | ppl    49.809
| epoch 189 step    82400 |    432 batches | lr 9.11e-05 | ms/batch 425.95 | loss  3.90 | ppl    49.405
----------------------------------------------------------------------------------------------------
| Eval 206 at step    82400 | time: 176.35s | valid loss  4.21 | valid ppl    67.100
----------------------------------------------------------------------------------------------------
| epoch 190 step    82450 |     46 batches | lr 9.09e-05 | ms/batch 543.31 | loss  3.87 | ppl    47.789
| epoch 190 step    82500 |     96 batches | lr 9.07e-05 | ms/batch 424.47 | loss  3.84 | ppl    46.704
| epoch 190 step    82550 |    146 batches | lr 9.05e-05 | ms/batch 424.30 | loss  3.91 | ppl    49.814
| epoch 190 step    82600 |    196 batches | lr 9.03e-05 | ms/batch 424.39 | loss  3.91 | ppl    49.680
| epoch 190 step    82650 |    246 batches | lr 9.01e-05 | ms/batch 424.39 | loss  3.93 | ppl    50.848
| epoch 190 step    82700 |    296 batches | lr 8.99e-05 | ms/batch 424.64 | loss  3.94 | ppl    51.230
| epoch 190 step    82750 |    346 batches | lr 8.97e-05 | ms/batch 423.92 | loss  3.80 | ppl    44.719
| epoch 190 step    82800 |    396 batches | lr 8.95e-05 | ms/batch 424.46 | loss  3.90 | ppl    49.557
----------------------------------------------------------------------------------------------------
| Eval 207 at step    82800 | time: 175.68s | valid loss  4.21 | valid ppl    67.047
----------------------------------------------------------------------------------------------------
| epoch 191 step    82850 |     10 batches | lr 8.93e-05 | ms/batch 543.64 | loss  3.93 | ppl    50.666
| epoch 191 step    82900 |     60 batches | lr 8.92e-05 | ms/batch 425.53 | loss  3.85 | ppl    46.880
| epoch 191 step    82950 |    110 batches | lr 8.9e-05 | ms/batch 425.55 | loss  3.87 | ppl    47.859
| epoch 191 step    83000 |    160 batches | lr 8.88e-05 | ms/batch 426.94 | loss  3.89 | ppl    49.029
| epoch 191 step    83050 |    210 batches | lr 8.86e-05 | ms/batch 424.26 | loss  3.91 | ppl    49.804
| epoch 191 step    83100 |    260 batches | lr 8.84e-05 | ms/batch 423.66 | loss  3.91 | ppl    49.980
| epoch 191 step    83150 |    310 batches | lr 8.82e-05 | ms/batch 424.00 | loss  3.88 | ppl    48.662
| epoch 191 step    83200 |    360 batches | lr 8.8e-05 | ms/batch 423.09 | loss  3.84 | ppl    46.402
----------------------------------------------------------------------------------------------------
| Eval 208 at step    83200 | time: 175.83s | valid loss  4.20 | valid ppl    66.911
----------------------------------------------------------------------------------------------------
| epoch 191 step    83250 |    410 batches | lr 8.78e-05 | ms/batch 550.67 | loss  3.89 | ppl    49.033
| epoch 192 step    83300 |     24 batches | lr 8.76e-05 | ms/batch 415.94 | loss  3.91 | ppl    50.007
| epoch 192 step    83350 |     74 batches | lr 8.74e-05 | ms/batch 423.77 | loss  3.81 | ppl    45.337
| epoch 192 step    83400 |    124 batches | lr 8.72e-05 | ms/batch 422.66 | loss  3.87 | ppl    48.077
| epoch 192 step    83450 |    174 batches | lr 8.71e-05 | ms/batch 422.97 | loss  3.89 | ppl    48.683
| epoch 192 step    83500 |    224 batches | lr 8.69e-05 | ms/batch 423.14 | loss  3.91 | ppl    49.972
| epoch 192 step    83550 |    274 batches | lr 8.67e-05 | ms/batch 424.08 | loss  3.94 | ppl    51.648
| epoch 192 step    83600 |    324 batches | lr 8.65e-05 | ms/batch 424.27 | loss  3.87 | ppl    47.866
----------------------------------------------------------------------------------------------------
| Eval 209 at step    83600 | time: 175.39s | valid loss  4.20 | valid ppl    66.739
----------------------------------------------------------------------------------------------------
| epoch 192 step    83650 |    374 batches | lr 8.63e-05 | ms/batch 551.26 | loss  3.89 | ppl    48.795
| epoch 192 step    83700 |    424 batches | lr 8.61e-05 | ms/batch 423.56 | loss  3.88 | ppl    48.639
| epoch 193 step    83750 |     38 batches | lr 8.59e-05 | ms/batch 415.21 | loss  3.91 | ppl    49.842
| epoch 193 step    83800 |     88 batches | lr 8.57e-05 | ms/batch 423.11 | loss  3.82 | ppl    45.552
| epoch 193 step    83850 |    138 batches | lr 8.55e-05 | ms/batch 424.01 | loss  3.89 | ppl    48.667
| epoch 193 step    83900 |    188 batches | lr 8.54e-05 | ms/batch 423.17 | loss  3.88 | ppl    48.299
| epoch 193 step    83950 |    238 batches | lr 8.52e-05 | ms/batch 425.95 | loss  3.90 | ppl    49.576
| epoch 193 step    84000 |    288 batches | lr 8.5e-05 | ms/batch 423.32 | loss  3.97 | ppl    52.825
----------------------------------------------------------------------------------------------------
| Eval 210 at step    84000 | time: 175.46s | valid loss  4.20 | valid ppl    66.728
----------------------------------------------------------------------------------------------------
| epoch 193 step    84050 |    338 batches | lr 8.48e-05 | ms/batch 550.37 | loss  3.83 | ppl    46.189
| epoch 193 step    84100 |    388 batches | lr 8.46e-05 | ms/batch 425.31 | loss  3.90 | ppl    49.274
| epoch 194 step    84150 |      2 batches | lr 8.44e-05 | ms/batch 415.12 | loss  3.90 | ppl    49.458
| epoch 194 step    84200 |     52 batches | lr 8.42e-05 | ms/batch 424.30 | loss  3.86 | ppl    47.269
| epoch 194 step    84250 |    102 batches | lr 8.4e-05 | ms/batch 425.42 | loss  3.88 | ppl    48.352
| epoch 194 step    84300 |    152 batches | lr 8.38e-05 | ms/batch 425.00 | loss  3.88 | ppl    48.204
| epoch 194 step    84350 |    202 batches | lr 8.37e-05 | ms/batch 426.17 | loss  3.91 | ppl    49.818
| epoch 194 step    84400 |    252 batches | lr 8.35e-05 | ms/batch 427.26 | loss  3.91 | ppl    49.800
----------------------------------------------------------------------------------------------------
| Eval 211 at step    84400 | time: 176.00s | valid loss  4.21 | valid ppl    67.091
----------------------------------------------------------------------------------------------------
| epoch 194 step    84450 |    302 batches | lr 8.33e-05 | ms/batch 553.48 | loss  3.92 | ppl    50.605
| epoch 194 step    84500 |    352 batches | lr 8.31e-05 | ms/batch 425.85 | loss  3.81 | ppl    45.335
| epoch 194 step    84550 |    402 batches | lr 8.29e-05 | ms/batch 426.96 | loss  3.89 | ppl    48.900
| epoch 195 step    84600 |     16 batches | lr 8.27e-05 | ms/batch 417.43 | loss  3.92 | ppl    50.288
| epoch 195 step    84650 |     66 batches | lr 8.25e-05 | ms/batch 426.22 | loss  3.81 | ppl    45.056
| epoch 195 step    84700 |    116 batches | lr 8.23e-05 | ms/batch 425.66 | loss  3.87 | ppl    48.088
| epoch 195 step    84750 |    166 batches | lr 8.22e-05 | ms/batch 425.26 | loss  3.89 | ppl    48.679
| epoch 195 step    84800 |    216 batches | lr 8.2e-05 | ms/batch 424.32 | loss  3.88 | ppl    48.391
----------------------------------------------------------------------------------------------------
| Eval 212 at step    84800 | time: 176.24s | valid loss  4.21 | valid ppl    67.035
----------------------------------------------------------------------------------------------------
| epoch 195 step    84850 |    266 batches | lr 8.18e-05 | ms/batch 552.02 | loss  3.89 | ppl    48.835
| epoch 195 step    84900 |    316 batches | lr 8.16e-05 | ms/batch 425.69 | loss  3.88 | ppl    48.287
| epoch 195 step    84950 |    366 batches | lr 8.14e-05 | ms/batch 424.20 | loss  3.84 | ppl    46.313
| epoch 195 step    85000 |    416 batches | lr 8.12e-05 | ms/batch 424.27 | loss  3.88 | ppl    48.419
| epoch 196 step    85050 |     30 batches | lr 8.1e-05 | ms/batch 415.81 | loss  3.91 | ppl    50.084
| epoch 196 step    85100 |     80 batches | lr 8.09e-05 | ms/batch 423.81 | loss  3.84 | ppl    46.514
| epoch 196 step    85150 |    130 batches | lr 8.07e-05 | ms/batch 425.48 | loss  3.86 | ppl    47.687
| epoch 196 step    85200 |    180 batches | lr 8.05e-05 | ms/batch 425.29 | loss  3.87 | ppl    48.015
----------------------------------------------------------------------------------------------------
| Eval 213 at step    85200 | time: 175.81s | valid loss  4.20 | valid ppl    66.776
----------------------------------------------------------------------------------------------------
| epoch 196 step    85250 |    230 batches | lr 8.03e-05 | ms/batch 551.24 | loss  3.91 | ppl    49.888
| epoch 196 step    85300 |    280 batches | lr 8.01e-05 | ms/batch 425.26 | loss  3.90 | ppl    49.447
| epoch 196 step    85350 |    330 batches | lr 7.99e-05 | ms/batch 426.04 | loss  3.80 | ppl    44.827
| epoch 196 step    85400 |    380 batches | lr 7.97e-05 | ms/batch 424.75 | loss  3.86 | ppl    47.530
| epoch 196 step    85450 |    430 batches | lr 7.96e-05 | ms/batch 423.99 | loss  3.91 | ppl    49.686
| epoch 197 step    85500 |     44 batches | lr 7.94e-05 | ms/batch 416.25 | loss  3.90 | ppl    49.346
| epoch 197 step    85550 |     94 batches | lr 7.92e-05 | ms/batch 426.26 | loss  3.84 | ppl    46.621
| epoch 197 step    85600 |    144 batches | lr 7.9e-05 | ms/batch 425.54 | loss  3.87 | ppl    48.013
----------------------------------------------------------------------------------------------------
| Eval 214 at step    85600 | time: 176.01s | valid loss  4.20 | valid ppl    66.538
----------------------------------------------------------------------------------------------------
| epoch 197 step    85650 |    194 batches | lr 7.88e-05 | ms/batch 579.74 | loss  3.88 | ppl    48.490
| epoch 197 step    85700 |    244 batches | lr 7.86e-05 | ms/batch 424.97 | loss  3.89 | ppl    49.075
| epoch 197 step    85750 |    294 batches | lr 7.85e-05 | ms/batch 426.00 | loss  3.93 | ppl    51.150
| epoch 197 step    85800 |    344 batches | lr 7.83e-05 | ms/batch 425.30 | loss  3.78 | ppl    43.895
| epoch 197 step    85850 |    394 batches | lr 7.81e-05 | ms/batch 425.04 | loss  3.87 | ppl    47.992
| epoch 198 step    85900 |      8 batches | lr 7.79e-05 | ms/batch 417.16 | loss  3.91 | ppl    49.927
| epoch 198 step    85950 |     58 batches | lr 7.77e-05 | ms/batch 425.56 | loss  3.83 | ppl    46.109
| epoch 198 step    86000 |    108 batches | lr 7.75e-05 | ms/batch 426.61 | loss  3.84 | ppl    46.607
----------------------------------------------------------------------------------------------------
| Eval 215 at step    86000 | time: 176.07s | valid loss  4.21 | valid ppl    67.225
----------------------------------------------------------------------------------------------------
| epoch 198 step    86050 |    158 batches | lr 7.74e-05 | ms/batch 554.20 | loss  3.89 | ppl    48.871
| epoch 198 step    86100 |    208 batches | lr 7.72e-05 | ms/batch 425.23 | loss  3.88 | ppl    48.259
| epoch 198 step    86150 |    258 batches | lr 7.7e-05 | ms/batch 424.01 | loss  3.90 | ppl    49.424
| epoch 198 step    86200 |    308 batches | lr 7.68e-05 | ms/batch 425.81 | loss  3.91 | ppl    49.998
| epoch 198 step    86250 |    358 batches | lr 7.66e-05 | ms/batch 424.51 | loss  3.83 | ppl    46.115
| epoch 198 step    86300 |    408 batches | lr 7.65e-05 | ms/batch 427.81 | loss  3.87 | ppl    48.076
| epoch 199 step    86350 |     22 batches | lr 7.63e-05 | ms/batch 418.77 | loss  3.89 | ppl    48.935
| epoch 199 step    86400 |     72 batches | lr 7.61e-05 | ms/batch 428.38 | loss  3.83 | ppl    46.006
----------------------------------------------------------------------------------------------------
| Eval 216 at step    86400 | time: 176.46s | valid loss  4.20 | valid ppl    66.673
----------------------------------------------------------------------------------------------------
| epoch 199 step    86450 |    122 batches | lr 7.59e-05 | ms/batch 552.23 | loss  3.85 | ppl    47.195
| epoch 199 step    86500 |    172 batches | lr 7.57e-05 | ms/batch 425.80 | loss  3.86 | ppl    47.249
| epoch 199 step    86550 |    222 batches | lr 7.55e-05 | ms/batch 424.94 | loss  3.90 | ppl    49.403
| epoch 199 step    86600 |    272 batches | lr 7.54e-05 | ms/batch 425.67 | loss  3.89 | ppl    48.885
| epoch 199 step    86650 |    322 batches | lr 7.52e-05 | ms/batch 423.38 | loss  3.85 | ppl    46.883
| epoch 199 step    86700 |    372 batches | lr 7.5e-05 | ms/batch 426.67 | loss  3.83 | ppl    45.954
| epoch 199 step    86750 |    422 batches | lr 7.48e-05 | ms/batch 423.67 | loss  3.85 | ppl    46.790
| epoch 200 step    86800 |     36 batches | lr 7.46e-05 | ms/batch 417.66 | loss  3.90 | ppl    49.448
----------------------------------------------------------------------------------------------------
| Eval 217 at step    86800 | time: 175.98s | valid loss  4.20 | valid ppl    66.696
----------------------------------------------------------------------------------------------------
| epoch 200 step    86850 |     86 batches | lr 7.45e-05 | ms/batch 552.59 | loss  3.81 | ppl    45.145
| epoch 200 step    86900 |    136 batches | lr 7.43e-05 | ms/batch 425.75 | loss  3.87 | ppl    48.062
| epoch 200 step    86950 |    186 batches | lr 7.41e-05 | ms/batch 426.38 | loss  3.85 | ppl    47.059
| epoch 200 step    87000 |    236 batches | lr 7.39e-05 | ms/batch 426.72 | loss  3.89 | ppl    48.928
| epoch 200 step    87050 |    286 batches | lr 7.37e-05 | ms/batch 425.33 | loss  3.92 | ppl    50.373
| epoch 200 step    87100 |    336 batches | lr 7.36e-05 | ms/batch 427.01 | loss  3.79 | ppl    44.384
| epoch 200 step    87150 |    386 batches | lr 7.34e-05 | ms/batch 425.64 | loss  3.88 | ppl    48.555
| epoch 200 step    87200 |    436 batches | lr 7.32e-05 | ms/batch 420.34 | loss  3.86 | ppl    47.571
----------------------------------------------------------------------------------------------------
| Eval 218 at step    87200 | time: 176.45s | valid loss  4.21 | valid ppl    67.106
----------------------------------------------------------------------------------------------------
| epoch 201 step    87250 |     50 batches | lr 7.3e-05 | ms/batch 549.99 | loss  3.84 | ppl    46.651
| epoch 201 step    87300 |    100 batches | lr 7.29e-05 | ms/batch 424.54 | loss  3.82 | ppl    45.509
| epoch 201 step    87350 |    150 batches | lr 7.27e-05 | ms/batch 424.22 | loss  3.87 | ppl    47.716
| epoch 201 step    87400 |    200 batches | lr 7.25e-05 | ms/batch 423.28 | loss  3.87 | ppl    47.799
| epoch 201 step    87450 |    250 batches | lr 7.23e-05 | ms/batch 424.21 | loss  3.90 | ppl    49.603
| epoch 201 step    87500 |    300 batches | lr 7.21e-05 | ms/batch 426.17 | loss  3.90 | ppl    49.445
| epoch 201 step    87550 |    350 batches | lr 7.2e-05 | ms/batch 425.09 | loss  3.79 | ppl    44.258
| epoch 201 step    87600 |    400 batches | lr 7.18e-05 | ms/batch 424.08 | loss  3.87 | ppl    48.157
----------------------------------------------------------------------------------------------------
| Eval 219 at step    87600 | time: 176.09s | valid loss  4.20 | valid ppl    66.514
----------------------------------------------------------------------------------------------------
| epoch 202 step    87650 |     14 batches | lr 7.16e-05 | ms/batch 568.94 | loss  3.91 | ppl    49.888
| epoch 202 step    87700 |     64 batches | lr 7.14e-05 | ms/batch 426.34 | loss  3.80 | ppl    44.640
| epoch 202 step    87750 |    114 batches | lr 7.13e-05 | ms/batch 424.70 | loss  3.84 | ppl    46.305
| epoch 202 step    87800 |    164 batches | lr 7.11e-05 | ms/batch 425.05 | loss  3.86 | ppl    47.555
| epoch 202 step    87850 |    214 batches | lr 7.09e-05 | ms/batch 424.01 | loss  3.88 | ppl    48.218
| epoch 202 step    87900 |    264 batches | lr 7.07e-05 | ms/batch 424.66 | loss  3.89 | ppl    48.772
| epoch 202 step    87950 |    314 batches | lr 7.05e-05 | ms/batch 426.20 | loss  3.87 | ppl    47.936
| epoch 202 step    88000 |    364 batches | lr 7.04e-05 | ms/batch 426.75 | loss  3.82 | ppl    45.538
----------------------------------------------------------------------------------------------------
| Eval 220 at step    88000 | time: 175.96s | valid loss  4.20 | valid ppl    66.586
----------------------------------------------------------------------------------------------------
| epoch 202 step    88050 |    414 batches | lr 7.02e-05 | ms/batch 552.79 | loss  3.85 | ppl    47.128
| epoch 203 step    88100 |     28 batches | lr 7e-05 | ms/batch 415.24 | loss  3.90 | ppl    49.389
| epoch 203 step    88150 |     78 batches | lr 6.98e-05 | ms/batch 425.86 | loss  3.80 | ppl    44.899
| epoch 203 step    88200 |    128 batches | lr 6.97e-05 | ms/batch 425.73 | loss  3.86 | ppl    47.379
| epoch 203 step    88250 |    178 batches | lr 6.95e-05 | ms/batch 425.03 | loss  3.87 | ppl    47.970
| epoch 203 step    88300 |    228 batches | lr 6.93e-05 | ms/batch 425.80 | loss  3.87 | ppl    47.906
| epoch 203 step    88350 |    278 batches | lr 6.91e-05 | ms/batch 425.79 | loss  3.91 | ppl    49.774
| epoch 203 step    88400 |    328 batches | lr 6.9e-05 | ms/batch 425.93 | loss  3.79 | ppl    44.256
----------------------------------------------------------------------------------------------------
| Eval 221 at step    88400 | time: 176.12s | valid loss  4.20 | valid ppl    66.771
----------------------------------------------------------------------------------------------------
| epoch 203 step    88450 |    378 batches | lr 6.88e-05 | ms/batch 549.60 | loss  3.84 | ppl    46.595
| epoch 203 step    88500 |    428 batches | lr 6.86e-05 | ms/batch 423.31 | loss  3.86 | ppl    47.660
| epoch 204 step    88550 |     42 batches | lr 6.84e-05 | ms/batch 414.08 | loss  3.84 | ppl    46.418
| epoch 204 step    88600 |     92 batches | lr 6.83e-05 | ms/batch 424.40 | loss  3.82 | ppl    45.393
| epoch 204 step    88650 |    142 batches | lr 6.81e-05 | ms/batch 424.45 | loss  3.83 | ppl    46.235
| epoch 204 step    88700 |    192 batches | lr 6.79e-05 | ms/batch 424.74 | loss  3.86 | ppl    47.412
| epoch 204 step    88750 |    242 batches | lr 6.77e-05 | ms/batch 423.83 | loss  3.89 | ppl    48.877
| epoch 204 step    88800 |    292 batches | lr 6.76e-05 | ms/batch 427.74 | loss  3.93 | ppl    50.692
----------------------------------------------------------------------------------------------------
| Eval 222 at step    88800 | time: 175.61s | valid loss  4.20 | valid ppl    66.658
----------------------------------------------------------------------------------------------------
| epoch 204 step    88850 |    342 batches | lr 6.74e-05 | ms/batch 549.44 | loss  3.77 | ppl    43.227
| epoch 204 step    88900 |    392 batches | lr 6.72e-05 | ms/batch 424.06 | loss  3.86 | ppl    47.572
| epoch 205 step    88950 |      6 batches | lr 6.7e-05 | ms/batch 416.22 | loss  3.89 | ppl    48.962
| epoch 205 step    89000 |     56 batches | lr 6.69e-05 | ms/batch 425.52 | loss  3.82 | ppl    45.383
| epoch 205 step    89050 |    106 batches | lr 6.67e-05 | ms/batch 425.90 | loss  3.83 | ppl    45.908
| epoch 205 step    89100 |    156 batches | lr 6.65e-05 | ms/batch 424.10 | loss  3.87 | ppl    47.846
| epoch 205 step    89150 |    206 batches | lr 6.64e-05 | ms/batch 424.62 | loss  3.85 | ppl    46.786
| epoch 205 step    89200 |    256 batches | lr 6.62e-05 | ms/batch 424.25 | loss  3.90 | ppl    49.404
----------------------------------------------------------------------------------------------------
| Eval 223 at step    89200 | time: 175.73s | valid loss  4.21 | valid ppl    67.250
----------------------------------------------------------------------------------------------------
| epoch 205 step    89250 |    306 batches | lr 6.6e-05 | ms/batch 552.90 | loss  3.90 | ppl    49.207
| epoch 205 step    89300 |    356 batches | lr 6.58e-05 | ms/batch 424.83 | loss  3.80 | ppl    44.694
| epoch 205 step    89350 |    406 batches | lr 6.57e-05 | ms/batch 425.60 | loss  3.85 | ppl    46.790
| epoch 206 step    89400 |     20 batches | lr 6.55e-05 | ms/batch 415.67 | loss  3.89 | ppl    48.719
| epoch 206 step    89450 |     70 batches | lr 6.53e-05 | ms/batch 424.28 | loss  3.79 | ppl    44.272
| epoch 206 step    89500 |    120 batches | lr 6.52e-05 | ms/batch 423.41 | loss  3.85 | ppl    46.859
| epoch 206 step    89550 |    170 batches | lr 6.5e-05 | ms/batch 423.73 | loss  3.86 | ppl    47.615
| epoch 206 step    89600 |    220 batches | lr 6.48e-05 | ms/batch 423.45 | loss  3.88 | ppl    48.472
----------------------------------------------------------------------------------------------------
| Eval 224 at step    89600 | time: 175.64s | valid loss  4.20 | valid ppl    66.688
----------------------------------------------------------------------------------------------------
| epoch 206 step    89650 |    270 batches | lr 6.46e-05 | ms/batch 549.71 | loss  3.87 | ppl    47.833
| epoch 206 step    89700 |    320 batches | lr 6.45e-05 | ms/batch 422.67 | loss  3.85 | ppl    46.978
| epoch 206 step    89750 |    370 batches | lr 6.43e-05 | ms/batch 424.34 | loss  3.81 | ppl    44.954
| epoch 206 step    89800 |    420 batches | lr 6.41e-05 | ms/batch 423.33 | loss  3.85 | ppl    46.989
| epoch 207 step    89850 |     34 batches | lr 6.4e-05 | ms/batch 415.81 | loss  3.89 | ppl    48.828
| epoch 207 step    89900 |     84 batches | lr 6.38e-05 | ms/batch 424.88 | loss  3.79 | ppl    44.366
| epoch 207 step    89950 |    134 batches | lr 6.36e-05 | ms/batch 423.14 | loss  3.87 | ppl    47.852
| epoch 207 step    90000 |    184 batches | lr 6.35e-05 | ms/batch 424.11 | loss  3.85 | ppl    46.946
----------------------------------------------------------------------------------------------------
| Eval 225 at step    90000 | time: 175.42s | valid loss  4.20 | valid ppl    66.871
----------------------------------------------------------------------------------------------------
| epoch 207 step    90050 |    234 batches | lr 6.33e-05 | ms/batch 549.42 | loss  3.87 | ppl    47.783
| epoch 207 step    90100 |    284 batches | lr 6.31e-05 | ms/batch 424.56 | loss  3.90 | ppl    49.641
| epoch 207 step    90150 |    334 batches | lr 6.29e-05 | ms/batch 424.73 | loss  3.81 | ppl    45.151
| epoch 207 step    90200 |    384 batches | lr 6.28e-05 | ms/batch 424.03 | loss  3.86 | ppl    47.446
| epoch 207 step    90250 |    434 batches | lr 6.26e-05 | ms/batch 422.86 | loss  3.88 | ppl    48.416
| epoch 208 step    90300 |     48 batches | lr 6.24e-05 | ms/batch 416.44 | loss  3.84 | ppl    46.457
| epoch 208 step    90350 |     98 batches | lr 6.23e-05 | ms/batch 425.31 | loss  3.79 | ppl    44.361
| epoch 208 step    90400 |    148 batches | lr 6.21e-05 | ms/batch 423.26 | loss  3.85 | ppl    47.217
----------------------------------------------------------------------------------------------------
| Eval 226 at step    90400 | time: 175.53s | valid loss  4.20 | valid ppl    66.882
----------------------------------------------------------------------------------------------------
| epoch 208 step    90450 |    198 batches | lr 6.19e-05 | ms/batch 549.39 | loss  3.87 | ppl    48.040
| epoch 208 step    90500 |    248 batches | lr 6.18e-05 | ms/batch 423.91 | loss  3.87 | ppl    48.158
| epoch 208 step    90550 |    298 batches | lr 6.16e-05 | ms/batch 424.19 | loss  3.89 | ppl    48.794
| epoch 208 step    90600 |    348 batches | lr 6.14e-05 | ms/batch 423.21 | loss  3.77 | ppl    43.421
| epoch 208 step    90650 |    398 batches | lr 6.13e-05 | ms/batch 425.37 | loss  3.84 | ppl    46.348
| epoch 209 step    90700 |     12 batches | lr 6.11e-05 | ms/batch 415.00 | loss  3.88 | ppl    48.262
| epoch 209 step    90750 |     62 batches | lr 6.09e-05 | ms/batch 425.69 | loss  3.78 | ppl    43.831
| epoch 209 step    90800 |    112 batches | lr 6.08e-05 | ms/batch 425.93 | loss  3.82 | ppl    45.796
----------------------------------------------------------------------------------------------------
| Eval 227 at step    90800 | time: 175.67s | valid loss  4.20 | valid ppl    66.987
----------------------------------------------------------------------------------------------------
| epoch 209 step    90850 |    162 batches | lr 6.06e-05 | ms/batch 550.74 | loss  3.84 | ppl    46.402
| epoch 209 step    90900 |    212 batches | lr 6.04e-05 | ms/batch 424.13 | loss  3.85 | ppl    47.171
| epoch 209 step    90950 |    262 batches | lr 6.03e-05 | ms/batch 425.60 | loss  3.86 | ppl    47.633
| epoch 209 step    91000 |    312 batches | lr 6.01e-05 | ms/batch 425.65 | loss  3.87 | ppl    48.166
| epoch 209 step    91050 |    362 batches | lr 5.99e-05 | ms/batch 424.17 | loss  3.79 | ppl    44.216
| epoch 209 step    91100 |    412 batches | lr 5.98e-05 | ms/batch 424.75 | loss  3.84 | ppl    46.542
| epoch 210 step    91150 |     26 batches | lr 5.96e-05 | ms/batch 416.02 | loss  3.88 | ppl    48.426
| epoch 210 step    91200 |     76 batches | lr 5.94e-05 | ms/batch 426.28 | loss  3.80 | ppl    44.582
----------------------------------------------------------------------------------------------------
| Eval 228 at step    91200 | time: 175.89s | valid loss  4.20 | valid ppl    66.365
----------------------------------------------------------------------------------------------------
| epoch 210 step    91250 |    126 batches | lr 5.93e-05 | ms/batch 581.67 | loss  3.87 | ppl    47.730
| epoch 210 step    91300 |    176 batches | lr 5.91e-05 | ms/batch 425.79 | loss  3.83 | ppl    46.119
| epoch 210 step    91350 |    226 batches | lr 5.89e-05 | ms/batch 426.03 | loss  3.86 | ppl    47.481
| epoch 210 step    91400 |    276 batches | lr 5.88e-05 | ms/batch 427.24 | loss  3.90 | ppl    49.494
| epoch 210 step    91450 |    326 batches | lr 5.86e-05 | ms/batch 426.10 | loss  3.79 | ppl    44.286
| epoch 210 step    91500 |    376 batches | lr 5.84e-05 | ms/batch 426.77 | loss  3.82 | ppl    45.774
| epoch 210 step    91550 |    426 batches | lr 5.83e-05 | ms/batch 424.15 | loss  3.85 | ppl    47.021
| epoch 211 step    91600 |     40 batches | lr 5.81e-05 | ms/batch 417.05 | loss  3.84 | ppl    46.748
----------------------------------------------------------------------------------------------------
| Eval 229 at step    91600 | time: 176.29s | valid loss  4.20 | valid ppl    66.775
----------------------------------------------------------------------------------------------------
| epoch 211 step    91650 |     90 batches | lr 5.8e-05 | ms/batch 552.71 | loss  3.78 | ppl    43.953
| epoch 211 step    91700 |    140 batches | lr 5.78e-05 | ms/batch 425.11 | loss  3.87 | ppl    47.739
| epoch 211 step    91750 |    190 batches | lr 5.76e-05 | ms/batch 423.40 | loss  3.84 | ppl    46.608
| epoch 211 step    91800 |    240 batches | lr 5.75e-05 | ms/batch 424.46 | loss  3.87 | ppl    47.775
| epoch 211 step    91850 |    290 batches | lr 5.73e-05 | ms/batch 424.54 | loss  3.88 | ppl    48.531
| epoch 211 step    91900 |    340 batches | lr 5.71e-05 | ms/batch 425.03 | loss  3.75 | ppl    42.656
| epoch 211 step    91950 |    390 batches | lr 5.7e-05 | ms/batch 424.35 | loss  3.84 | ppl    46.640
| epoch 212 step    92000 |      4 batches | lr 5.68e-05 | ms/batch 416.49 | loss  3.87 | ppl    48.032
----------------------------------------------------------------------------------------------------
| Eval 230 at step    92000 | time: 175.81s | valid loss  4.20 | valid ppl    66.722
----------------------------------------------------------------------------------------------------
| epoch 212 step    92050 |     54 batches | lr 5.67e-05 | ms/batch 551.11 | loss  3.80 | ppl    44.629
| epoch 212 step    92100 |    104 batches | lr 5.65e-05 | ms/batch 423.95 | loss  3.80 | ppl    44.895
| epoch 212 step    92150 |    154 batches | lr 5.63e-05 | ms/batch 423.46 | loss  3.85 | ppl    47.116
| epoch 212 step    92200 |    204 batches | lr 5.62e-05 | ms/batch 424.55 | loss  3.84 | ppl    46.467
| epoch 212 step    92250 |    254 batches | lr 5.6e-05 | ms/batch 424.12 | loss  3.86 | ppl    47.350
| epoch 212 step    92300 |    304 batches | lr 5.58e-05 | ms/batch 424.40 | loss  3.90 | ppl    49.484
| epoch 212 step    92350 |    354 batches | lr 5.57e-05 | ms/batch 424.61 | loss  3.78 | ppl    43.647
| epoch 212 step    92400 |    404 batches | lr 5.55e-05 | ms/batch 424.40 | loss  3.82 | ppl    45.692
----------------------------------------------------------------------------------------------------
| Eval 231 at step    92400 | time: 176.04s | valid loss  4.20 | valid ppl    66.521
----------------------------------------------------------------------------------------------------
| epoch 213 step    92450 |     18 batches | lr 5.54e-05 | ms/batch 543.83 | loss  3.85 | ppl    47.088
| epoch 213 step    92500 |     68 batches | lr 5.52e-05 | ms/batch 423.73 | loss  3.79 | ppl    44.278
| epoch 213 step    92550 |    118 batches | lr 5.5e-05 | ms/batch 423.94 | loss  3.84 | ppl    46.492
| epoch 213 step    92600 |    168 batches | lr 5.49e-05 | ms/batch 423.81 | loss  3.84 | ppl    46.560
| epoch 213 step    92650 |    218 batches | lr 5.47e-05 | ms/batch 427.00 | loss  3.84 | ppl    46.726
| epoch 213 step    92700 |    268 batches | lr 5.46e-05 | ms/batch 425.02 | loss  3.87 | ppl    47.825
| epoch 213 step    92750 |    318 batches | lr 5.44e-05 | ms/batch 425.00 | loss  3.84 | ppl    46.444
| epoch 213 step    92800 |    368 batches | lr 5.42e-05 | ms/batch 425.32 | loss  3.79 | ppl    44.263
----------------------------------------------------------------------------------------------------
| Eval 232 at step    92800 | time: 175.88s | valid loss  4.20 | valid ppl    66.735
----------------------------------------------------------------------------------------------------
| epoch 213 step    92850 |    418 batches | lr 5.41e-05 | ms/batch 550.27 | loss  3.85 | ppl    47.090
| epoch 214 step    92900 |     32 batches | lr 5.39e-05 | ms/batch 416.73 | loss  3.86 | ppl    47.298
| epoch 214 step    92950 |     82 batches | lr 5.38e-05 | ms/batch 424.83 | loss  3.77 | ppl    43.193
| epoch 214 step    93000 |    132 batches | lr 5.36e-05 | ms/batch 425.39 | loss  3.82 | ppl    45.792
| epoch 214 step    93050 |    182 batches | lr 5.35e-05 | ms/batch 425.62 | loss  3.81 | ppl    45.152
| epoch 214 step    93100 |    232 batches | lr 5.33e-05 | ms/batch 426.37 | loss  3.87 | ppl    47.755
| epoch 214 step    93150 |    282 batches | lr 5.31e-05 | ms/batch 424.64 | loss  3.87 | ppl    48.166
| epoch 214 step    93200 |    332 batches | lr 5.3e-05 | ms/batch 423.55 | loss  3.78 | ppl    43.976
----------------------------------------------------------------------------------------------------
| Eval 233 at step    93200 | time: 175.87s | valid loss  4.20 | valid ppl    67.016
----------------------------------------------------------------------------------------------------
| epoch 214 step    93250 |    382 batches | lr 5.28e-05 | ms/batch 553.57 | loss  3.84 | ppl    46.355
| epoch 214 step    93300 |    432 batches | lr 5.27e-05 | ms/batch 427.40 | loss  3.89 | ppl    49.020
| epoch 215 step    93350 |     46 batches | lr 5.25e-05 | ms/batch 417.17 | loss  3.81 | ppl    45.068
| epoch 215 step    93400 |     96 batches | lr 5.23e-05 | ms/batch 424.56 | loss  3.79 | ppl    44.077
| epoch 215 step    93450 |    146 batches | lr 5.22e-05 | ms/batch 423.40 | loss  3.84 | ppl    46.646
| epoch 215 step    93500 |    196 batches | lr 5.2e-05 | ms/batch 424.24 | loss  3.87 | ppl    47.755
| epoch 215 step    93550 |    246 batches | lr 5.19e-05 | ms/batch 424.40 | loss  3.88 | ppl    48.208
| epoch 215 step    93600 |    296 batches | lr 5.17e-05 | ms/batch 422.88 | loss  3.88 | ppl    48.490
----------------------------------------------------------------------------------------------------
| Eval 234 at step    93600 | time: 175.85s | valid loss  4.20 | valid ppl    66.706
----------------------------------------------------------------------------------------------------
| epoch 215 step    93650 |    346 batches | lr 5.16e-05 | ms/batch 548.96 | loss  3.76 | ppl    43.011
| epoch 215 step    93700 |    396 batches | lr 5.14e-05 | ms/batch 425.05 | loss  3.83 | ppl    45.898
| epoch 216 step    93750 |     10 batches | lr 5.13e-05 | ms/batch 415.82 | loss  3.87 | ppl    47.737
| epoch 216 step    93800 |     60 batches | lr 5.11e-05 | ms/batch 423.10 | loss  3.79 | ppl    44.410
| epoch 216 step    93850 |    110 batches | lr 5.09e-05 | ms/batch 422.46 | loss  3.80 | ppl    44.858
| epoch 216 step    93900 |    160 batches | lr 5.08e-05 | ms/batch 421.82 | loss  3.84 | ppl    46.691
| epoch 216 step    93950 |    210 batches | lr 5.06e-05 | ms/batch 422.88 | loss  3.81 | ppl    45.080
| epoch 216 step    94000 |    260 batches | lr 5.05e-05 | ms/batch 422.06 | loss  3.86 | ppl    47.661
----------------------------------------------------------------------------------------------------
| Eval 235 at step    94000 | time: 175.09s | valid loss  4.20 | valid ppl    66.911
----------------------------------------------------------------------------------------------------
| epoch 216 step    94050 |    310 batches | lr 5.03e-05 | ms/batch 547.33 | loss  3.84 | ppl    46.683
| epoch 216 step    94100 |    360 batches | lr 5.02e-05 | ms/batch 422.45 | loss  3.77 | ppl    43.578
| epoch 216 step    94150 |    410 batches | lr 5e-05 | ms/batch 422.86 | loss  3.84 | ppl    46.317
| epoch 217 step    94200 |     24 batches | lr 4.99e-05 | ms/batch 413.72 | loss  3.86 | ppl    47.695
| epoch 217 step    94250 |     74 batches | lr 4.97e-05 | ms/batch 420.99 | loss  3.78 | ppl    44.019
| epoch 217 step    94300 |    124 batches | lr 4.96e-05 | ms/batch 421.72 | loss  3.84 | ppl    46.557
| epoch 217 step    94350 |    174 batches | lr 4.94e-05 | ms/batch 423.62 | loss  3.84 | ppl    46.606
| epoch 217 step    94400 |    224 batches | lr 4.93e-05 | ms/batch 422.27 | loss  3.87 | ppl    47.900
----------------------------------------------------------------------------------------------------
| Eval 236 at step    94400 | time: 174.74s | valid loss  4.20 | valid ppl    66.364
----------------------------------------------------------------------------------------------------
| epoch 217 step    94450 |    274 batches | lr 4.91e-05 | ms/batch 574.89 | loss  3.87 | ppl    47.932
| epoch 217 step    94500 |    324 batches | lr 4.89e-05 | ms/batch 422.93 | loss  3.82 | ppl    45.773
| epoch 217 step    94550 |    374 batches | lr 4.88e-05 | ms/batch 422.91 | loss  3.79 | ppl    44.163
| epoch 217 step    94600 |    424 batches | lr 4.86e-05 | ms/batch 421.26 | loss  3.84 | ppl    46.620
| epoch 218 step    94650 |     38 batches | lr 4.85e-05 | ms/batch 414.60 | loss  3.85 | ppl    46.835
| epoch 218 step    94700 |     88 batches | lr 4.83e-05 | ms/batch 421.82 | loss  3.80 | ppl    44.759
| epoch 218 step    94750 |    138 batches | lr 4.82e-05 | ms/batch 422.21 | loss  3.83 | ppl    46.223
| epoch 218 step    94800 |    188 batches | lr 4.8e-05 | ms/batch 422.44 | loss  3.85 | ppl    46.769
----------------------------------------------------------------------------------------------------
| Eval 237 at step    94800 | time: 174.74s | valid loss  4.20 | valid ppl    66.382
----------------------------------------------------------------------------------------------------
| epoch 218 step    94850 |    238 batches | lr 4.79e-05 | ms/batch 546.71 | loss  3.86 | ppl    47.476
| epoch 218 step    94900 |    288 batches | lr 4.77e-05 | ms/batch 423.41 | loss  3.88 | ppl    48.452
| epoch 218 step    94950 |    338 batches | lr 4.76e-05 | ms/batch 424.36 | loss  3.75 | ppl    42.461
| epoch 218 step    95000 |    388 batches | lr 4.74e-05 | ms/batch 422.69 | loss  3.85 | ppl    47.052
| epoch 219 step    95050 |      2 batches | lr 4.73e-05 | ms/batch 415.44 | loss  3.88 | ppl    48.278
| epoch 219 step    95100 |     52 batches | lr 4.71e-05 | ms/batch 425.33 | loss  3.79 | ppl    44.444
| epoch 219 step    95150 |    102 batches | lr 4.7e-05 | ms/batch 423.42 | loss  3.80 | ppl    44.591
| epoch 219 step    95200 |    152 batches | lr 4.68e-05 | ms/batch 421.79 | loss  3.84 | ppl    46.473
----------------------------------------------------------------------------------------------------
| Eval 238 at step    95200 | time: 175.23s | valid loss  4.20 | valid ppl    66.608
----------------------------------------------------------------------------------------------------
| epoch 219 step    95250 |    202 batches | lr 4.67e-05 | ms/batch 550.91 | loss  3.84 | ppl    46.738
| epoch 219 step    95300 |    252 batches | lr 4.65e-05 | ms/batch 421.43 | loss  3.88 | ppl    48.214
| epoch 219 step    95350 |    302 batches | lr 4.64e-05 | ms/batch 421.53 | loss  3.89 | ppl    48.795
| epoch 219 step    95400 |    352 batches | lr 4.62e-05 | ms/batch 422.43 | loss  3.77 | ppl    43.530
| epoch 219 step    95450 |    402 batches | lr 4.61e-05 | ms/batch 421.17 | loss  3.85 | ppl    47.040
| epoch 220 step    95500 |     16 batches | lr 4.59e-05 | ms/batch 414.30 | loss  3.84 | ppl    46.426
| epoch 220 step    95550 |     66 batches | lr 4.58e-05 | ms/batch 422.87 | loss  3.78 | ppl    43.691
| epoch 220 step    95600 |    116 batches | lr 4.56e-05 | ms/batch 423.59 | loss  3.82 | ppl    45.545
----------------------------------------------------------------------------------------------------
| Eval 239 at step    95600 | time: 174.87s | valid loss  4.20 | valid ppl    66.982
----------------------------------------------------------------------------------------------------
| epoch 220 step    95650 |    166 batches | lr 4.55e-05 | ms/batch 548.85 | loss  3.81 | ppl    45.248
| epoch 220 step    95700 |    216 batches | lr 4.53e-05 | ms/batch 424.69 | loss  3.83 | ppl    45.893
| epoch 220 step    95750 |    266 batches | lr 4.52e-05 | ms/batch 424.74 | loss  3.86 | ppl    47.470
| epoch 220 step    95800 |    316 batches | lr 4.5e-05 | ms/batch 426.90 | loss  3.84 | ppl    46.511
| epoch 220 step    95850 |    366 batches | lr 4.49e-05 | ms/batch 424.89 | loss  3.79 | ppl    44.143
| epoch 220 step    95900 |    416 batches | lr 4.48e-05 | ms/batch 423.65 | loss  3.84 | ppl    46.333
| epoch 221 step    95950 |     30 batches | lr 4.46e-05 | ms/batch 415.47 | loss  3.83 | ppl    46.004
| epoch 221 step    96000 |     80 batches | lr 4.45e-05 | ms/batch 423.08 | loss  3.76 | ppl    43.018
----------------------------------------------------------------------------------------------------
| Eval 240 at step    96000 | time: 175.59s | valid loss  4.20 | valid ppl    67.003
----------------------------------------------------------------------------------------------------
| epoch 221 step    96050 |    130 batches | lr 4.43e-05 | ms/batch 549.25 | loss  3.81 | ppl    45.274
| epoch 221 step    96100 |    180 batches | lr 4.42e-05 | ms/batch 422.61 | loss  3.83 | ppl    46.175
| epoch 221 step    96150 |    230 batches | lr 4.4e-05 | ms/batch 422.40 | loss  3.85 | ppl    46.896
| epoch 221 step    96200 |    280 batches | lr 4.39e-05 | ms/batch 422.72 | loss  3.86 | ppl    47.601
| epoch 221 step    96250 |    330 batches | lr 4.37e-05 | ms/batch 421.70 | loss  3.78 | ppl    43.613
| epoch 221 step    96300 |    380 batches | lr 4.36e-05 | ms/batch 422.25 | loss  3.80 | ppl    44.496
| epoch 221 step    96350 |    430 batches | lr 4.34e-05 | ms/batch 420.12 | loss  3.85 | ppl    47.196
| epoch 222 step    96400 |     44 batches | lr 4.33e-05 | ms/batch 412.13 | loss  3.80 | ppl    44.718
----------------------------------------------------------------------------------------------------
| Eval 241 at step    96400 | time: 174.66s | valid loss  4.20 | valid ppl    66.655
----------------------------------------------------------------------------------------------------
| epoch 222 step    96450 |     94 batches | lr 4.32e-05 | ms/batch 545.60 | loss  3.75 | ppl    42.680
| epoch 222 step    96500 |    144 batches | lr 4.3e-05 | ms/batch 421.39 | loss  3.81 | ppl    45.216
| epoch 222 step    96550 |    194 batches | lr 4.29e-05 | ms/batch 419.91 | loss  3.85 | ppl    46.822
| epoch 222 step    96600 |    244 batches | lr 4.27e-05 | ms/batch 422.99 | loss  3.85 | ppl    47.062
| epoch 222 step    96650 |    294 batches | lr 4.26e-05 | ms/batch 421.92 | loss  3.88 | ppl    48.330
| epoch 222 step    96700 |    344 batches | lr 4.24e-05 | ms/batch 422.82 | loss  3.77 | ppl    43.538
| epoch 222 step    96750 |    394 batches | lr 4.23e-05 | ms/batch 420.68 | loss  3.83 | ppl    46.079
| epoch 223 step    96800 |      8 batches | lr 4.21e-05 | ms/batch 412.85 | loss  3.85 | ppl    47.052
----------------------------------------------------------------------------------------------------
| Eval 242 at step    96800 | time: 174.43s | valid loss  4.19 | valid ppl    66.294
----------------------------------------------------------------------------------------------------
| epoch 223 step    96850 |     58 batches | lr 4.2e-05 | ms/batch 573.27 | loss  3.78 | ppl    43.931
| epoch 223 step    96900 |    108 batches | lr 4.19e-05 | ms/batch 419.57 | loss  3.79 | ppl    44.045
| epoch 223 step    96950 |    158 batches | lr 4.17e-05 | ms/batch 423.76 | loss  3.84 | ppl    46.394
| epoch 223 step    97000 |    208 batches | lr 4.16e-05 | ms/batch 421.85 | loss  3.80 | ppl    44.872
| epoch 223 step    97050 |    258 batches | lr 4.14e-05 | ms/batch 420.57 | loss  3.86 | ppl    47.656
| epoch 223 step    97100 |    308 batches | lr 4.13e-05 | ms/batch 421.70 | loss  3.84 | ppl    46.386
| epoch 223 step    97150 |    358 batches | lr 4.11e-05 | ms/batch 422.05 | loss  3.79 | ppl    44.311
| epoch 223 step    97200 |    408 batches | lr 4.1e-05 | ms/batch 423.26 | loss  3.80 | ppl    44.868
----------------------------------------------------------------------------------------------------
| Eval 243 at step    97200 | time: 174.90s | valid loss  4.19 | valid ppl    66.220
----------------------------------------------------------------------------------------------------
| epoch 224 step    97250 |     22 batches | lr 4.09e-05 | ms/batch 563.88 | loss  3.85 | ppl    47.093
| epoch 224 step    97300 |     72 batches | lr 4.07e-05 | ms/batch 420.48 | loss  3.75 | ppl    42.551
| epoch 224 step    97350 |    122 batches | lr 4.06e-05 | ms/batch 419.87 | loss  3.80 | ppl    44.504
| epoch 224 step    97400 |    172 batches | lr 4.04e-05 | ms/batch 422.31 | loss  3.82 | ppl    45.418
| epoch 224 step    97450 |    222 batches | lr 4.03e-05 | ms/batch 422.65 | loss  3.84 | ppl    46.713
| epoch 224 step    97500 |    272 batches | lr 4.02e-05 | ms/batch 423.71 | loss  3.83 | ppl    46.026
| epoch 224 step    97550 |    322 batches | lr 4e-05 | ms/batch 423.83 | loss  3.81 | ppl    45.021
| epoch 224 step    97600 |    372 batches | lr 3.99e-05 | ms/batch 422.97 | loss  3.78 | ppl    43.709
----------------------------------------------------------------------------------------------------
| Eval 244 at step    97600 | time: 174.61s | valid loss  4.19 | valid ppl    66.215
----------------------------------------------------------------------------------------------------
| epoch 224 step    97650 |    422 batches | lr 3.97e-05 | ms/batch 576.06 | loss  3.79 | ppl    44.247
| epoch 225 step    97700 |     36 batches | lr 3.96e-05 | ms/batch 416.68 | loss  3.84 | ppl    46.379
| epoch 225 step    97750 |     86 batches | lr 3.95e-05 | ms/batch 422.91 | loss  3.77 | ppl    43.374
| epoch 225 step    97800 |    136 batches | lr 3.93e-05 | ms/batch 424.57 | loss  3.82 | ppl    45.495
| epoch 225 step    97850 |    186 batches | lr 3.92e-05 | ms/batch 424.16 | loss  3.82 | ppl    45.423
| epoch 225 step    97900 |    236 batches | lr 3.9e-05 | ms/batch 423.36 | loss  3.81 | ppl    45.164
| epoch 225 step    97950 |    286 batches | lr 3.89e-05 | ms/batch 420.83 | loss  3.87 | ppl    47.966
| epoch 225 step    98000 |    336 batches | lr 3.88e-05 | ms/batch 422.82 | loss  3.77 | ppl    43.197
----------------------------------------------------------------------------------------------------
| Eval 245 at step    98000 | time: 175.25s | valid loss  4.20 | valid ppl    66.444
----------------------------------------------------------------------------------------------------
| epoch 225 step    98050 |    386 batches | lr 3.86e-05 | ms/batch 548.37 | loss  3.82 | ppl    45.793
| epoch 225 step    98100 |    436 batches | lr 3.85e-05 | ms/batch 417.31 | loss  3.84 | ppl    46.590
| epoch 226 step    98150 |     50 batches | lr 3.84e-05 | ms/batch 421.14 | loss  3.77 | ppl    43.540
| epoch 226 step    98200 |    100 batches | lr 3.82e-05 | ms/batch 420.46 | loss  3.79 | ppl    44.365
| epoch 226 step    98250 |    150 batches | lr 3.81e-05 | ms/batch 421.62 | loss  3.82 | ppl    45.779
| epoch 226 step    98300 |    200 batches | lr 3.79e-05 | ms/batch 421.89 | loss  3.82 | ppl    45.694
| epoch 226 step    98350 |    250 batches | lr 3.78e-05 | ms/batch 419.67 | loss  3.84 | ppl    46.414
| epoch 226 step    98400 |    300 batches | lr 3.77e-05 | ms/batch 418.57 | loss  3.86 | ppl    47.679
----------------------------------------------------------------------------------------------------
| Eval 246 at step    98400 | time: 174.45s | valid loss  4.20 | valid ppl    66.445
----------------------------------------------------------------------------------------------------
| epoch 226 step    98450 |    350 batches | lr 3.75e-05 | ms/batch 544.39 | loss  3.73 | ppl    41.806
| epoch 226 step    98500 |    400 batches | lr 3.74e-05 | ms/batch 419.30 | loss  3.83 | ppl    46.208
| epoch 227 step    98550 |     14 batches | lr 3.73e-05 | ms/batch 411.25 | loss  3.85 | ppl    47.059
| epoch 227 step    98600 |     64 batches | lr 3.71e-05 | ms/batch 421.75 | loss  3.74 | ppl    42.304
| epoch 227 step    98650 |    114 batches | lr 3.7e-05 | ms/batch 420.12 | loss  3.78 | ppl    43.745
| epoch 227 step    98700 |    164 batches | lr 3.69e-05 | ms/batch 419.14 | loss  3.84 | ppl    46.297
| epoch 227 step    98750 |    214 batches | lr 3.67e-05 | ms/batch 419.64 | loss  3.83 | ppl    45.990
| epoch 227 step    98800 |    264 batches | lr 3.66e-05 | ms/batch 419.24 | loss  3.85 | ppl    46.960
----------------------------------------------------------------------------------------------------
| Eval 247 at step    98800 | time: 173.71s | valid loss  4.20 | valid ppl    66.856
----------------------------------------------------------------------------------------------------
| epoch 227 step    98850 |    314 batches | lr 3.65e-05 | ms/batch 546.91 | loss  3.80 | ppl    44.848
| epoch 227 step    98900 |    364 batches | lr 3.63e-05 | ms/batch 418.88 | loss  3.76 | ppl    42.806
| epoch 227 step    98950 |    414 batches | lr 3.62e-05 | ms/batch 420.16 | loss  3.78 | ppl    44.033
| epoch 228 step    99000 |     28 batches | lr 3.61e-05 | ms/batch 415.77 | loss  3.82 | ppl    45.791
| epoch 228 step    99050 |     78 batches | lr 3.59e-05 | ms/batch 422.06 | loss  3.78 | ppl    43.695
| epoch 228 step    99100 |    128 batches | lr 3.58e-05 | ms/batch 420.34 | loss  3.83 | ppl    46.074
| epoch 228 step    99150 |    178 batches | lr 3.57e-05 | ms/batch 421.51 | loss  3.80 | ppl    44.743
| epoch 228 step    99200 |    228 batches | lr 3.55e-05 | ms/batch 421.24 | loss  3.83 | ppl    46.134
----------------------------------------------------------------------------------------------------
| Eval 248 at step    99200 | time: 174.37s | valid loss  4.20 | valid ppl    66.612
----------------------------------------------------------------------------------------------------
| epoch 228 step    99250 |    278 batches | lr 3.54e-05 | ms/batch 545.48 | loss  3.87 | ppl    48.122
| epoch 228 step    99300 |    328 batches | lr 3.53e-05 | ms/batch 422.50 | loss  3.77 | ppl    43.279
| epoch 228 step    99350 |    378 batches | lr 3.51e-05 | ms/batch 419.90 | loss  3.80 | ppl    44.618
| epoch 228 step    99400 |    428 batches | lr 3.5e-05 | ms/batch 420.76 | loss  3.84 | ppl    46.679
| epoch 229 step    99450 |     42 batches | lr 3.49e-05 | ms/batch 412.28 | loss  3.79 | ppl    44.065
| epoch 229 step    99500 |     92 batches | lr 3.47e-05 | ms/batch 421.14 | loss  3.77 | ppl    43.417
| epoch 229 step    99550 |    142 batches | lr 3.46e-05 | ms/batch 421.60 | loss  3.81 | ppl    45.288
| epoch 229 step    99600 |    192 batches | lr 3.45e-05 | ms/batch 419.50 | loss  3.83 | ppl    46.017
----------------------------------------------------------------------------------------------------
| Eval 249 at step    99600 | time: 174.18s | valid loss  4.20 | valid ppl    66.572
----------------------------------------------------------------------------------------------------
| epoch 229 step    99650 |    242 batches | lr 3.43e-05 | ms/batch 548.22 | loss  3.83 | ppl    46.186
| epoch 229 step    99700 |    292 batches | lr 3.42e-05 | ms/batch 421.24 | loss  3.87 | ppl    47.775
| epoch 229 step    99750 |    342 batches | lr 3.41e-05 | ms/batch 420.08 | loss  3.73 | ppl    41.810
| epoch 229 step    99800 |    392 batches | lr 3.39e-05 | ms/batch 420.42 | loss  3.83 | ppl    45.857
| epoch 230 step    99850 |      6 batches | lr 3.38e-05 | ms/batch 410.85 | loss  3.85 | ppl    47.028
| epoch 230 step    99900 |     56 batches | lr 3.37e-05 | ms/batch 419.68 | loss  3.77 | ppl    43.333
| epoch 230 step    99950 |    106 batches | lr 3.36e-05 | ms/batch 420.94 | loss  3.78 | ppl    43.810
| epoch 230 step   100000 |    156 batches | lr 3.34e-05 | ms/batch 420.39 | loss  3.82 | ppl    45.557
----------------------------------------------------------------------------------------------------
| Eval 250 at step   100000 | time: 174.08s | valid loss  4.20 | valid ppl    66.560
----------------------------------------------------------------------------------------------------
| epoch 230 step   100050 |    206 batches | lr 3.33e-05 | ms/batch 544.80 | loss  3.81 | ppl    45.294
| epoch 230 step   100100 |    256 batches | lr 3.32e-05 | ms/batch 421.34 | loss  3.84 | ppl    46.430
| epoch 230 step   100150 |    306 batches | lr 3.3e-05 | ms/batch 421.47 | loss  3.85 | ppl    46.898
| epoch 230 step   100200 |    356 batches | lr 3.29e-05 | ms/batch 420.45 | loss  3.76 | ppl    42.808
| epoch 230 step   100250 |    406 batches | lr 3.28e-05 | ms/batch 422.72 | loss  3.81 | ppl    45.067
| epoch 231 step   100300 |     20 batches | lr 3.27e-05 | ms/batch 415.42 | loss  3.83 | ppl    46.234
| epoch 231 step   100350 |     70 batches | lr 3.25e-05 | ms/batch 420.81 | loss  3.75 | ppl    42.316
| epoch 231 step   100400 |    120 batches | lr 3.24e-05 | ms/batch 421.10 | loss  3.82 | ppl    45.817
----------------------------------------------------------------------------------------------------
| Eval 251 at step   100400 | time: 174.44s | valid loss  4.20 | valid ppl    66.643
----------------------------------------------------------------------------------------------------
| epoch 231 step   100450 |    170 batches | lr 3.23e-05 | ms/batch 546.90 | loss  3.81 | ppl    44.930
| epoch 231 step   100500 |    220 batches | lr 3.21e-05 | ms/batch 419.34 | loss  3.84 | ppl    46.530
| epoch 231 step   100550 |    270 batches | lr 3.2e-05 | ms/batch 419.61 | loss  3.83 | ppl    46.138
| epoch 231 step   100600 |    320 batches | lr 3.19e-05 | ms/batch 421.83 | loss  3.79 | ppl    44.315
| epoch 231 step   100650 |    370 batches | lr 3.18e-05 | ms/batch 421.10 | loss  3.79 | ppl    44.184
| epoch 231 step   100700 |    420 batches | lr 3.16e-05 | ms/batch 421.25 | loss  3.79 | ppl    44.378
| epoch 232 step   100750 |     34 batches | lr 3.15e-05 | ms/batch 411.18 | loss  3.84 | ppl    46.440
| epoch 232 step   100800 |     84 batches | lr 3.14e-05 | ms/batch 422.73 | loss  3.75 | ppl    42.335
----------------------------------------------------------------------------------------------------
| Eval 252 at step   100800 | time: 174.21s | valid loss  4.20 | valid ppl    66.380
----------------------------------------------------------------------------------------------------
| epoch 232 step   100850 |    134 batches | lr 3.13e-05 | ms/batch 545.16 | loss  3.80 | ppl    44.892
| epoch 232 step   100900 |    184 batches | lr 3.11e-05 | ms/batch 421.85 | loss  3.79 | ppl    44.183
| epoch 232 step   100950 |    234 batches | lr 3.1e-05 | ms/batch 420.14 | loss  3.83 | ppl    46.281
| epoch 232 step   101000 |    284 batches | lr 3.09e-05 | ms/batch 420.52 | loss  3.85 | ppl    46.895
| epoch 232 step   101050 |    334 batches | lr 3.08e-05 | ms/batch 420.33 | loss  3.75 | ppl    42.627
| epoch 232 step   101100 |    384 batches | lr 3.06e-05 | ms/batch 420.57 | loss  3.81 | ppl    45.003
| epoch 232 step   101150 |    434 batches | lr 3.05e-05 | ms/batch 420.76 | loss  3.85 | ppl    47.036
| epoch 233 step   101200 |     48 batches | lr 3.04e-05 | ms/batch 414.44 | loss  3.80 | ppl    44.484
----------------------------------------------------------------------------------------------------
| Eval 253 at step   101200 | time: 174.20s | valid loss  4.20 | valid ppl    66.809
----------------------------------------------------------------------------------------------------
| epoch 233 step   101250 |     98 batches | lr 3.03e-05 | ms/batch 547.06 | loss  3.74 | ppl    42.243
| epoch 233 step   101300 |    148 batches | lr 3.01e-05 | ms/batch 421.71 | loss  3.82 | ppl    45.427
| epoch 233 step   101350 |    198 batches | lr 3e-05 | ms/batch 421.84 | loss  3.82 | ppl    45.654
| epoch 233 step   101400 |    248 batches | lr 2.99e-05 | ms/batch 422.23 | loss  3.84 | ppl    46.342
| epoch 233 step   101450 |    298 batches | lr 2.98e-05 | ms/batch 423.04 | loss  3.87 | ppl    47.979
| epoch 233 step   101500 |    348 batches | lr 2.96e-05 | ms/batch 421.95 | loss  3.71 | ppl    40.984
| epoch 233 step   101550 |    398 batches | lr 2.95e-05 | ms/batch 421.77 | loss  3.78 | ppl    43.819
| epoch 234 step   101600 |     12 batches | lr 2.94e-05 | ms/batch 411.27 | loss  3.84 | ppl    46.658
----------------------------------------------------------------------------------------------------
| Eval 254 at step   101600 | time: 174.53s | valid loss  4.20 | valid ppl    66.393
----------------------------------------------------------------------------------------------------
| epoch 234 step   101650 |     62 batches | lr 2.93e-05 | ms/batch 544.15 | loss  3.75 | ppl    42.601
| epoch 234 step   101700 |    112 batches | lr 2.92e-05 | ms/batch 421.88 | loss  3.80 | ppl    44.826
| epoch 234 step   101750 |    162 batches | lr 2.9e-05 | ms/batch 419.66 | loss  3.81 | ppl    45.279
| epoch 234 step   101800 |    212 batches | lr 2.89e-05 | ms/batch 419.10 | loss  3.81 | ppl    45.253
| epoch 234 step   101850 |    262 batches | lr 2.88e-05 | ms/batch 420.24 | loss  3.84 | ppl    46.349
| epoch 234 step   101900 |    312 batches | lr 2.87e-05 | ms/batch 418.98 | loss  3.82 | ppl    45.527
| epoch 234 step   101950 |    362 batches | lr 2.86e-05 | ms/batch 421.54 | loss  3.76 | ppl    42.932
| epoch 234 step   102000 |    412 batches | lr 2.84e-05 | ms/batch 420.13 | loss  3.80 | ppl    44.857
----------------------------------------------------------------------------------------------------
| Eval 255 at step   102000 | time: 174.28s | valid loss  4.19 | valid ppl    65.980
----------------------------------------------------------------------------------------------------
| epoch 235 step   102050 |     26 batches | lr 2.83e-05 | ms/batch 568.74 | loss  3.84 | ppl    46.588
| epoch 235 step   102100 |     76 batches | lr 2.82e-05 | ms/batch 420.75 | loss  3.74 | ppl    42.148
| epoch 235 step   102150 |    126 batches | lr 2.81e-05 | ms/batch 423.10 | loss  3.79 | ppl    44.351
| epoch 235 step   102200 |    176 batches | lr 2.8e-05 | ms/batch 421.77 | loss  3.80 | ppl    44.919
| epoch 235 step   102250 |    226 batches | lr 2.78e-05 | ms/batch 423.75 | loss  3.83 | ppl    46.132
| epoch 235 step   102300 |    276 batches | lr 2.77e-05 | ms/batch 423.12 | loss  3.82 | ppl    45.710
| epoch 235 step   102350 |    326 batches | lr 2.76e-05 | ms/batch 422.34 | loss  3.78 | ppl    43.951
| epoch 235 step   102400 |    376 batches | lr 2.75e-05 | ms/batch 421.83 | loss  3.78 | ppl    43.874
----------------------------------------------------------------------------------------------------
| Eval 256 at step   102400 | time: 174.89s | valid loss  4.19 | valid ppl    65.979
----------------------------------------------------------------------------------------------------
| epoch 235 step   102450 |    426 batches | lr 2.74e-05 | ms/batch 575.08 | loss  3.82 | ppl    45.391
| epoch 236 step   102500 |     40 batches | lr 2.72e-05 | ms/batch 411.15 | loss  3.77 | ppl    43.538
| epoch 236 step   102550 |     90 batches | lr 2.71e-05 | ms/batch 421.47 | loss  3.75 | ppl    42.369
| epoch 236 step   102600 |    140 batches | lr 2.7e-05 | ms/batch 421.55 | loss  3.81 | ppl    44.950
| epoch 236 step   102650 |    190 batches | lr 2.69e-05 | ms/batch 420.91 | loss  3.82 | ppl    45.468
| epoch 236 step   102700 |    240 batches | lr 2.68e-05 | ms/batch 421.47 | loss  3.83 | ppl    45.990
| epoch 236 step   102750 |    290 batches | lr 2.67e-05 | ms/batch 422.43 | loss  3.87 | ppl    47.815
| epoch 236 step   102800 |    340 batches | lr 2.65e-05 | ms/batch 420.78 | loss  3.72 | ppl    41.309
----------------------------------------------------------------------------------------------------
| Eval 257 at step   102800 | time: 174.32s | valid loss  4.20 | valid ppl    66.362
----------------------------------------------------------------------------------------------------
| epoch 236 step   102850 |    390 batches | lr 2.64e-05 | ms/batch 547.60 | loss  3.80 | ppl    44.845
| epoch 237 step   102900 |      4 batches | lr 2.63e-05 | ms/batch 412.98 | loss  3.84 | ppl    46.391
| epoch 237 step   102950 |     54 batches | lr 2.62e-05 | ms/batch 420.85 | loss  3.78 | ppl    43.946
| epoch 237 step   103000 |    104 batches | lr 2.61e-05 | ms/batch 421.59 | loss  3.74 | ppl    42.067
| epoch 237 step   103050 |    154 batches | lr 2.6e-05 | ms/batch 421.28 | loss  3.82 | ppl    45.386
| epoch 237 step   103100 |    204 batches | lr 2.58e-05 | ms/batch 420.49 | loss  3.82 | ppl    45.530
| epoch 237 step   103150 |    254 batches | lr 2.57e-05 | ms/batch 422.15 | loss  3.83 | ppl    45.871
| epoch 237 step   103200 |    304 batches | lr 2.56e-05 | ms/batch 422.18 | loss  3.83 | ppl    46.283
----------------------------------------------------------------------------------------------------
| Eval 258 at step   103200 | time: 174.47s | valid loss  4.19 | valid ppl    66.272
----------------------------------------------------------------------------------------------------
| epoch 237 step   103250 |    354 batches | lr 2.55e-05 | ms/batch 545.56 | loss  3.74 | ppl    42.053
| epoch 237 step   103300 |    404 batches | lr 2.54e-05 | ms/batch 422.22 | loss  3.80 | ppl    44.682
| epoch 238 step   103350 |     18 batches | lr 2.53e-05 | ms/batch 412.29 | loss  3.83 | ppl    46.197
| epoch 238 step   103400 |     68 batches | lr 2.52e-05 | ms/batch 419.59 | loss  3.76 | ppl    42.819
| epoch 238 step   103450 |    118 batches | lr 2.5e-05 | ms/batch 420.13 | loss  3.78 | ppl    43.945
| epoch 238 step   103500 |    168 batches | lr 2.49e-05 | ms/batch 420.29 | loss  3.79 | ppl    44.373
| epoch 238 step   103550 |    218 batches | lr 2.48e-05 | ms/batch 421.52 | loss  3.82 | ppl    45.427
| epoch 238 step   103600 |    268 batches | lr 2.47e-05 | ms/batch 421.06 | loss  3.83 | ppl    45.983
----------------------------------------------------------------------------------------------------
| Eval 259 at step   103600 | time: 174.15s | valid loss  4.20 | valid ppl    66.605
----------------------------------------------------------------------------------------------------
| epoch 238 step   103650 |    318 batches | lr 2.46e-05 | ms/batch 547.59 | loss  3.81 | ppl    45.356
| epoch 238 step   103700 |    368 batches | lr 2.45e-05 | ms/batch 422.75 | loss  3.76 | ppl    43.020
| epoch 238 step   103750 |    418 batches | lr 2.44e-05 | ms/batch 421.18 | loss  3.79 | ppl    44.364
| epoch 239 step   103800 |     32 batches | lr 2.43e-05 | ms/batch 410.95 | loss  3.83 | ppl    46.153
| epoch 239 step   103850 |     82 batches | lr 2.41e-05 | ms/batch 421.84 | loss  3.72 | ppl    41.267
| epoch 239 step   103900 |    132 batches | lr 2.4e-05 | ms/batch 419.66 | loss  3.79 | ppl    44.339
| epoch 239 step   103950 |    182 batches | lr 2.39e-05 | ms/batch 421.04 | loss  3.78 | ppl    43.887
| epoch 239 step   104000 |    232 batches | lr 2.38e-05 | ms/batch 420.55 | loss  3.83 | ppl    45.884
----------------------------------------------------------------------------------------------------
| Eval 260 at step   104000 | time: 174.25s | valid loss  4.19 | valid ppl    66.156
----------------------------------------------------------------------------------------------------
| epoch 239 step   104050 |    282 batches | lr 2.37e-05 | ms/batch 544.73 | loss  3.84 | ppl    46.338
| epoch 239 step   104100 |    332 batches | lr 2.36e-05 | ms/batch 421.21 | loss  3.74 | ppl    42.062
| epoch 239 step   104150 |    382 batches | lr 2.35e-05 | ms/batch 420.84 | loss  3.79 | ppl    44.360
| epoch 239 step   104200 |    432 batches | lr 2.34e-05 | ms/batch 421.44 | loss  3.82 | ppl    45.711
| epoch 240 step   104250 |     46 batches | lr 2.33e-05 | ms/batch 414.44 | loss  3.78 | ppl    43.948
| epoch 240 step   104300 |     96 batches | lr 2.32e-05 | ms/batch 421.09 | loss  3.75 | ppl    42.685
| epoch 240 step   104350 |    146 batches | lr 2.3e-05 | ms/batch 419.75 | loss  3.80 | ppl    44.481
| epoch 240 step   104400 |    196 batches | lr 2.29e-05 | ms/batch 421.65 | loss  3.84 | ppl    46.445
----------------------------------------------------------------------------------------------------
| Eval 261 at step   104400 | time: 174.25s | valid loss  4.20 | valid ppl    66.379
----------------------------------------------------------------------------------------------------
| epoch 240 step   104450 |    246 batches | lr 2.28e-05 | ms/batch 546.10 | loss  3.81 | ppl    45.187
| epoch 240 step   104500 |    296 batches | lr 2.27e-05 | ms/batch 420.49 | loss  3.86 | ppl    47.294
| epoch 240 step   104550 |    346 batches | lr 2.26e-05 | ms/batch 423.23 | loss  3.71 | ppl    41.050
| epoch 240 step   104600 |    396 batches | lr 2.25e-05 | ms/batch 419.32 | loss  3.79 | ppl    44.360
| epoch 241 step   104650 |     10 batches | lr 2.24e-05 | ms/batch 411.86 | loss  3.84 | ppl    46.580
| epoch 241 step   104700 |     60 batches | lr 2.23e-05 | ms/batch 420.09 | loss  3.77 | ppl    43.353
| epoch 241 step   104750 |    110 batches | lr 2.22e-05 | ms/batch 421.91 | loss  3.78 | ppl    43.897
| epoch 241 step   104800 |    160 batches | lr 2.21e-05 | ms/batch 420.32 | loss  3.79 | ppl    44.340
----------------------------------------------------------------------------------------------------
| Eval 262 at step   104800 | time: 174.19s | valid loss  4.19 | valid ppl    66.345
----------------------------------------------------------------------------------------------------
| epoch 241 step   104850 |    210 batches | lr 2.2e-05 | ms/batch 548.63 | loss  3.79 | ppl    44.326
| epoch 241 step   104900 |    260 batches | lr 2.19e-05 | ms/batch 420.21 | loss  3.82 | ppl    45.458
| epoch 241 step   104950 |    310 batches | lr 2.18e-05 | ms/batch 420.57 | loss  3.81 | ppl    45.069
| epoch 241 step   105000 |    360 batches | lr 2.16e-05 | ms/batch 420.20 | loss  3.77 | ppl    43.428
| epoch 241 step   105050 |    410 batches | lr 2.15e-05 | ms/batch 421.74 | loss  3.79 | ppl    44.153
| epoch 242 step   105100 |     24 batches | lr 2.14e-05 | ms/batch 413.24 | loss  3.84 | ppl    46.427
| epoch 242 step   105150 |     74 batches | lr 2.13e-05 | ms/batch 423.30 | loss  3.75 | ppl    42.416
| epoch 242 step   105200 |    124 batches | lr 2.12e-05 | ms/batch 422.00 | loss  3.81 | ppl    45.310
----------------------------------------------------------------------------------------------------
| Eval 263 at step   105200 | time: 174.52s | valid loss  4.20 | valid ppl    66.452
----------------------------------------------------------------------------------------------------
| epoch 242 step   105250 |    174 batches | lr 2.11e-05 | ms/batch 547.44 | loss  3.81 | ppl    45.169
| epoch 242 step   105300 |    224 batches | lr 2.1e-05 | ms/batch 421.55 | loss  3.82 | ppl    45.448
| epoch 242 step   105350 |    274 batches | lr 2.09e-05 | ms/batch 425.16 | loss  3.84 | ppl    46.364
| epoch 242 step   105400 |    324 batches | lr 2.08e-05 | ms/batch 420.59 | loss  3.78 | ppl    43.802
| epoch 242 step   105450 |    374 batches | lr 2.07e-05 | ms/batch 420.09 | loss  3.78 | ppl    43.821
| epoch 242 step   105500 |    424 batches | lr 2.06e-05 | ms/batch 418.92 | loss  3.79 | ppl    44.425
| epoch 243 step   105550 |     38 batches | lr 2.05e-05 | ms/batch 410.23 | loss  3.82 | ppl    45.491
| epoch 243 step   105600 |     88 batches | lr 2.04e-05 | ms/batch 420.06 | loss  3.74 | ppl    42.286
----------------------------------------------------------------------------------------------------
| Eval 264 at step   105600 | time: 174.17s | valid loss  4.19 | valid ppl    66.145
----------------------------------------------------------------------------------------------------
| epoch 243 step   105650 |    138 batches | lr 2.03e-05 | ms/batch 547.03 | loss  3.80 | ppl    44.904
| epoch 243 step   105700 |    188 batches | lr 2.02e-05 | ms/batch 422.53 | loss  3.80 | ppl    44.785
| epoch 243 step   105750 |    238 batches | lr 2.01e-05 | ms/batch 421.37 | loss  3.81 | ppl    45.184
| epoch 243 step   105800 |    288 batches | lr 2e-05 | ms/batch 421.04 | loss  3.84 | ppl    46.431
| epoch 243 step   105850 |    338 batches | lr 1.99e-05 | ms/batch 421.33 | loss  3.71 | ppl    40.660
| epoch 243 step   105900 |    388 batches | lr 1.98e-05 | ms/batch 420.40 | loss  3.78 | ppl    43.811
| epoch 244 step   105950 |      2 batches | lr 1.97e-05 | ms/batch 414.42 | loss  3.82 | ppl    45.804
| epoch 244 step   106000 |     52 batches | lr 1.96e-05 | ms/batch 419.94 | loss  3.75 | ppl    42.471
----------------------------------------------------------------------------------------------------
| Eval 265 at step   106000 | time: 174.40s | valid loss  4.19 | valid ppl    66.331
----------------------------------------------------------------------------------------------------
| epoch 244 step   106050 |    102 batches | lr 1.95e-05 | ms/batch 544.36 | loss  3.75 | ppl    42.710
| epoch 244 step   106100 |    152 batches | lr 1.94e-05 | ms/batch 421.93 | loss  3.82 | ppl    45.436
| epoch 244 step   106150 |    202 batches | lr 1.93e-05 | ms/batch 422.94 | loss  3.82 | ppl    45.450
| epoch 244 step   106200 |    252 batches | lr 1.92e-05 | ms/batch 421.19 | loss  3.83 | ppl    45.921
| epoch 244 step   106250 |    302 batches | lr 1.91e-05 | ms/batch 423.40 | loss  3.84 | ppl    46.302
| epoch 244 step   106300 |    352 batches | lr 1.9e-05 | ms/batch 420.99 | loss  3.73 | ppl    41.478
| epoch 244 step   106350 |    402 batches | lr 1.89e-05 | ms/batch 421.33 | loss  3.81 | ppl    45.171
| epoch 245 step   106400 |     16 batches | lr 1.88e-05 | ms/batch 413.73 | loss  3.84 | ppl    46.359
----------------------------------------------------------------------------------------------------
| Eval 266 at step   106400 | time: 174.50s | valid loss  4.19 | valid ppl    66.068
----------------------------------------------------------------------------------------------------
| epoch 245 step   106450 |     66 batches | lr 1.87e-05 | ms/batch 546.24 | loss  3.74 | ppl    42.104
| epoch 245 step   106500 |    116 batches | lr 1.86e-05 | ms/batch 422.05 | loss  3.77 | ppl    43.470
| epoch 245 step   106550 |    166 batches | lr 1.85e-05 | ms/batch 422.70 | loss  3.80 | ppl    44.781
| epoch 245 step   106600 |    216 batches | lr 1.84e-05 | ms/batch 420.11 | loss  3.80 | ppl    44.837
| epoch 245 step   106650 |    266 batches | lr 1.83e-05 | ms/batch 420.24 | loss  3.83 | ppl    45.936
| epoch 245 step   106700 |    316 batches | lr 1.82e-05 | ms/batch 420.86 | loss  3.80 | ppl    44.666
| epoch 245 step   106750 |    366 batches | lr 1.81e-05 | ms/batch 421.59 | loss  3.75 | ppl    42.530
| epoch 245 step   106800 |    416 batches | lr 1.8e-05 | ms/batch 420.99 | loss  3.80 | ppl    44.585
----------------------------------------------------------------------------------------------------
| Eval 267 at step   106800 | time: 174.74s | valid loss  4.19 | valid ppl    66.097
----------------------------------------------------------------------------------------------------
| epoch 246 step   106850 |     30 batches | lr 1.79e-05 | ms/batch 535.24 | loss  3.82 | ppl    45.482
| epoch 246 step   106900 |     80 batches | lr 1.78e-05 | ms/batch 420.04 | loss  3.74 | ppl    42.086
| epoch 246 step   106950 |    130 batches | lr 1.77e-05 | ms/batch 419.61 | loss  3.79 | ppl    44.321
| epoch 246 step   107000 |    180 batches | lr 1.76e-05 | ms/batch 420.90 | loss  3.78 | ppl    43.716
| epoch 246 step   107050 |    230 batches | lr 1.75e-05 | ms/batch 422.26 | loss  3.82 | ppl    45.652
| epoch 246 step   107100 |    280 batches | lr 1.74e-05 | ms/batch 421.14 | loss  3.82 | ppl    45.724
| epoch 246 step   107150 |    330 batches | lr 1.73e-05 | ms/batch 419.92 | loss  3.74 | ppl    42.152
| epoch 246 step   107200 |    380 batches | lr 1.72e-05 | ms/batch 422.65 | loss  3.78 | ppl    43.667
----------------------------------------------------------------------------------------------------
| Eval 268 at step   107200 | time: 174.10s | valid loss  4.19 | valid ppl    65.941
----------------------------------------------------------------------------------------------------
| epoch 246 step   107250 |    430 batches | lr 1.71e-05 | ms/batch 575.45 | loss  3.81 | ppl    45.177
| epoch 247 step   107300 |     44 batches | lr 1.7e-05 | ms/batch 413.13 | loss  3.77 | ppl    43.411
| epoch 247 step   107350 |     94 batches | lr 1.69e-05 | ms/batch 420.63 | loss  3.74 | ppl    42.165
| epoch 247 step   107400 |    144 batches | lr 1.68e-05 | ms/batch 420.25 | loss  3.80 | ppl    44.628
| epoch 247 step   107450 |    194 batches | lr 1.67e-05 | ms/batch 419.65 | loss  3.79 | ppl    44.401
| epoch 247 step   107500 |    244 batches | lr 1.67e-05 | ms/batch 420.18 | loss  3.83 | ppl    45.999
| epoch 247 step   107550 |    294 batches | lr 1.66e-05 | ms/batch 422.42 | loss  3.84 | ppl    46.360
| epoch 247 step   107600 |    344 batches | lr 1.65e-05 | ms/batch 421.81 | loss  3.71 | ppl    40.955
----------------------------------------------------------------------------------------------------
| Eval 269 at step   107600 | time: 174.27s | valid loss  4.19 | valid ppl    66.166
----------------------------------------------------------------------------------------------------
| epoch 247 step   107650 |    394 batches | lr 1.64e-05 | ms/batch 546.50 | loss  3.80 | ppl    44.705
| epoch 248 step   107700 |      8 batches | lr 1.63e-05 | ms/batch 414.93 | loss  3.84 | ppl    46.310
| epoch 248 step   107750 |     58 batches | lr 1.62e-05 | ms/batch 421.91 | loss  3.75 | ppl    42.522
| epoch 248 step   107800 |    108 batches | lr 1.61e-05 | ms/batch 422.63 | loss  3.77 | ppl    43.426
| epoch 248 step   107850 |    158 batches | lr 1.6e-05 | ms/batch 422.68 | loss  3.78 | ppl    44.021
| epoch 248 step   107900 |    208 batches | lr 1.59e-05 | ms/batch 421.49 | loss  3.76 | ppl    43.150
| epoch 248 step   107950 |    258 batches | lr 1.58e-05 | ms/batch 422.98 | loss  3.82 | ppl    45.824
| epoch 248 step   108000 |    308 batches | lr 1.57e-05 | ms/batch 421.94 | loss  3.82 | ppl    45.824
----------------------------------------------------------------------------------------------------
| Eval 270 at step   108000 | time: 174.74s | valid loss  4.19 | valid ppl    66.155
----------------------------------------------------------------------------------------------------
| epoch 248 step   108050 |    358 batches | lr 1.56e-05 | ms/batch 547.83 | loss  3.77 | ppl    43.411
| epoch 248 step   108100 |    408 batches | lr 1.55e-05 | ms/batch 421.68 | loss  3.76 | ppl    43.042
| epoch 249 step   108150 |     22 batches | lr 1.55e-05 | ms/batch 414.96 | loss  3.84 | ppl    46.407
| epoch 249 step   108200 |     72 batches | lr 1.54e-05 | ms/batch 422.17 | loss  3.74 | ppl    42.209
| epoch 249 step   108250 |    122 batches | lr 1.53e-05 | ms/batch 421.94 | loss  3.78 | ppl    43.940
| epoch 249 step   108300 |    172 batches | lr 1.52e-05 | ms/batch 420.80 | loss  3.78 | ppl    43.834
| epoch 249 step   108350 |    222 batches | lr 1.51e-05 | ms/batch 421.33 | loss  3.82 | ppl    45.464
| epoch 249 step   108400 |    272 batches | lr 1.5e-05 | ms/batch 420.22 | loss  3.83 | ppl    46.232
----------------------------------------------------------------------------------------------------
| Eval 271 at step   108400 | time: 174.54s | valid loss  4.20 | valid ppl    66.354
----------------------------------------------------------------------------------------------------
| epoch 249 step   108450 |    322 batches | lr 1.49e-05 | ms/batch 545.88 | loss  3.79 | ppl    44.120
| epoch 249 step   108500 |    372 batches | lr 1.48e-05 | ms/batch 419.97 | loss  3.76 | ppl    42.889
| epoch 249 step   108550 |    422 batches | lr 1.47e-05 | ms/batch 422.17 | loss  3.79 | ppl    44.269
| epoch 250 step   108600 |     36 batches | lr 1.47e-05 | ms/batch 411.48 | loss  3.81 | ppl    45.010
| epoch 250 step   108650 |     86 batches | lr 1.46e-05 | ms/batch 420.42 | loss  3.73 | ppl    41.790
| epoch 250 step   108700 |    136 batches | lr 1.45e-05 | ms/batch 420.36 | loss  3.81 | ppl    45.174
| epoch 250 step   108750 |    186 batches | lr 1.44e-05 | ms/batch 420.30 | loss  3.80 | ppl    44.713
| epoch 250 step   108800 |    236 batches | lr 1.43e-05 | ms/batch 419.16 | loss  3.81 | ppl    45.021
----------------------------------------------------------------------------------------------------
| Eval 272 at step   108800 | time: 173.96s | valid loss  4.19 | valid ppl    66.235
----------------------------------------------------------------------------------------------------
| epoch 250 step   108850 |    286 batches | lr 1.42e-05 | ms/batch 544.72 | loss  3.85 | ppl    47.174
| epoch 250 step   108900 |    336 batches | lr 1.41e-05 | ms/batch 419.59 | loss  3.72 | ppl    41.319
| epoch 250 step   108950 |    386 batches | lr 1.4e-05 | ms/batch 420.69 | loss  3.79 | ppl    44.414
| epoch 250 step   109000 |    436 batches | lr 1.4e-05 | ms/batch 413.86 | loss  3.81 | ppl    45.315
| epoch 251 step   109050 |     50 batches | lr 1.39e-05 | ms/batch 417.31 | loss  3.77 | ppl    43.250
| epoch 251 step   109100 |    100 batches | lr 1.38e-05 | ms/batch 419.19 | loss  3.76 | ppl    42.916
| epoch 251 step   109150 |    150 batches | lr 1.37e-05 | ms/batch 419.57 | loss  3.79 | ppl    44.152
| epoch 251 step   109200 |    200 batches | lr 1.36e-05 | ms/batch 420.14 | loss  3.80 | ppl    44.622
----------------------------------------------------------------------------------------------------
| Eval 273 at step   109200 | time: 173.77s | valid loss  4.19 | valid ppl    66.180
----------------------------------------------------------------------------------------------------
| epoch 251 step   109250 |    250 batches | lr 1.35e-05 | ms/batch 545.21 | loss  3.83 | ppl    45.967
| epoch 251 step   109300 |    300 batches | lr 1.34e-05 | ms/batch 422.23 | loss  3.84 | ppl    46.340
| epoch 251 step   109350 |    350 batches | lr 1.34e-05 | ms/batch 419.34 | loss  3.71 | ppl    41.005
| epoch 251 step   109400 |    400 batches | lr 1.33e-05 | ms/batch 419.72 | loss  3.80 | ppl    44.557
| epoch 252 step   109450 |     14 batches | lr 1.32e-05 | ms/batch 412.85 | loss  3.83 | ppl    45.863
| epoch 252 step   109500 |     64 batches | lr 1.31e-05 | ms/batch 420.41 | loss  3.73 | ppl    41.525
| epoch 252 step   109550 |    114 batches | lr 1.3e-05 | ms/batch 420.95 | loss  3.77 | ppl    43.245
| epoch 252 step   109600 |    164 batches | lr 1.29e-05 | ms/batch 420.11 | loss  3.81 | ppl    45.152
----------------------------------------------------------------------------------------------------
| Eval 274 at step   109600 | time: 174.04s | valid loss  4.19 | valid ppl    66.060
----------------------------------------------------------------------------------------------------
| epoch 252 step   109650 |    214 batches | lr 1.29e-05 | ms/batch 544.09 | loss  3.80 | ppl    44.780
| epoch 252 step   109700 |    264 batches | lr 1.28e-05 | ms/batch 419.72 | loss  3.80 | ppl    44.674
| epoch 252 step   109750 |    314 batches | lr 1.27e-05 | ms/batch 419.20 | loss  3.80 | ppl    44.770
| epoch 252 step   109800 |    364 batches | lr 1.26e-05 | ms/batch 419.59 | loss  3.74 | ppl    42.160
| epoch 252 step   109850 |    414 batches | lr 1.25e-05 | ms/batch 419.04 | loss  3.80 | ppl    44.693
| epoch 253 step   109900 |     28 batches | lr 1.25e-05 | ms/batch 411.89 | loss  3.83 | ppl    45.854
| epoch 253 step   109950 |     78 batches | lr 1.24e-05 | ms/batch 420.46 | loss  3.75 | ppl    42.565
| epoch 253 step   110000 |    128 batches | lr 1.23e-05 | ms/batch 421.29 | loss  3.81 | ppl    45.026
----------------------------------------------------------------------------------------------------
| Eval 275 at step   110000 | time: 173.79s | valid loss  4.19 | valid ppl    66.241
----------------------------------------------------------------------------------------------------
| epoch 253 step   110050 |    178 batches | lr 1.22e-05 | ms/batch 544.24 | loss  3.79 | ppl    44.316
| epoch 253 step   110100 |    228 batches | lr 1.21e-05 | ms/batch 420.26 | loss  3.80 | ppl    44.623
| epoch 253 step   110150 |    278 batches | lr 1.2e-05 | ms/batch 419.78 | loss  3.87 | ppl    47.727
| epoch 253 step   110200 |    328 batches | lr 1.2e-05 | ms/batch 420.32 | loss  3.75 | ppl    42.641
| epoch 253 step   110250 |    378 batches | lr 1.19e-05 | ms/batch 422.03 | loss  3.77 | ppl    43.483
| epoch 253 step   110300 |    428 batches | lr 1.18e-05 | ms/batch 421.10 | loss  3.82 | ppl    45.397
| epoch 254 step   110350 |     42 batches | lr 1.17e-05 | ms/batch 410.88 | loss  3.74 | ppl    42.263
| epoch 254 step   110400 |     92 batches | lr 1.16e-05 | ms/batch 418.97 | loss  3.74 | ppl    42.290
----------------------------------------------------------------------------------------------------
| Eval 276 at step   110400 | time: 173.86s | valid loss  4.19 | valid ppl    66.052
----------------------------------------------------------------------------------------------------
| epoch 254 step   110450 |    142 batches | lr 1.16e-05 | ms/batch 545.11 | loss  3.82 | ppl    45.432
| epoch 254 step   110500 |    192 batches | lr 1.15e-05 | ms/batch 420.09 | loss  3.78 | ppl    43.967
| epoch 254 step   110550 |    242 batches | lr 1.14e-05 | ms/batch 419.78 | loss  3.83 | ppl    46.018
| epoch 254 step   110600 |    292 batches | lr 1.13e-05 | ms/batch 423.19 | loss  3.83 | ppl    46.235
| epoch 254 step   110650 |    342 batches | lr 1.13e-05 | ms/batch 421.30 | loss  3.71 | ppl    40.737
| epoch 254 step   110700 |    392 batches | lr 1.12e-05 | ms/batch 420.11 | loss  3.79 | ppl    44.410
| epoch 255 step   110750 |      6 batches | lr 1.11e-05 | ms/batch 411.79 | loss  3.81 | ppl    45.017
| epoch 255 step   110800 |     56 batches | lr 1.1e-05 | ms/batch 422.05 | loss  3.76 | ppl    43.109
----------------------------------------------------------------------------------------------------
| Eval 277 at step   110800 | time: 174.18s | valid loss  4.19 | valid ppl    66.153
----------------------------------------------------------------------------------------------------
| epoch 255 step   110850 |    106 batches | lr 1.1e-05 | ms/batch 545.90 | loss  3.75 | ppl    42.364
| epoch 255 step   110900 |    156 batches | lr 1.09e-05 | ms/batch 422.37 | loss  3.79 | ppl    44.380
| epoch 255 step   110950 |    206 batches | lr 1.08e-05 | ms/batch 423.10 | loss  3.80 | ppl    44.572
| epoch 255 step   111000 |    256 batches | lr 1.07e-05 | ms/batch 422.35 | loss  3.84 | ppl    46.634
| epoch 255 step   111050 |    306 batches | lr 1.06e-05 | ms/batch 423.36 | loss  3.82 | ppl    45.421
| epoch 255 step   111100 |    356 batches | lr 1.06e-05 | ms/batch 422.77 | loss  3.73 | ppl    41.704
| epoch 255 step   111150 |    406 batches | lr 1.05e-05 | ms/batch 422.91 | loss  3.78 | ppl    43.732
| epoch 256 step   111200 |     20 batches | lr 1.04e-05 | ms/batch 412.76 | loss  3.82 | ppl    45.508
----------------------------------------------------------------------------------------------------
| Eval 278 at step   111200 | time: 174.79s | valid loss  4.19 | valid ppl    65.947
----------------------------------------------------------------------------------------------------
| epoch 256 step   111250 |     70 batches | lr 1.03e-05 | ms/batch 546.15 | loss  3.73 | ppl    41.689
| epoch 256 step   111300 |    120 batches | lr 1.03e-05 | ms/batch 420.07 | loss  3.77 | ppl    43.195
| epoch 256 step   111350 |    170 batches | lr 1.02e-05 | ms/batch 420.48 | loss  3.78 | ppl    44.012
| epoch 256 step   111400 |    220 batches | lr 1.01e-05 | ms/batch 419.92 | loss  3.80 | ppl    44.848
| epoch 256 step   111450 |    270 batches | lr 1e-05 | ms/batch 421.57 | loss  3.80 | ppl    44.683
| epoch 256 step   111500 |    320 batches | lr 9.98e-06 | ms/batch 421.59 | loss  3.79 | ppl    44.407
| epoch 256 step   111550 |    370 batches | lr 9.9e-06 | ms/batch 423.89 | loss  3.77 | ppl    43.392
| epoch 256 step   111600 |    420 batches | lr 9.83e-06 | ms/batch 422.99 | loss  3.79 | ppl    44.389
----------------------------------------------------------------------------------------------------
| Eval 279 at step   111600 | time: 174.85s | valid loss  4.19 | valid ppl    65.939
----------------------------------------------------------------------------------------------------
| epoch 257 step   111650 |     34 batches | lr 9.76e-06 | ms/batch 569.98 | loss  3.81 | ppl    45.150
| epoch 257 step   111700 |     84 batches | lr 9.69e-06 | ms/batch 420.99 | loss  3.75 | ppl    42.590
| epoch 257 step   111750 |    134 batches | lr 9.61e-06 | ms/batch 421.09 | loss  3.81 | ppl    45.073
| epoch 257 step   111800 |    184 batches | lr 9.54e-06 | ms/batch 419.83 | loss  3.77 | ppl    43.290
| epoch 257 step   111850 |    234 batches | lr 9.47e-06 | ms/batch 421.12 | loss  3.81 | ppl    45.237
| epoch 257 step   111900 |    284 batches | lr 9.4e-06 | ms/batch 422.30 | loss  3.82 | ppl    45.797
| epoch 257 step   111950 |    334 batches | lr 9.33e-06 | ms/batch 421.93 | loss  3.73 | ppl    41.803
| epoch 257 step   112000 |    384 batches | lr 9.26e-06 | ms/batch 420.08 | loss  3.78 | ppl    43.807
----------------------------------------------------------------------------------------------------
| Eval 280 at step   112000 | time: 174.39s | valid loss  4.19 | valid ppl    65.952
----------------------------------------------------------------------------------------------------
| epoch 257 step   112050 |    434 batches | lr 9.19e-06 | ms/batch 547.18 | loss  3.83 | ppl    46.000
| epoch 258 step   112100 |     48 batches | lr 9.12e-06 | ms/batch 411.05 | loss  3.77 | ppl    43.441
| epoch 258 step   112150 |     98 batches | lr 9.05e-06 | ms/batch 420.25 | loss  3.76 | ppl    42.976
| epoch 258 step   112200 |    148 batches | lr 8.98e-06 | ms/batch 420.43 | loss  3.78 | ppl    43.958
| epoch 258 step   112250 |    198 batches | lr 8.91e-06 | ms/batch 421.69 | loss  3.80 | ppl    44.727
| epoch 258 step   112300 |    248 batches | lr 8.84e-06 | ms/batch 421.36 | loss  3.82 | ppl    45.535
| epoch 258 step   112350 |    298 batches | lr 8.77e-06 | ms/batch 420.18 | loss  3.82 | ppl    45.745
| epoch 258 step   112400 |    348 batches | lr 8.7e-06 | ms/batch 418.40 | loss  3.70 | ppl    40.551
----------------------------------------------------------------------------------------------------
| Eval 281 at step   112400 | time: 173.98s | valid loss  4.19 | valid ppl    66.064
----------------------------------------------------------------------------------------------------
| epoch 258 step   112450 |    398 batches | lr 8.63e-06 | ms/batch 547.05 | loss  3.80 | ppl    44.651
| epoch 259 step   112500 |     12 batches | lr 8.57e-06 | ms/batch 414.86 | loss  3.80 | ppl    44.695
| epoch 259 step   112550 |     62 batches | lr 8.5e-06 | ms/batch 420.73 | loss  3.72 | ppl    41.285
| epoch 259 step   112600 |    112 batches | lr 8.43e-06 | ms/batch 422.43 | loss  3.80 | ppl    44.495
| epoch 259 step   112650 |    162 batches | lr 8.36e-06 | ms/batch 421.31 | loss  3.80 | ppl    44.738
| epoch 259 step   112700 |    212 batches | lr 8.3e-06 | ms/batch 419.61 | loss  3.79 | ppl    44.090
| epoch 259 step   112750 |    262 batches | lr 8.23e-06 | ms/batch 419.71 | loss  3.83 | ppl    45.992
| epoch 259 step   112800 |    312 batches | lr 8.16e-06 | ms/batch 421.03 | loss  3.80 | ppl    44.717
----------------------------------------------------------------------------------------------------
| Eval 282 at step   112800 | time: 174.41s | valid loss  4.19 | valid ppl    66.023
----------------------------------------------------------------------------------------------------
| epoch 259 step   112850 |    362 batches | lr 8.1e-06 | ms/batch 550.52 | loss  3.72 | ppl    41.447
| epoch 259 step   112900 |    412 batches | lr 8.03e-06 | ms/batch 421.37 | loss  3.78 | ppl    43.873
| epoch 260 step   112950 |     26 batches | lr 7.96e-06 | ms/batch 414.17 | loss  3.82 | ppl    45.630
| epoch 260 step   113000 |     76 batches | lr 7.9e-06 | ms/batch 423.43 | loss  3.73 | ppl    41.726
| epoch 260 step   113050 |    126 batches | lr 7.83e-06 | ms/batch 422.87 | loss  3.78 | ppl    43.871
| epoch 260 step   113100 |    176 batches | lr 7.77e-06 | ms/batch 421.96 | loss  3.80 | ppl    44.866
| epoch 260 step   113150 |    226 batches | lr 7.7e-06 | ms/batch 421.01 | loss  3.81 | ppl    45.102
| epoch 260 step   113200 |    276 batches | lr 7.64e-06 | ms/batch 419.57 | loss  3.81 | ppl    45.355
----------------------------------------------------------------------------------------------------
| Eval 283 at step   113200 | time: 174.72s | valid loss  4.19 | valid ppl    66.103
----------------------------------------------------------------------------------------------------
| epoch 260 step   113250 |    326 batches | lr 7.58e-06 | ms/batch 544.55 | loss  3.75 | ppl    42.494
| epoch 260 step   113300 |    376 batches | lr 7.51e-06 | ms/batch 423.40 | loss  3.76 | ppl    43.062
| epoch 260 step   113350 |    426 batches | lr 7.45e-06 | ms/batch 420.09 | loss  3.81 | ppl    44.993
| epoch 261 step   113400 |     40 batches | lr 7.38e-06 | ms/batch 412.34 | loss  3.77 | ppl    43.526
| epoch 261 step   113450 |     90 batches | lr 7.32e-06 | ms/batch 421.14 | loss  3.73 | ppl    41.599
| epoch 261 step   113500 |    140 batches | lr 7.26e-06 | ms/batch 421.21 | loss  3.79 | ppl    44.353
| epoch 261 step   113550 |    190 batches | lr 7.2e-06 | ms/batch 423.26 | loss  3.80 | ppl    44.494
| epoch 261 step   113600 |    240 batches | lr 7.13e-06 | ms/batch 422.62 | loss  3.82 | ppl    45.402
----------------------------------------------------------------------------------------------------
| Eval 284 at step   113600 | time: 174.45s | valid loss  4.19 | valid ppl    65.974
----------------------------------------------------------------------------------------------------
| epoch 261 step   113650 |    290 batches | lr 7.07e-06 | ms/batch 547.85 | loss  3.84 | ppl    46.329
| epoch 261 step   113700 |    340 batches | lr 7.01e-06 | ms/batch 422.91 | loss  3.69 | ppl    40.146
| epoch 261 step   113750 |    390 batches | lr 6.95e-06 | ms/batch 421.72 | loss  3.78 | ppl    43.789
| epoch 262 step   113800 |      4 batches | lr 6.89e-06 | ms/batch 414.56 | loss  3.81 | ppl    45.287
| epoch 262 step   113850 |     54 batches | lr 6.83e-06 | ms/batch 421.27 | loss  3.74 | ppl    42.235
| epoch 262 step   113900 |    104 batches | lr 6.77e-06 | ms/batch 423.07 | loss  3.74 | ppl    41.988
| epoch 262 step   113950 |    154 batches | lr 6.71e-06 | ms/batch 422.08 | loss  3.79 | ppl    44.037
| epoch 262 step   114000 |    204 batches | lr 6.65e-06 | ms/batch 421.44 | loss  3.78 | ppl    43.972
----------------------------------------------------------------------------------------------------
| Eval 285 at step   114000 | time: 174.74s | valid loss  4.19 | valid ppl    65.920
----------------------------------------------------------------------------------------------------
| epoch 262 step   114050 |    254 batches | lr 6.59e-06 | ms/batch 573.73 | loss  3.78 | ppl    43.963
| epoch 262 step   114100 |    304 batches | lr 6.53e-06 | ms/batch 421.41 | loss  3.83 | ppl    45.955
| epoch 262 step   114150 |    354 batches | lr 6.47e-06 | ms/batch 422.54 | loss  3.72 | ppl    41.097
| epoch 262 step   114200 |    404 batches | lr 6.41e-06 | ms/batch 421.10 | loss  3.78 | ppl    43.933
| epoch 263 step   114250 |     18 batches | lr 6.35e-06 | ms/batch 413.01 | loss  3.81 | ppl    44.939
| epoch 263 step   114300 |     68 batches | lr 6.29e-06 | ms/batch 419.28 | loss  3.74 | ppl    41.968
| epoch 263 step   114350 |    118 batches | lr 6.23e-06 | ms/batch 423.01 | loss  3.78 | ppl    43.937
| epoch 263 step   114400 |    168 batches | lr 6.17e-06 | ms/batch 422.88 | loss  3.78 | ppl    43.763
----------------------------------------------------------------------------------------------------
| Eval 286 at step   114400 | time: 174.57s | valid loss  4.19 | valid ppl    65.925
----------------------------------------------------------------------------------------------------
| epoch 263 step   114450 |    218 batches | lr 6.12e-06 | ms/batch 548.89 | loss  3.80 | ppl    44.819
| epoch 263 step   114500 |    268 batches | lr 6.06e-06 | ms/batch 421.79 | loss  3.81 | ppl    45.283
| epoch 263 step   114550 |    318 batches | lr 6e-06 | ms/batch 422.57 | loss  3.80 | ppl    44.580
| epoch 263 step   114600 |    368 batches | lr 5.94e-06 | ms/batch 422.45 | loss  3.73 | ppl    41.594
| epoch 263 step   114650 |    418 batches | lr 5.89e-06 | ms/batch 422.13 | loss  3.79 | ppl    44.334
| epoch 264 step   114700 |     32 batches | lr 5.83e-06 | ms/batch 412.33 | loss  3.81 | ppl    45.060
| epoch 264 step   114750 |     82 batches | lr 5.77e-06 | ms/batch 419.60 | loss  3.72 | ppl    41.414
| epoch 264 step   114800 |    132 batches | lr 5.72e-06 | ms/batch 419.09 | loss  3.80 | ppl    44.898
----------------------------------------------------------------------------------------------------
| Eval 287 at step   114800 | time: 174.45s | valid loss  4.19 | valid ppl    65.986
----------------------------------------------------------------------------------------------------
| epoch 264 step   114850 |    182 batches | lr 5.66e-06 | ms/batch 547.19 | loss  3.78 | ppl    43.666
| epoch 264 step   114900 |    232 batches | lr 5.61e-06 | ms/batch 421.82 | loss  3.80 | ppl    44.602
| epoch 264 step   114950 |    282 batches | lr 5.55e-06 | ms/batch 422.19 | loss  3.83 | ppl    46.133
| epoch 264 step   115000 |    332 batches | lr 5.5e-06 | ms/batch 420.36 | loss  3.74 | ppl    41.983
| epoch 264 step   115050 |    382 batches | lr 5.44e-06 | ms/batch 423.18 | loss  3.80 | ppl    44.757
| epoch 264 step   115100 |    432 batches | lr 5.39e-06 | ms/batch 422.02 | loss  3.81 | ppl    44.975
| epoch 265 step   115150 |     46 batches | lr 5.34e-06 | ms/batch 411.97 | loss  3.75 | ppl    42.593
| epoch 265 step   115200 |     96 batches | lr 5.28e-06 | ms/batch 419.51 | loss  3.73 | ppl    41.703
----------------------------------------------------------------------------------------------------
| Eval 288 at step   115200 | time: 174.34s | valid loss  4.19 | valid ppl    65.989
----------------------------------------------------------------------------------------------------
| epoch 265 step   115250 |    146 batches | lr 5.23e-06 | ms/batch 545.31 | loss  3.80 | ppl    44.919
| epoch 265 step   115300 |    196 batches | lr 5.17e-06 | ms/batch 420.38 | loss  3.82 | ppl    45.406
| epoch 265 step   115350 |    246 batches | lr 5.12e-06 | ms/batch 419.57 | loss  3.82 | ppl    45.498
| epoch 265 step   115400 |    296 batches | lr 5.07e-06 | ms/batch 417.66 | loss  3.86 | ppl    47.350
| epoch 265 step   115450 |    346 batches | lr 5.02e-06 | ms/batch 420.24 | loss  3.68 | ppl    39.644
| epoch 265 step   115500 |    396 batches | lr 4.96e-06 | ms/batch 422.74 | loss  3.79 | ppl    44.143
| epoch 266 step   115550 |     10 batches | lr 4.91e-06 | ms/batch 412.71 | loss  3.83 | ppl    45.877
| epoch 266 step   115600 |     60 batches | lr 4.86e-06 | ms/batch 422.43 | loss  3.73 | ppl    41.857
----------------------------------------------------------------------------------------------------
| Eval 289 at step   115600 | time: 174.10s | valid loss  4.19 | valid ppl    65.931
----------------------------------------------------------------------------------------------------
| epoch 266 step   115650 |    110 batches | lr 4.81e-06 | ms/batch 546.58 | loss  3.75 | ppl    42.585
| epoch 266 step   115700 |    160 batches | lr 4.76e-06 | ms/batch 422.47 | loss  3.79 | ppl    44.460
| epoch 266 step   115750 |    210 batches | lr 4.71e-06 | ms/batch 422.11 | loss  3.79 | ppl    44.128
| epoch 266 step   115800 |    260 batches | lr 4.66e-06 | ms/batch 420.47 | loss  3.82 | ppl    45.593
| epoch 266 step   115850 |    310 batches | lr 4.61e-06 | ms/batch 420.80 | loss  3.80 | ppl    44.692
| epoch 266 step   115900 |    360 batches | lr 4.56e-06 | ms/batch 422.85 | loss  3.73 | ppl    41.746
| epoch 266 step   115950 |    410 batches | lr 4.51e-06 | ms/batch 421.20 | loss  3.79 | ppl    44.275
| epoch 267 step   116000 |     24 batches | lr 4.46e-06 | ms/batch 413.31 | loss  3.82 | ppl    45.594
----------------------------------------------------------------------------------------------------
| Eval 290 at step   116000 | time: 174.47s | valid loss  4.19 | valid ppl    65.836
----------------------------------------------------------------------------------------------------
| epoch 267 step   116050 |     74 batches | lr 4.41e-06 | ms/batch 577.13 | loss  3.73 | ppl    41.515
| epoch 267 step   116100 |    124 batches | lr 4.36e-06 | ms/batch 421.88 | loss  3.80 | ppl    44.613
| epoch 267 step   116150 |    174 batches | lr 4.31e-06 | ms/batch 420.92 | loss  3.78 | ppl    43.869
| epoch 267 step   116200 |    224 batches | lr 4.26e-06 | ms/batch 422.40 | loss  3.82 | ppl    45.648
| epoch 267 step   116250 |    274 batches | lr 4.21e-06 | ms/batch 420.13 | loss  3.81 | ppl    45.315
| epoch 267 step   116300 |    324 batches | lr 4.17e-06 | ms/batch 422.48 | loss  3.76 | ppl    42.913
| epoch 267 step   116350 |    374 batches | lr 4.12e-06 | ms/batch 420.90 | loss  3.76 | ppl    43.149
| epoch 267 step   116400 |    424 batches | lr 4.07e-06 | ms/batch 419.51 | loss  3.78 | ppl    43.970
----------------------------------------------------------------------------------------------------
| Eval 291 at step   116400 | time: 174.87s | valid loss  4.19 | valid ppl    65.933
----------------------------------------------------------------------------------------------------
| epoch 268 step   116450 |     38 batches | lr 4.02e-06 | ms/batch 539.54 | loss  3.79 | ppl    44.338
| epoch 268 step   116500 |     88 batches | lr 3.98e-06 | ms/batch 420.01 | loss  3.71 | ppl    40.823
| epoch 268 step   116550 |    138 batches | lr 3.93e-06 | ms/batch 419.79 | loss  3.80 | ppl    44.531
| epoch 268 step   116600 |    188 batches | lr 3.89e-06 | ms/batch 422.38 | loss  3.79 | ppl    44.071
| epoch 268 step   116650 |    238 batches | lr 3.84e-06 | ms/batch 422.32 | loss  3.80 | ppl    44.856
| epoch 268 step   116700 |    288 batches | lr 3.79e-06 | ms/batch 423.54 | loss  3.85 | ppl    46.888
| epoch 268 step   116750 |    338 batches | lr 3.75e-06 | ms/batch 421.70 | loss  3.69 | ppl    39.902
| epoch 268 step   116800 |    388 batches | lr 3.7e-06 | ms/batch 422.30 | loss  3.80 | ppl    44.634
----------------------------------------------------------------------------------------------------
| Eval 292 at step   116800 | time: 174.57s | valid loss  4.19 | valid ppl    65.951
----------------------------------------------------------------------------------------------------
| epoch 269 step   116850 |      2 batches | lr 3.66e-06 | ms/batch 538.14 | loss  3.83 | ppl    45.966
| epoch 269 step   116900 |     52 batches | lr 3.61e-06 | ms/batch 420.89 | loss  3.73 | ppl    41.706
| epoch 269 step   116950 |    102 batches | lr 3.57e-06 | ms/batch 422.76 | loss  3.76 | ppl    42.847
| epoch 269 step   117000 |    152 batches | lr 3.53e-06 | ms/batch 423.16 | loss  3.77 | ppl    43.338
| epoch 269 step   117050 |    202 batches | lr 3.48e-06 | ms/batch 421.41 | loss  3.77 | ppl    43.430
| epoch 269 step   117100 |    252 batches | lr 3.44e-06 | ms/batch 422.12 | loss  3.81 | ppl    45.068
| epoch 269 step   117150 |    302 batches | lr 3.39e-06 | ms/batch 422.29 | loss  3.83 | ppl    45.854
| epoch 269 step   117200 |    352 batches | lr 3.35e-06 | ms/batch 422.88 | loss  3.71 | ppl    40.808
----------------------------------------------------------------------------------------------------
| Eval 293 at step   117200 | time: 174.67s | valid loss  4.19 | valid ppl    66.007
----------------------------------------------------------------------------------------------------
| epoch 269 step   117250 |    402 batches | lr 3.31e-06 | ms/batch 547.71 | loss  3.79 | ppl    44.367
| epoch 270 step   117300 |     16 batches | lr 3.27e-06 | ms/batch 412.76 | loss  3.81 | ppl    45.208
| epoch 270 step   117350 |     66 batches | lr 3.22e-06 | ms/batch 420.97 | loss  3.72 | ppl    41.274
| epoch 270 step   117400 |    116 batches | lr 3.18e-06 | ms/batch 420.98 | loss  3.76 | ppl    43.029
| epoch 270 step   117450 |    166 batches | lr 3.14e-06 | ms/batch 420.64 | loss  3.76 | ppl    42.761
| epoch 270 step   117500 |    216 batches | lr 3.1e-06 | ms/batch 420.61 | loss  3.81 | ppl    45.169
| epoch 270 step   117550 |    266 batches | lr 3.06e-06 | ms/batch 420.03 | loss  3.80 | ppl    44.496
| epoch 270 step   117600 |    316 batches | lr 3.02e-06 | ms/batch 419.15 | loss  3.78 | ppl    43.682
----------------------------------------------------------------------------------------------------
| Eval 294 at step   117600 | time: 174.12s | valid loss  4.19 | valid ppl    65.931
----------------------------------------------------------------------------------------------------
| epoch 270 step   117650 |    366 batches | lr 2.98e-06 | ms/batch 545.94 | loss  3.73 | ppl    41.473
| epoch 270 step   117700 |    416 batches | lr 2.94e-06 | ms/batch 420.38 | loss  3.79 | ppl    44.179
| epoch 271 step   117750 |     30 batches | lr 2.9e-06 | ms/batch 412.31 | loss  3.82 | ppl    45.662
| epoch 271 step   117800 |     80 batches | lr 2.86e-06 | ms/batch 419.42 | loss  3.72 | ppl    41.179
| epoch 271 step   117850 |    130 batches | lr 2.82e-06 | ms/batch 421.56 | loss  3.77 | ppl    43.519
| epoch 271 step   117900 |    180 batches | lr 2.78e-06 | ms/batch 419.18 | loss  3.77 | ppl    43.366
| epoch 271 step   117950 |    230 batches | lr 2.74e-06 | ms/batch 420.95 | loss  3.83 | ppl    45.903
| epoch 271 step   118000 |    280 batches | lr 2.7e-06 | ms/batch 419.68 | loss  3.81 | ppl    45.305
----------------------------------------------------------------------------------------------------
| Eval 295 at step   118000 | time: 174.03s | valid loss  4.19 | valid ppl    65.989
----------------------------------------------------------------------------------------------------
| epoch 271 step   118050 |    330 batches | lr 2.66e-06 | ms/batch 546.00 | loss  3.72 | ppl    41.446
| epoch 271 step   118100 |    380 batches | lr 2.62e-06 | ms/batch 420.28 | loss  3.77 | ppl    43.194
| epoch 271 step   118150 |    430 batches | lr 2.59e-06 | ms/batch 421.70 | loss  3.79 | ppl    44.468
| epoch 272 step   118200 |     44 batches | lr 2.55e-06 | ms/batch 414.60 | loss  3.75 | ppl    42.558
| epoch 272 step   118250 |     94 batches | lr 2.51e-06 | ms/batch 422.38 | loss  3.73 | ppl    41.739
| epoch 272 step   118300 |    144 batches | lr 2.48e-06 | ms/batch 421.36 | loss  3.80 | ppl    44.482
| epoch 272 step   118350 |    194 batches | lr 2.44e-06 | ms/batch 421.31 | loss  3.78 | ppl    43.637
| epoch 272 step   118400 |    244 batches | lr 2.4e-06 | ms/batch 424.38 | loss  3.81 | ppl    45.232
----------------------------------------------------------------------------------------------------
| Eval 296 at step   118400 | time: 174.61s | valid loss  4.19 | valid ppl    65.982
----------------------------------------------------------------------------------------------------
| epoch 272 step   118450 |    294 batches | lr 2.37e-06 | ms/batch 549.99 | loss  3.85 | ppl    47.200
| epoch 272 step   118500 |    344 batches | lr 2.33e-06 | ms/batch 420.10 | loss  3.69 | ppl    40.243
| epoch 272 step   118550 |    394 batches | lr 2.29e-06 | ms/batch 419.90 | loss  3.77 | ppl    43.541
| epoch 273 step   118600 |      8 batches | lr 2.26e-06 | ms/batch 411.93 | loss  3.82 | ppl    45.714
| epoch 273 step   118650 |     58 batches | lr 2.22e-06 | ms/batch 420.57 | loss  3.73 | ppl    41.867
| epoch 273 step   118700 |    108 batches | lr 2.19e-06 | ms/batch 420.18 | loss  3.75 | ppl    42.665
| epoch 273 step   118750 |    158 batches | lr 2.15e-06 | ms/batch 420.56 | loss  3.78 | ppl    43.928
| epoch 273 step   118800 |    208 batches | lr 2.12e-06 | ms/batch 422.25 | loss  3.77 | ppl    43.567
----------------------------------------------------------------------------------------------------
| Eval 297 at step   118800 | time: 174.23s | valid loss  4.19 | valid ppl    65.911
----------------------------------------------------------------------------------------------------
| epoch 273 step   118850 |    258 batches | lr 2.09e-06 | ms/batch 545.04 | loss  3.82 | ppl    45.716
| epoch 273 step   118900 |    308 batches | lr 2.05e-06 | ms/batch 420.41 | loss  3.81 | ppl    45.363
| epoch 273 step   118950 |    358 batches | lr 2.02e-06 | ms/batch 421.28 | loss  3.73 | ppl    41.851
| epoch 273 step   119000 |    408 batches | lr 1.99e-06 | ms/batch 422.93 | loss  3.79 | ppl    44.184
| epoch 274 step   119050 |     22 batches | lr 1.95e-06 | ms/batch 413.30 | loss  3.81 | ppl    45.054
| epoch 274 step   119100 |     72 batches | lr 1.92e-06 | ms/batch 421.88 | loss  3.74 | ppl    42.168
| epoch 274 step   119150 |    122 batches | lr 1.89e-06 | ms/batch 419.10 | loss  3.77 | ppl    43.422
| epoch 274 step   119200 |    172 batches | lr 1.86e-06 | ms/batch 420.88 | loss  3.80 | ppl    44.647
----------------------------------------------------------------------------------------------------
| Eval 298 at step   119200 | time: 174.22s | valid loss  4.19 | valid ppl    65.883
----------------------------------------------------------------------------------------------------
| epoch 274 step   119250 |    222 batches | lr 1.82e-06 | ms/batch 544.63 | loss  3.82 | ppl    45.428
| epoch 274 step   119300 |    272 batches | lr 1.79e-06 | ms/batch 419.61 | loss  3.83 | ppl    45.857
| epoch 274 step   119350 |    322 batches | lr 1.76e-06 | ms/batch 419.47 | loss  3.76 | ppl    43.013
| epoch 274 step   119400 |    372 batches | lr 1.73e-06 | ms/batch 420.47 | loss  3.74 | ppl    42.242
| epoch 274 step   119450 |    422 batches | lr 1.7e-06 | ms/batch 422.06 | loss  3.78 | ppl    43.917
| epoch 275 step   119500 |     36 batches | lr 1.67e-06 | ms/batch 412.37 | loss  3.78 | ppl    43.759
| epoch 275 step   119550 |     86 batches | lr 1.64e-06 | ms/batch 420.37 | loss  3.71 | ppl    40.767
| epoch 275 step   119600 |    136 batches | lr 1.61e-06 | ms/batch 420.43 | loss  3.78 | ppl    43.770
----------------------------------------------------------------------------------------------------
| Eval 299 at step   119600 | time: 174.00s | valid loss  4.19 | valid ppl    65.935
----------------------------------------------------------------------------------------------------
| epoch 275 step   119650 |    186 batches | lr 1.58e-06 | ms/batch 545.66 | loss  3.77 | ppl    43.279
| epoch 275 step   119700 |    236 batches | lr 1.55e-06 | ms/batch 419.46 | loss  3.81 | ppl    45.149
| epoch 275 step   119750 |    286 batches | lr 1.52e-06 | ms/batch 420.64 | loss  3.83 | ppl    46.173
| epoch 275 step   119800 |    336 batches | lr 1.49e-06 | ms/batch 420.11 | loss  3.70 | ppl    40.603
| epoch 275 step   119850 |    386 batches | lr 1.46e-06 | ms/batch 420.80 | loss  3.78 | ppl    43.827
| epoch 275 step   119900 |    436 batches | lr 1.44e-06 | ms/batch 413.51 | loss  3.79 | ppl    44.235
| epoch 276 step   119950 |     50 batches | lr 1.41e-06 | ms/batch 416.77 | loss  3.74 | ppl    42.098
| epoch 276 step   120000 |    100 batches | lr 1.38e-06 | ms/batch 418.07 | loss  3.77 | ppl    43.367
----------------------------------------------------------------------------------------------------
| Eval 300 at step   120000 | time: 173.72s | valid loss  4.19 | valid ppl    65.925
----------------------------------------------------------------------------------------------------
| epoch 276 step   120050 |    150 batches | lr 1.35e-06 | ms/batch 544.17 | loss  3.78 | ppl    43.710
| epoch 276 step   120100 |    200 batches | lr 1.33e-06 | ms/batch 421.11 | loss  3.78 | ppl    43.912
| epoch 276 step   120150 |    250 batches | lr 1.3e-06 | ms/batch 419.92 | loss  3.82 | ppl    45.519
| epoch 276 step   120200 |    300 batches | lr 1.27e-06 | ms/batch 419.76 | loss  3.82 | ppl    45.461
| epoch 276 step   120250 |    350 batches | lr 1.25e-06 | ms/batch 421.60 | loss  3.69 | ppl    39.997
| epoch 276 step   120300 |    400 batches | lr 1.22e-06 | ms/batch 420.01 | loss  3.81 | ppl    44.974
| epoch 277 step   120350 |     14 batches | lr 1.19e-06 | ms/batch 413.45 | loss  3.81 | ppl    45.091
| epoch 277 step   120400 |     64 batches | lr 1.17e-06 | ms/batch 419.69 | loss  3.75 | ppl    42.364
----------------------------------------------------------------------------------------------------
| Eval 301 at step   120400 | time: 174.02s | valid loss  4.19 | valid ppl    65.927
----------------------------------------------------------------------------------------------------
| epoch 277 step   120450 |    114 batches | lr 1.14e-06 | ms/batch 546.13 | loss  3.77 | ppl    43.198
| epoch 277 step   120500 |    164 batches | lr 1.12e-06 | ms/batch 419.74 | loss  3.78 | ppl    43.810
| epoch 277 step   120550 |    214 batches | lr 1.09e-06 | ms/batch 421.31 | loss  3.79 | ppl    44.087
| epoch 277 step   120600 |    264 batches | lr 1.07e-06 | ms/batch 419.62 | loss  3.82 | ppl    45.573
| epoch 277 step   120650 |    314 batches | lr 1.04e-06 | ms/batch 419.82 | loss  3.80 | ppl    44.697
| epoch 277 step   120700 |    364 batches | lr 1.02e-06 | ms/batch 420.50 | loss  3.73 | ppl    41.722
| epoch 277 step   120750 |    414 batches | lr 9.97e-07 | ms/batch 420.31 | loss  3.80 | ppl    44.592
| epoch 278 step   120800 |     28 batches | lr 9.74e-07 | ms/batch 413.72 | loss  3.81 | ppl    45.300
----------------------------------------------------------------------------------------------------
| Eval 302 at step   120800 | time: 174.09s | valid loss  4.19 | valid ppl    65.882
----------------------------------------------------------------------------------------------------
| epoch 278 step   120850 |     78 batches | lr 9.51e-07 | ms/batch 546.94 | loss  3.74 | ppl    41.922
| epoch 278 step   120900 |    128 batches | lr 9.28e-07 | ms/batch 420.33 | loss  3.78 | ppl    43.724
| epoch 278 step   120950 |    178 batches | lr 9.06e-07 | ms/batch 420.18 | loss  3.78 | ppl    44.022
| epoch 278 step   121000 |    228 batches | lr 8.84e-07 | ms/batch 418.78 | loss  3.80 | ppl    44.759
| epoch 278 step   121050 |    278 batches | lr 8.62e-07 | ms/batch 421.08 | loss  3.83 | ppl    46.052
| epoch 278 step   121100 |    328 batches | lr 8.4e-07 | ms/batch 420.10 | loss  3.73 | ppl    41.864
| epoch 278 step   121150 |    378 batches | lr 8.19e-07 | ms/batch 421.11 | loss  3.78 | ppl    43.605
| epoch 278 step   121200 |    428 batches | lr 7.97e-07 | ms/batch 421.06 | loss  3.77 | ppl    43.254
----------------------------------------------------------------------------------------------------
| Eval 303 at step   121200 | time: 174.47s | valid loss  4.19 | valid ppl    65.919
----------------------------------------------------------------------------------------------------
| epoch 279 step   121250 |     42 batches | lr 7.77e-07 | ms/batch 538.09 | loss  3.77 | ppl    43.223
| epoch 279 step   121300 |     92 batches | lr 7.56e-07 | ms/batch 419.50 | loss  3.71 | ppl    41.050
| epoch 279 step   121350 |    142 batches | lr 7.36e-07 | ms/batch 420.55 | loss  3.79 | ppl    44.324
| epoch 279 step   121400 |    192 batches | lr 7.16e-07 | ms/batch 421.97 | loss  3.79 | ppl    44.085
| epoch 279 step   121450 |    242 batches | lr 6.96e-07 | ms/batch 421.49 | loss  3.81 | ppl    45.023
| epoch 279 step   121500 |    292 batches | lr 6.77e-07 | ms/batch 421.72 | loss  3.86 | ppl    47.330
| epoch 279 step   121550 |    342 batches | lr 6.57e-07 | ms/batch 419.96 | loss  3.70 | ppl    40.525
| epoch 279 step   121600 |    392 batches | lr 6.39e-07 | ms/batch 419.50 | loss  3.78 | ppl    43.854
----------------------------------------------------------------------------------------------------
| Eval 304 at step   121600 | time: 174.12s | valid loss  4.19 | valid ppl    65.920
----------------------------------------------------------------------------------------------------
| epoch 280 step   121650 |      6 batches | lr 6.2e-07 | ms/batch 536.68 | loss  3.82 | ppl    45.633
| epoch 280 step   121700 |     56 batches | lr 6.02e-07 | ms/batch 421.76 | loss  3.74 | ppl    41.972
| epoch 280 step   121750 |    106 batches | lr 5.83e-07 | ms/batch 418.99 | loss  3.74 | ppl    42.294
| epoch 280 step   121800 |    156 batches | lr 5.66e-07 | ms/batch 419.18 | loss  3.79 | ppl    44.423
| epoch 280 step   121850 |    206 batches | lr 5.48e-07 | ms/batch 419.04 | loss  3.76 | ppl    43.115
| epoch 280 step   121900 |    256 batches | lr 5.31e-07 | ms/batch 419.46 | loss  3.82 | ppl    45.556
| epoch 280 step   121950 |    306 batches | lr 5.14e-07 | ms/batch 419.63 | loss  3.80 | ppl    44.873
| epoch 280 step   122000 |    356 batches | lr 4.97e-07 | ms/batch 420.84 | loss  3.72 | ppl    41.068
----------------------------------------------------------------------------------------------------
| Eval 305 at step   122000 | time: 173.78s | valid loss  4.19 | valid ppl    65.936
----------------------------------------------------------------------------------------------------
| epoch 280 step   122050 |    406 batches | lr 4.81e-07 | ms/batch 545.60 | loss  3.77 | ppl    43.461
| epoch 281 step   122100 |     20 batches | lr 4.65e-07 | ms/batch 413.73 | loss  3.81 | ppl    45.136
| epoch 281 step   122150 |     70 batches | lr 4.49e-07 | ms/batch 420.76 | loss  3.73 | ppl    41.553
| epoch 281 step   122200 |    120 batches | lr 4.33e-07 | ms/batch 421.59 | loss  3.79 | ppl    44.297
| epoch 281 step   122250 |    170 batches | lr 4.18e-07 | ms/batch 419.41 | loss  3.79 | ppl    44.236
| epoch 281 step   122300 |    220 batches | lr 4.03e-07 | ms/batch 421.59 | loss  3.81 | ppl    44.992
| epoch 281 step   122350 |    270 batches | lr 3.88e-07 | ms/batch 420.37 | loss  3.77 | ppl    43.330
| epoch 281 step   122400 |    320 batches | lr 3.73e-07 | ms/batch 420.16 | loss  3.79 | ppl    44.211
----------------------------------------------------------------------------------------------------
| Eval 306 at step   122400 | time: 174.17s | valid loss  4.19 | valid ppl    65.935
----------------------------------------------------------------------------------------------------
| epoch 281 step   122450 |    370 batches | lr 3.59e-07 | ms/batch 545.19 | loss  3.74 | ppl    42.056
| epoch 281 step   122500 |    420 batches | lr 3.45e-07 | ms/batch 421.78 | loss  3.78 | ppl    43.995
| epoch 282 step   122550 |     34 batches | lr 3.32e-07 | ms/batch 412.53 | loss  3.81 | ppl    45.145
| epoch 282 step   122600 |     84 batches | lr 3.18e-07 | ms/batch 419.14 | loss  3.70 | ppl    40.558
| epoch 282 step   122650 |    134 batches | lr 3.05e-07 | ms/batch 418.88 | loss  3.77 | ppl    43.501
| epoch 282 step   122700 |    184 batches | lr 2.92e-07 | ms/batch 417.96 | loss  3.78 | ppl    43.781
| epoch 282 step   122750 |    234 batches | lr 2.8e-07 | ms/batch 419.90 | loss  3.81 | ppl    45.358
| epoch 282 step   122800 |    284 batches | lr 2.67e-07 | ms/batch 420.13 | loss  3.82 | ppl    45.552
----------------------------------------------------------------------------------------------------
| Eval 307 at step   122800 | time: 173.78s | valid loss  4.19 | valid ppl    65.935
----------------------------------------------------------------------------------------------------
| epoch 282 step   122850 |    334 batches | lr 2.55e-07 | ms/batch 545.88 | loss  3.70 | ppl    40.510
| epoch 282 step   122900 |    384 batches | lr 2.44e-07 | ms/batch 421.46 | loss  3.79 | ppl    44.289
| epoch 282 step   122950 |    434 batches | lr 2.32e-07 | ms/batch 420.05 | loss  3.80 | ppl    44.728
| epoch 283 step   123000 |     48 batches | lr 2.21e-07 | ms/batch 410.93 | loss  3.74 | ppl    42.263
| epoch 283 step   123050 |     98 batches | lr 2.1e-07 | ms/batch 421.22 | loss  3.73 | ppl    41.810
| epoch 283 step   123100 |    148 batches | lr 1.99e-07 | ms/batch 420.42 | loss  3.79 | ppl    44.341
| epoch 283 step   123150 |    198 batches | lr 1.89e-07 | ms/batch 422.60 | loss  3.80 | ppl    44.481
| epoch 283 step   123200 |    248 batches | lr 1.79e-07 | ms/batch 422.29 | loss  3.79 | ppl    44.264
----------------------------------------------------------------------------------------------------
| Eval 308 at step   123200 | time: 174.26s | valid loss  4.19 | valid ppl    65.926
----------------------------------------------------------------------------------------------------
| epoch 283 step   123250 |    298 batches | lr 1.69e-07 | ms/batch 548.37 | loss  3.82 | ppl    45.786
| epoch 283 step   123300 |    348 batches | lr 1.6e-07 | ms/batch 421.90 | loss  3.69 | ppl    39.933
| epoch 283 step   123350 |    398 batches | lr 1.5e-07 | ms/batch 421.00 | loss  3.77 | ppl    43.508
| epoch 284 step   123400 |     12 batches | lr 1.41e-07 | ms/batch 412.83 | loss  3.82 | ppl    45.705
| epoch 284 step   123450 |     62 batches | lr 1.33e-07 | ms/batch 421.22 | loss  3.73 | ppl    41.709
| epoch 284 step   123500 |    112 batches | lr 1.24e-07 | ms/batch 420.16 | loss  3.76 | ppl    43.017
| epoch 284 step   123550 |    162 batches | lr 1.16e-07 | ms/batch 419.35 | loss  3.79 | ppl    44.050
| epoch 284 step   123600 |    212 batches | lr 1.08e-07 | ms/batch 421.01 | loss  3.79 | ppl    44.413
----------------------------------------------------------------------------------------------------
| Eval 309 at step   123600 | time: 174.29s | valid loss  4.19 | valid ppl    65.930
----------------------------------------------------------------------------------------------------
| epoch 284 step   123650 |    262 batches | lr 1.01e-07 | ms/batch 546.92 | loss  3.79 | ppl    44.202
| epoch 284 step   123700 |    312 batches | lr 9.34e-08 | ms/batch 419.80 | loss  3.80 | ppl    44.671
| epoch 284 step   123750 |    362 batches | lr 8.64e-08 | ms/batch 423.47 | loss  3.71 | ppl    40.840
| epoch 284 step   123800 |    412 batches | lr 7.96e-08 | ms/batch 421.09 | loss  3.76 | ppl    42.954
| epoch 285 step   123850 |     26 batches | lr 7.31e-08 | ms/batch 412.23 | loss  3.81 | ppl    45.302
| epoch 285 step   123900 |     76 batches | lr 6.69e-08 | ms/batch 420.13 | loss  3.71 | ppl    40.806
| epoch 285 step   123950 |    126 batches | lr 6.09e-08 | ms/batch 420.26 | loss  3.79 | ppl    44.388
| epoch 285 step   124000 |    176 batches | lr 5.53e-08 | ms/batch 420.94 | loss  3.76 | ppl    43.146
----------------------------------------------------------------------------------------------------
| Eval 310 at step   124000 | time: 174.24s | valid loss  4.19 | valid ppl    65.926
----------------------------------------------------------------------------------------------------
| epoch 285 step   124050 |    226 batches | lr 4.99e-08 | ms/batch 545.84 | loss  3.80 | ppl    44.804
| epoch 285 step   124100 |    276 batches | lr 4.48e-08 | ms/batch 421.31 | loss  3.81 | ppl    44.959
| epoch 285 step   124150 |    326 batches | lr 3.99e-08 | ms/batch 420.96 | loss  3.74 | ppl    42.011
| epoch 285 step   124200 |    376 batches | lr 3.54e-08 | ms/batch 419.81 | loss  3.76 | ppl    43.057
| epoch 285 step   124250 |    426 batches | lr 3.11e-08 | ms/batch 425.44 | loss  3.80 | ppl    44.569
| epoch 286 step   124300 |     40 batches | lr 2.71e-08 | ms/batch 413.54 | loss  3.78 | ppl    43.846
| epoch 286 step   124350 |     90 batches | lr 2.34e-08 | ms/batch 421.22 | loss  3.72 | ppl    41.274
| epoch 286 step   124400 |    140 batches | lr 1.99e-08 | ms/batch 421.84 | loss  3.79 | ppl    44.257
----------------------------------------------------------------------------------------------------
| Eval 311 at step   124400 | time: 174.51s | valid loss  4.19 | valid ppl    65.928
----------------------------------------------------------------------------------------------------
| epoch 286 step   124450 |    190 batches | lr 1.67e-08 | ms/batch 545.63 | loss  3.80 | ppl    44.614
| epoch 286 step   124500 |    240 batches | lr 1.38e-08 | ms/batch 419.81 | loss  3.80 | ppl    44.790
| epoch 286 step   124550 |    290 batches | lr 1.12e-08 | ms/batch 418.83 | loss  3.84 | ppl    46.486
| epoch 286 step   124600 |    340 batches | lr 8.84e-09 | ms/batch 419.76 | loss  3.67 | ppl    39.277
| epoch 286 step   124650 |    390 batches | lr 6.77e-09 | ms/batch 420.70 | loss  3.79 | ppl    44.343
| epoch 287 step   124700 |      4 batches | lr 4.97e-09 | ms/batch 412.61 | loss  3.80 | ppl    44.771
| epoch 287 step   124750 |     54 batches | lr 3.45e-09 | ms/batch 418.63 | loss  3.74 | ppl    42.164
| epoch 287 step   124800 |    104 batches | lr 2.21e-09 | ms/batch 418.56 | loss  3.75 | ppl    42.521
----------------------------------------------------------------------------------------------------
| Eval 312 at step   124800 | time: 173.69s | valid loss  4.19 | valid ppl    65.927
----------------------------------------------------------------------------------------------------
| epoch 287 step   124850 |    154 batches | lr 1.24e-09 | ms/batch 543.64 | loss  3.78 | ppl    43.834
| epoch 287 step   124900 |    204 batches | lr 5.53e-10 | ms/batch 419.67 | loss  3.81 | ppl    44.938
| epoch 287 step   124950 |    254 batches | lr 1.38e-10 | ms/batch 418.40 | loss  3.80 | ppl    44.692
| epoch 287 step   125000 |    304 batches | lr 0 | ms/batch 418.82 | loss  3.80 | ppl    44.876
----------------------------------------------------------------------------------------------------
End of training
====================================================================================================
| End of training | test loss  4.15 | test ppl    63.131
====================================================================================================
