====================================================================================================
    - data : ../data/wikitext-2/
    - dataset : wt103
    - n_layer : 16
    - n_head : 10
    - d_head : 40
    - d_embed : 400
    - d_model : 400
    - d_inner : 900
    - dropout : 0.2
    - dropoute : 0.2
    - dropouto : 0.5
    - dropouti : 0.6
    - dropatt : 0.2
    - init : normal
    - emb_init : normal
    - init_range : 0.1
    - emb_init_range : 0.01
    - init_std : 0.02
    - proj_init_std : 0.01
    - optim : adam
    - lr : 0.00035
    - mom : 0.0
    - scheduler : cosine
    - warmup_step : 3000
    - decay_rate : 0.5
    - lr_min : 0.0
    - clip : 0.25
    - clip_nonemb : False
    - max_step : 125000
    - batch_size : 32
    - batch_chunk : 1
    - tgt_len : 150
    - eval_tgt_len : 150
    - ext_len : 0
    - mem_len : 150
    - not_tied : False
    - seed : 555
    - cuda : True
    - adaptive : True
    - div_val : 1
    - pre_lnorm : False
    - varlen : False
    - multi_gpu : True
    - log_interval : 50
    - eval_interval : 400
    - work_dir : LM-TFM-wt103-555
    - restart : False
    - restart_dir : 
    - debug : False
    - same_length : False
    - attn_type : 0
    - clamp_len : -1
    - eta_min : 0.0
    - gpu0_bsz : 1
    - max_eval_steps : -1
    - sample_softmax : -1
    - patience : 0
    - finetune_v2 : False
    - finetune_v3 : False
    - fp16 : True
    - static_loss_scale : 1
    - dynamic_loss_scale : True
    - wdecay : 1.2e-06
    - tied : True
    - n_token : 33278
    - n_all_param : 37712881
    - n_nonemb_param : 24366400
====================================================================================================
#params = 37712881
#non emb params = 24366400
====================================================================================================
    - data : ../data/wikitext-2/
    - dataset : wt103
    - n_layer : 16
    - n_head : 10
    - d_head : 40
    - d_embed : 400
    - d_model : 400
    - d_inner : 900
    - dropout : 0.2
    - dropoute : 0.2
    - dropouto : 0.5
    - dropouti : 0.6
    - dropatt : 0.2
    - init : normal
    - emb_init : normal
    - init_range : 0.1
    - emb_init_range : 0.01
    - init_std : 0.02
    - proj_init_std : 0.01
    - optim : adam
    - lr : 0.00035
    - mom : 0.0
    - scheduler : cosine
    - warmup_step : 3000
    - decay_rate : 0.5
    - lr_min : 0.0
    - clip : 0.25
    - clip_nonemb : False
    - max_step : 125000
    - batch_size : 32
    - batch_chunk : 1
    - tgt_len : 150
    - eval_tgt_len : 150
    - ext_len : 0
    - mem_len : 150
    - not_tied : False
    - seed : 555
    - cuda : True
    - adaptive : True
    - div_val : 1
    - pre_lnorm : False
    - varlen : False
    - multi_gpu : True
    - log_interval : 50
    - eval_interval : 400
    - work_dir : LM-TFM-wt103-555
    - restart : False
    - restart_dir : 
    - debug : False
    - same_length : False
    - attn_type : 0
    - clamp_len : -1
    - eta_min : 0.0
    - gpu0_bsz : 0
    - max_eval_steps : -1
    - sample_softmax : -1
    - patience : 0
    - finetune_v2 : False
    - finetune_v3 : False
    - fp16 : True
    - static_loss_scale : 1
    - dynamic_loss_scale : True
    - wdecay : 1.2e-06
    - tied : True
    - n_token : 33278
    - n_all_param : 37712881
    - n_nonemb_param : 24366400
====================================================================================================
#params = 37712881
#non emb params = 24366400
====================================================================================================
    - data : ../data/wikitext-2/
    - dataset : wt103
    - n_layer : 16
    - n_head : 10
    - d_head : 40
    - d_embed : 400
    - d_model : 400
    - d_inner : 900
    - dropout : 0.2
    - dropoute : 0.2
    - dropouto : 0.5
    - dropouti : 0.6
    - dropatt : 0.2
    - init : normal
    - emb_init : normal
    - init_range : 0.1
    - emb_init_range : 0.01
    - init_std : 0.02
    - proj_init_std : 0.01
    - optim : adam
    - lr : 0.00035
    - mom : 0.0
    - scheduler : cosine
    - warmup_step : 3000
    - decay_rate : 0.5
    - lr_min : 0.0
    - clip : 0.25
    - clip_nonemb : False
    - max_step : 125000
    - batch_size : 32
    - batch_chunk : 1
    - tgt_len : 150
    - eval_tgt_len : 150
    - ext_len : 0
    - mem_len : 150
    - not_tied : False
    - seed : 555
    - cuda : True
    - adaptive : True
    - div_val : 1
    - pre_lnorm : False
    - varlen : False
    - multi_gpu : False
    - log_interval : 50
    - eval_interval : 400
    - work_dir : LM-TFM-wt103-555
    - restart : False
    - restart_dir : 
    - debug : False
    - same_length : False
    - attn_type : 0
    - clamp_len : -1
    - eta_min : 0.0
    - gpu0_bsz : 1
    - max_eval_steps : -1
    - sample_softmax : -1
    - patience : 0
    - finetune_v2 : False
    - finetune_v3 : False
    - fp16 : True
    - static_loss_scale : 1
    - dynamic_loss_scale : True
    - wdecay : 1.2e-06
    - tied : True
    - n_token : 33278
    - n_all_param : 37712881
    - n_nonemb_param : 24366400
====================================================================================================
#params = 37712881
#non emb params = 24366400
| epoch   1 step       50 |     50 batches | lr 5.83e-06 | ms/batch 333.07 | loss 10.23 | ppl 27696.532
| epoch   1 step      100 |    100 batches | lr 1.17e-05 | ms/batch 329.01 | loss  9.71 | ppl 16448.158
| epoch   1 step      150 |    150 batches | lr 1.75e-05 | ms/batch 329.26 | loss  9.28 | ppl 10768.441
| epoch   1 step      200 |    200 batches | lr 2.33e-05 | ms/batch 329.66 | loss  8.97 | ppl  7842.742
| epoch   1 step      250 |    250 batches | lr 2.92e-05 | ms/batch 329.42 | loss  8.59 | ppl  5356.648
| epoch   1 step      300 |    300 batches | lr 3.5e-05 | ms/batch 330.57 | loss  8.18 | ppl  3564.953
| epoch   1 step      350 |    350 batches | lr 4.08e-05 | ms/batch 330.30 | loss  7.75 | ppl  2311.257
| epoch   1 step      400 |    400 batches | lr 4.67e-05 | ms/batch 332.02 | loss  7.41 | ppl  1659.542
----------------------------------------------------------------------------------------------------
| Eval   1 at step      400 | time: 137.49s | valid loss  6.89 | valid ppl   984.020
----------------------------------------------------------------------------------------------------
| epoch   2 step      450 |     14 batches | lr 5.25e-05 | ms/batch 473.13 | loss  7.19 | ppl  1332.646
| epoch   2 step      500 |     64 batches | lr 5.83e-05 | ms/batch 340.07 | loss  7.05 | ppl  1158.275
| epoch   2 step      550 |    114 batches | lr 6.42e-05 | ms/batch 331.28 | loss  7.01 | ppl  1103.078
| epoch   2 step      600 |    164 batches | lr 7e-05 | ms/batch 330.89 | loss  6.98 | ppl  1070.979
| epoch   2 step      650 |    214 batches | lr 7.58e-05 | ms/batch 332.63 | loss  6.96 | ppl  1048.870
| epoch   2 step      700 |    264 batches | lr 8.17e-05 | ms/batch 332.25 | loss  6.88 | ppl   977.120
| epoch   2 step      750 |    314 batches | lr 8.75e-05 | ms/batch 332.28 | loss  6.88 | ppl   969.970
| epoch   2 step      800 |    364 batches | lr 9.33e-05 | ms/batch 331.14 | loss  6.80 | ppl   902.066
----------------------------------------------------------------------------------------------------
| Eval   2 at step      800 | time: 138.48s | valid loss  6.31 | valid ppl   550.188
----------------------------------------------------------------------------------------------------
| epoch   2 step      850 |    414 batches | lr 9.92e-05 | ms/batch 501.86 | loss  6.78 | ppl   880.069
| epoch   3 step      900 |     28 batches | lr 0.000105 | ms/batch 321.40 | loss  6.71 | ppl   823.911
| epoch   3 step      950 |     78 batches | lr 0.000111 | ms/batch 329.01 | loss  6.68 | ppl   800.123
| epoch   3 step     1000 |    128 batches | lr 0.000117 | ms/batch 329.72 | loss  6.60 | ppl   737.396
| epoch   3 step     1050 |    178 batches | lr 0.000122 | ms/batch 330.10 | loss  6.59 | ppl   730.173
| epoch   3 step     1100 |    228 batches | lr 0.000128 | ms/batch 330.39 | loss  6.59 | ppl   730.173
| epoch   3 step     1150 |    278 batches | lr 0.000134 | ms/batch 329.02 | loss  6.53 | ppl   687.866
| epoch   3 step     1200 |    328 batches | lr 0.00014 | ms/batch 330.28 | loss  6.51 | ppl   674.350
----------------------------------------------------------------------------------------------------
| Eval   3 at step     1200 | time: 136.68s | valid loss  5.96 | valid ppl   389.210
----------------------------------------------------------------------------------------------------
| epoch   3 step     1250 |    378 batches | lr 0.000146 | ms/batch 532.57 | loss  6.46 | ppl   637.465
| epoch   3 step     1300 |    428 batches | lr 0.000152 | ms/batch 325.62 | loss  6.44 | ppl   627.043
| epoch   4 step     1350 |     42 batches | lr 0.000157 | ms/batch 321.78 | loss  6.37 | ppl   585.474
| epoch   4 step     1400 |     92 batches | lr 0.000163 | ms/batch 330.18 | loss  6.38 | ppl   589.283
| epoch   4 step     1450 |    142 batches | lr 0.000169 | ms/batch 330.33 | loss  6.30 | ppl   545.764
| epoch   4 step     1500 |    192 batches | lr 0.000175 | ms/batch 330.21 | loss  6.32 | ppl   555.356
| epoch   4 step     1550 |    242 batches | lr 0.000181 | ms/batch 331.36 | loss  6.30 | ppl   542.406
| epoch   4 step     1600 |    292 batches | lr 0.000187 | ms/batch 328.64 | loss  6.26 | ppl   523.914
----------------------------------------------------------------------------------------------------
| Eval   4 at step     1600 | time: 136.54s | valid loss  5.72 | valid ppl   306.239
----------------------------------------------------------------------------------------------------
| epoch   4 step     1650 |    342 batches | lr 0.000193 | ms/batch 462.18 | loss  6.19 | ppl   489.488
| epoch   4 step     1700 |    392 batches | lr 0.000198 | ms/batch 326.64 | loss  6.19 | ppl   487.541
| epoch   5 step     1750 |      6 batches | lr 0.000204 | ms/batch 319.16 | loss  6.20 | ppl   494.987
| epoch   5 step     1800 |     56 batches | lr 0.00021 | ms/batch 332.21 | loss  6.14 | ppl   461.811
| epoch   5 step     1850 |    106 batches | lr 0.000216 | ms/batch 332.38 | loss  6.13 | ppl   457.180
| epoch   5 step     1900 |    156 batches | lr 0.000222 | ms/batch 330.52 | loss  6.06 | ppl   427.974
| epoch   5 step     1950 |    206 batches | lr 0.000228 | ms/batch 332.06 | loss  6.10 | ppl   445.753
| epoch   5 step     2000 |    256 batches | lr 0.000233 | ms/batch 330.68 | loss  6.04 | ppl   421.076
----------------------------------------------------------------------------------------------------
| Eval   5 at step     2000 | time: 136.87s | valid loss  5.53 | valid ppl   251.152
----------------------------------------------------------------------------------------------------
| epoch   5 step     2050 |    306 batches | lr 0.000239 | ms/batch 474.50 | loss  6.04 | ppl   419.172
| epoch   5 step     2100 |    356 batches | lr 0.000245 | ms/batch 326.17 | loss  5.94 | ppl   380.678
| epoch   5 step     2150 |    406 batches | lr 0.000251 | ms/batch 326.13 | loss  5.97 | ppl   391.567
| epoch   6 step     2200 |     20 batches | lr 0.000257 | ms/batch 323.82 | loss  5.99 | ppl   397.578
| epoch   6 step     2250 |     70 batches | lr 0.000262 | ms/batch 330.03 | loss  5.91 | ppl   368.332
| epoch   6 step     2300 |    120 batches | lr 0.000268 | ms/batch 331.69 | loss  5.90 | ppl   364.496
| epoch   6 step     2350 |    170 batches | lr 0.000274 | ms/batch 330.33 | loss  5.87 | ppl   355.719
| epoch   6 step     2400 |    220 batches | lr 0.00028 | ms/batch 334.10 | loss  5.91 | ppl   366.896
----------------------------------------------------------------------------------------------------
| Eval   6 at step     2400 | time: 137.19s | valid loss  5.40 | valid ppl   222.282
----------------------------------------------------------------------------------------------------
| epoch   6 step     2450 |    270 batches | lr 0.000286 | ms/batch 485.91 | loss  5.86 | ppl   350.423
| epoch   6 step     2500 |    320 batches | lr 0.000292 | ms/batch 329.10 | loss  5.83 | ppl   342.038
| epoch   6 step     2550 |    370 batches | lr 0.000297 | ms/batch 325.13 | loss  5.79 | ppl   326.681
| epoch   6 step     2600 |    420 batches | lr 0.000303 | ms/batch 339.98 | loss  5.82 | ppl   338.159
| epoch   7 step     2650 |     34 batches | lr 0.000309 | ms/batch 323.39 | loss  5.82 | ppl   338.317
| epoch   7 step     2700 |     84 batches | lr 0.000315 | ms/batch 330.24 | loss  5.74 | ppl   311.892
| epoch   7 step     2750 |    134 batches | lr 0.000321 | ms/batch 328.39 | loss  5.74 | ppl   310.482
| epoch   7 step     2800 |    184 batches | lr 0.000327 | ms/batch 330.49 | loss  5.75 | ppl   314.756
----------------------------------------------------------------------------------------------------
| Eval   7 at step     2800 | time: 139.40s | valid loss  5.26 | valid ppl   191.889
----------------------------------------------------------------------------------------------------
| epoch   7 step     2850 |    234 batches | lr 0.000333 | ms/batch 480.11 | loss  5.73 | ppl   308.933
| epoch   7 step     2900 |    284 batches | lr 0.000338 | ms/batch 323.88 | loss  5.73 | ppl   306.793
| epoch   7 step     2950 |    334 batches | lr 0.000344 | ms/batch 324.79 | loss  5.64 | ppl   282.520
| epoch   7 step     3000 |    384 batches | lr 0.00035 | ms/batch 324.78 | loss  5.65 | ppl   285.025
| epoch   7 step     3050 |    434 batches | lr 0.000349 | ms/batch 328.86 | loss  5.69 | ppl   297.330
| epoch   8 step     3100 |     48 batches | lr 0.000349 | ms/batch 328.91 | loss  5.61 | ppl   274.406
| epoch   8 step     3150 |     98 batches | lr 0.000349 | ms/batch 337.24 | loss  5.63 | ppl   278.227
| epoch   8 step     3200 |    148 batches | lr 0.000349 | ms/batch 341.38 | loss  5.58 | ppl   266.109
----------------------------------------------------------------------------------------------------
| Eval   8 at step     3200 | time: 137.73s | valid loss  5.20 | valid ppl   181.409
----------------------------------------------------------------------------------------------------
| epoch   8 step     3250 |    198 batches | lr 0.000349 | ms/batch 482.39 | loss  5.63 | ppl   279.491
| epoch   8 step     3300 |    248 batches | lr 0.000349 | ms/batch 326.95 | loss  5.60 | ppl   271.294
| epoch   8 step     3350 |    298 batches | lr 0.000349 | ms/batch 325.05 | loss  5.59 | ppl   268.448
| epoch   8 step     3400 |    348 batches | lr 0.000349 | ms/batch 324.70 | loss  5.48 | ppl   238.986
| epoch   8 step     3450 |    398 batches | lr 0.000349 | ms/batch 326.31 | loss  5.55 | ppl   257.338
| epoch   9 step     3500 |     12 batches | lr 0.000349 | ms/batch 320.99 | loss  5.58 | ppl   264.616
| epoch   9 step     3550 |     62 batches | lr 0.000349 | ms/batch 331.27 | loss  5.49 | ppl   242.295
| epoch   9 step     3600 |    112 batches | lr 0.000349 | ms/batch 331.16 | loss  5.50 | ppl   245.765
----------------------------------------------------------------------------------------------------
| Eval   9 at step     3600 | time: 135.77s | valid loss  5.07 | valid ppl   159.891
----------------------------------------------------------------------------------------------------
| epoch   9 step     3650 |    162 batches | lr 0.000349 | ms/batch 464.01 | loss  5.49 | ppl   241.200
| epoch   9 step     3700 |    212 batches | lr 0.000349 | ms/batch 325.35 | loss  5.52 | ppl   248.681
| epoch   9 step     3750 |    262 batches | lr 0.000349 | ms/batch 325.40 | loss  5.46 | ppl   234.419
| epoch   9 step     3800 |    312 batches | lr 0.000349 | ms/batch 325.54 | loss  5.48 | ppl   239.454
| epoch   9 step     3850 |    362 batches | lr 0.000349 | ms/batch 324.57 | loss  5.43 | ppl   228.363
| epoch   9 step     3900 |    412 batches | lr 0.000349 | ms/batch 329.45 | loss  5.44 | ppl   229.526
| epoch  10 step     3950 |     26 batches | lr 0.000349 | ms/batch 320.30 | loss  5.47 | ppl   236.571
| epoch  10 step     4000 |     76 batches | lr 0.000349 | ms/batch 329.24 | loss  5.40 | ppl   220.888
----------------------------------------------------------------------------------------------------
| Eval  10 at step     4000 | time: 135.78s | valid loss  5.02 | valid ppl   151.887
----------------------------------------------------------------------------------------------------
| epoch  10 step     4050 |    126 batches | lr 0.000349 | ms/batch 470.62 | loss  5.39 | ppl   218.178
| epoch  10 step     4100 |    176 batches | lr 0.000349 | ms/batch 325.42 | loss  5.41 | ppl   223.911
| epoch  10 step     4150 |    226 batches | lr 0.000349 | ms/batch 326.05 | loss  5.41 | ppl   223.824
| epoch  10 step     4200 |    276 batches | lr 0.000349 | ms/batch 325.31 | loss  5.39 | ppl   219.426
| epoch  10 step     4250 |    326 batches | lr 0.000349 | ms/batch 326.15 | loss  5.34 | ppl   207.862
| epoch  10 step     4300 |    376 batches | lr 0.000349 | ms/batch 325.81 | loss  5.35 | ppl   211.070
| epoch  10 step     4350 |    426 batches | lr 0.000349 | ms/batch 324.79 | loss  5.36 | ppl   212.492
| epoch  11 step     4400 |     40 batches | lr 0.000349 | ms/batch 322.59 | loss  5.35 | ppl   210.132
----------------------------------------------------------------------------------------------------
| Eval  11 at step     4400 | time: 135.40s | valid loss  4.95 | valid ppl   141.271
----------------------------------------------------------------------------------------------------
| epoch  11 step     4450 |     90 batches | lr 0.000349 | ms/batch 463.86 | loss  5.33 | ppl   205.987
| epoch  11 step     4500 |    140 batches | lr 0.000349 | ms/batch 325.11 | loss  5.31 | ppl   203.047
| epoch  11 step     4550 |    190 batches | lr 0.000349 | ms/batch 325.94 | loss  5.35 | ppl   209.836
| epoch  11 step     4600 |    240 batches | lr 0.000349 | ms/batch 326.58 | loss  5.33 | ppl   205.746
| epoch  11 step     4650 |    290 batches | lr 0.000349 | ms/batch 325.86 | loss  5.36 | ppl   213.141
| epoch  11 step     4700 |    340 batches | lr 0.000349 | ms/batch 329.04 | loss  5.22 | ppl   184.256
| epoch  11 step     4750 |    390 batches | lr 0.000349 | ms/batch 342.61 | loss  5.30 | ppl   200.776
| epoch  12 step     4800 |      4 batches | lr 0.000349 | ms/batch 319.39 | loss  5.31 | ppl   201.766
----------------------------------------------------------------------------------------------------
| Eval  12 at step     4800 | time: 136.14s | valid loss  4.93 | valid ppl   138.104
----------------------------------------------------------------------------------------------------
| epoch  12 step     4850 |     54 batches | lr 0.000349 | ms/batch 470.90 | loss  5.25 | ppl   190.820
| epoch  12 step     4900 |    104 batches | lr 0.000349 | ms/batch 326.81 | loss  5.25 | ppl   191.462
| epoch  12 step     4950 |    154 batches | lr 0.000349 | ms/batch 328.74 | loss  5.24 | ppl   188.008
| epoch  12 step     5000 |    204 batches | lr 0.000349 | ms/batch 325.86 | loss  5.28 | ppl   196.662
| epoch  12 step     5050 |    254 batches | lr 0.000349 | ms/batch 324.67 | loss  5.27 | ppl   194.294
| epoch  12 step     5100 |    304 batches | lr 0.000349 | ms/batch 324.35 | loss  5.28 | ppl   195.497
| epoch  12 step     5150 |    354 batches | lr 0.000349 | ms/batch 327.21 | loss  5.17 | ppl   175.366
| epoch  12 step     5200 |    404 batches | lr 0.000349 | ms/batch 324.76 | loss  5.23 | ppl   186.589
----------------------------------------------------------------------------------------------------
| Eval  13 at step     5200 | time: 135.94s | valid loss  4.87 | valid ppl   130.483
----------------------------------------------------------------------------------------------------
| epoch  13 step     5250 |     18 batches | lr 0.000348 | ms/batch 474.38 | loss  5.26 | ppl   192.271
| epoch  13 step     5300 |     68 batches | lr 0.000348 | ms/batch 324.72 | loss  5.18 | ppl   177.808
| epoch  13 step     5350 |    118 batches | lr 0.000348 | ms/batch 325.65 | loss  5.21 | ppl   183.180
| epoch  13 step     5400 |    168 batches | lr 0.000348 | ms/batch 325.11 | loss  5.18 | ppl   177.752
| epoch  13 step     5450 |    218 batches | lr 0.000348 | ms/batch 326.55 | loss  5.22 | ppl   185.441
| epoch  13 step     5500 |    268 batches | lr 0.000348 | ms/batch 324.87 | loss  5.18 | ppl   177.405
| epoch  13 step     5550 |    318 batches | lr 0.000348 | ms/batch 326.71 | loss  5.18 | ppl   177.794
| epoch  13 step     5600 |    368 batches | lr 0.000348 | ms/batch 326.21 | loss  5.16 | ppl   173.404
----------------------------------------------------------------------------------------------------
| Eval  14 at step     5600 | time: 135.99s | valid loss  4.84 | valid ppl   126.438
----------------------------------------------------------------------------------------------------
| epoch  13 step     5650 |    418 batches | lr 0.000348 | ms/batch 465.84 | loss  5.17 | ppl   176.259
| epoch  14 step     5700 |     32 batches | lr 0.000348 | ms/batch 320.06 | loss  5.20 | ppl   181.527
| epoch  14 step     5750 |     82 batches | lr 0.000348 | ms/batch 325.81 | loss  5.13 | ppl   168.925
| epoch  14 step     5800 |    132 batches | lr 0.000348 | ms/batch 328.21 | loss  5.13 | ppl   169.652
| epoch  14 step     5850 |    182 batches | lr 0.000348 | ms/batch 325.77 | loss  5.15 | ppl   171.974
| epoch  14 step     5900 |    232 batches | lr 0.000348 | ms/batch 324.93 | loss  5.15 | ppl   172.863
| epoch  14 step     5950 |    282 batches | lr 0.000348 | ms/batch 331.87 | loss  5.15 | ppl   172.351
| epoch  14 step     6000 |    332 batches | lr 0.000348 | ms/batch 330.58 | loss  5.06 | ppl   157.148
----------------------------------------------------------------------------------------------------
| Eval  15 at step     6000 | time: 136.04s | valid loss  4.79 | valid ppl   120.587
----------------------------------------------------------------------------------------------------
| epoch  14 step     6050 |    382 batches | lr 0.000348 | ms/batch 468.20 | loss  5.11 | ppl   165.774
| epoch  14 step     6100 |    432 batches | lr 0.000348 | ms/batch 330.62 | loss  5.15 | ppl   172.270
| epoch  15 step     6150 |     46 batches | lr 0.000348 | ms/batch 321.80 | loss  5.12 | ppl   167.244
| epoch  15 step     6200 |     96 batches | lr 0.000348 | ms/batch 325.65 | loss  5.08 | ppl   160.837
| epoch  15 step     6250 |    146 batches | lr 0.000348 | ms/batch 325.62 | loss  5.09 | ppl   162.441
| epoch  15 step     6300 |    196 batches | lr 0.000348 | ms/batch 326.54 | loss  5.13 | ppl   168.661
| epoch  15 step     6350 |    246 batches | lr 0.000348 | ms/batch 325.76 | loss  5.11 | ppl   166.137
| epoch  15 step     6400 |    296 batches | lr 0.000348 | ms/batch 323.84 | loss  5.14 | ppl   170.542
----------------------------------------------------------------------------------------------------
| Eval  16 at step     6400 | time: 135.76s | valid loss  4.76 | valid ppl   117.227
----------------------------------------------------------------------------------------------------
| epoch  15 step     6450 |    346 batches | lr 0.000348 | ms/batch 468.01 | loss  5.01 | ppl   149.239
| epoch  15 step     6500 |    396 batches | lr 0.000348 | ms/batch 329.51 | loss  5.08 | ppl   160.072
| epoch  16 step     6550 |     10 batches | lr 0.000348 | ms/batch 323.69 | loss  5.12 | ppl   166.722
| epoch  16 step     6600 |     60 batches | lr 0.000348 | ms/batch 325.52 | loss  5.05 | ppl   156.437
| epoch  16 step     6650 |    110 batches | lr 0.000348 | ms/batch 326.35 | loss  5.04 | ppl   154.349
| epoch  16 step     6700 |    160 batches | lr 0.000348 | ms/batch 326.27 | loss  5.05 | ppl   155.962
| epoch  16 step     6750 |    210 batches | lr 0.000347 | ms/batch 326.23 | loss  5.08 | ppl   161.151
| epoch  16 step     6800 |    260 batches | lr 0.000347 | ms/batch 324.89 | loss  5.07 | ppl   159.560
----------------------------------------------------------------------------------------------------
| Eval  17 at step     6800 | time: 135.86s | valid loss  4.74 | valid ppl   114.804
----------------------------------------------------------------------------------------------------
| epoch  16 step     6850 |    310 batches | lr 0.000347 | ms/batch 500.81 | loss  5.08 | ppl   160.573
| epoch  16 step     6900 |    360 batches | lr 0.000347 | ms/batch 331.87 | loss  4.99 | ppl   147.097
| epoch  16 step     6950 |    410 batches | lr 0.000347 | ms/batch 333.38 | loss  5.03 | ppl   152.361
| epoch  17 step     7000 |     24 batches | lr 0.000347 | ms/batch 321.54 | loss  5.06 | ppl   157.948
| epoch  17 step     7050 |     74 batches | lr 0.000347 | ms/batch 324.03 | loss  4.97 | ppl   144.263
| epoch  17 step     7100 |    124 batches | lr 0.000347 | ms/batch 324.72 | loss  4.99 | ppl   146.444
| epoch  17 step     7150 |    174 batches | lr 0.000347 | ms/batch 325.55 | loss  5.03 | ppl   153.412
| epoch  17 step     7200 |    224 batches | lr 0.000347 | ms/batch 325.57 | loss  5.03 | ppl   153.484
----------------------------------------------------------------------------------------------------
| Eval  18 at step     7200 | time: 136.16s | valid loss  4.70 | valid ppl   110.254
----------------------------------------------------------------------------------------------------
| epoch  17 step     7250 |    274 batches | lr 0.000347 | ms/batch 467.05 | loss  5.03 | ppl   153.160
| epoch  17 step     7300 |    324 batches | lr 0.000347 | ms/batch 329.78 | loss  4.97 | ppl   144.016
| epoch  17 step     7350 |    374 batches | lr 0.000347 | ms/batch 329.81 | loss  4.98 | ppl   145.474
| epoch  17 step     7400 |    424 batches | lr 0.000347 | ms/batch 329.57 | loss  5.00 | ppl   147.950
| epoch  18 step     7450 |     38 batches | lr 0.000347 | ms/batch 319.38 | loss  5.00 | ppl   147.696
| epoch  18 step     7500 |     88 batches | lr 0.000347 | ms/batch 326.42 | loss  4.95 | ppl   141.385
| epoch  18 step     7550 |    138 batches | lr 0.000347 | ms/batch 326.85 | loss  4.98 | ppl   145.634
| epoch  18 step     7600 |    188 batches | lr 0.000347 | ms/batch 326.00 | loss  4.97 | ppl   144.252
----------------------------------------------------------------------------------------------------
| Eval  19 at step     7600 | time: 137.05s | valid loss  4.70 | valid ppl   110.331
----------------------------------------------------------------------------------------------------
| epoch  18 step     7650 |    238 batches | lr 0.000347 | ms/batch 433.76 | loss  5.00 | ppl   148.773
| epoch  18 step     7700 |    288 batches | lr 0.000347 | ms/batch 327.94 | loss  5.03 | ppl   152.945
| epoch  18 step     7750 |    338 batches | lr 0.000347 | ms/batch 328.94 | loss  4.91 | ppl   135.047
| epoch  18 step     7800 |    388 batches | lr 0.000347 | ms/batch 327.75 | loss  4.96 | ppl   142.895
| epoch  19 step     7850 |      2 batches | lr 0.000347 | ms/batch 321.85 | loss  5.00 | ppl   147.719
| epoch  19 step     7900 |     52 batches | lr 0.000347 | ms/batch 325.28 | loss  4.93 | ppl   138.325
| epoch  19 step     7950 |    102 batches | lr 0.000347 | ms/batch 325.67 | loss  4.94 | ppl   139.399
| epoch  19 step     8000 |    152 batches | lr 0.000346 | ms/batch 323.70 | loss  4.94 | ppl   139.879
----------------------------------------------------------------------------------------------------
| Eval  20 at step     8000 | time: 135.70s | valid loss  4.67 | valid ppl   106.398
----------------------------------------------------------------------------------------------------
| epoch  19 step     8050 |    202 batches | lr 0.000346 | ms/batch 463.56 | loss  4.97 | ppl   144.185
| epoch  19 step     8100 |    252 batches | lr 0.000346 | ms/batch 329.99 | loss  4.97 | ppl   144.049
| epoch  19 step     8150 |    302 batches | lr 0.000346 | ms/batch 329.72 | loss  4.98 | ppl   144.930
| epoch  19 step     8200 |    352 batches | lr 0.000346 | ms/batch 339.85 | loss  4.86 | ppl   129.075
| epoch  19 step     8250 |    402 batches | lr 0.000346 | ms/batch 343.42 | loss  4.95 | ppl   141.263
| epoch  20 step     8300 |     16 batches | lr 0.000346 | ms/batch 337.01 | loss  4.97 | ppl   144.557
| epoch  20 step     8350 |     66 batches | lr 0.000346 | ms/batch 325.25 | loss  4.89 | ppl   133.016
| epoch  20 step     8400 |    116 batches | lr 0.000346 | ms/batch 325.83 | loss  4.91 | ppl   135.894
----------------------------------------------------------------------------------------------------
| Eval  21 at step     8400 | time: 138.20s | valid loss  4.65 | valid ppl   104.885
----------------------------------------------------------------------------------------------------
| epoch  20 step     8450 |    166 batches | lr 0.000346 | ms/batch 465.71 | loss  4.91 | ppl   136.032
| epoch  20 step     8500 |    216 batches | lr 0.000346 | ms/batch 330.22 | loss  4.95 | ppl   140.822
| epoch  20 step     8550 |    266 batches | lr 0.000346 | ms/batch 328.50 | loss  4.91 | ppl   135.735
| epoch  20 step     8600 |    316 batches | lr 0.000346 | ms/batch 335.49 | loss  4.90 | ppl   134.868
| epoch  20 step     8650 |    366 batches | lr 0.000346 | ms/batch 329.72 | loss  4.88 | ppl   131.682
| epoch  20 step     8700 |    416 batches | lr 0.000346 | ms/batch 328.98 | loss  4.90 | ppl   134.290
| epoch  21 step     8750 |     30 batches | lr 0.000346 | ms/batch 319.98 | loss  4.93 | ppl   137.905
| epoch  21 step     8800 |     80 batches | lr 0.000346 | ms/batch 323.92 | loss  4.85 | ppl   128.020
----------------------------------------------------------------------------------------------------
| Eval  22 at step     8800 | time: 136.57s | valid loss  4.64 | valid ppl   103.724
----------------------------------------------------------------------------------------------------
| epoch  21 step     8850 |    130 batches | lr 0.000346 | ms/batch 482.96 | loss  4.87 | ppl   130.036
| epoch  21 step     8900 |    180 batches | lr 0.000346 | ms/batch 330.12 | loss  4.86 | ppl   129.600
| epoch  21 step     8950 |    230 batches | lr 0.000346 | ms/batch 329.19 | loss  4.89 | ppl   133.505
| epoch  21 step     9000 |    280 batches | lr 0.000346 | ms/batch 330.04 | loss  4.92 | ppl   137.496
| epoch  21 step     9050 |    330 batches | lr 0.000345 | ms/batch 330.56 | loss  4.84 | ppl   126.756
| epoch  21 step     9100 |    380 batches | lr 0.000345 | ms/batch 330.05 | loss  4.86 | ppl   129.539
| epoch  21 step     9150 |    430 batches | lr 0.000345 | ms/batch 329.06 | loss  4.90 | ppl   134.227
| epoch  22 step     9200 |     44 batches | lr 0.000345 | ms/batch 321.05 | loss  4.86 | ppl   128.783
----------------------------------------------------------------------------------------------------
| Eval  23 at step     9200 | time: 137.36s | valid loss  4.62 | valid ppl   101.286
----------------------------------------------------------------------------------------------------
| epoch  22 step     9250 |     94 batches | lr 0.000345 | ms/batch 464.62 | loss  4.82 | ppl   124.421
| epoch  22 step     9300 |    144 batches | lr 0.000345 | ms/batch 339.15 | loss  4.87 | ppl   130.392
| epoch  22 step     9350 |    194 batches | lr 0.000345 | ms/batch 332.14 | loss  4.86 | ppl   129.095
| epoch  22 step     9400 |    244 batches | lr 0.000345 | ms/batch 329.95 | loss  4.88 | ppl   132.032
| epoch  22 step     9450 |    294 batches | lr 0.000345 | ms/batch 330.54 | loss  4.93 | ppl   137.765
| epoch  22 step     9500 |    344 batches | lr 0.000345 | ms/batch 329.16 | loss  4.77 | ppl   118.039
| epoch  22 step     9550 |    394 batches | lr 0.000345 | ms/batch 329.89 | loss  4.86 | ppl   129.246
| epoch  23 step     9600 |      8 batches | lr 0.000345 | ms/batch 323.24 | loss  4.88 | ppl   132.063
----------------------------------------------------------------------------------------------------
| Eval  24 at step     9600 | time: 137.35s | valid loss  4.60 | valid ppl    99.833
----------------------------------------------------------------------------------------------------
| epoch  23 step     9650 |     58 batches | lr 0.000345 | ms/batch 465.74 | loss  4.82 | ppl   123.820
| epoch  23 step     9700 |    108 batches | lr 0.000345 | ms/batch 329.35 | loss  4.80 | ppl   121.302
| epoch  23 step     9750 |    158 batches | lr 0.000345 | ms/batch 329.82 | loss  4.83 | ppl   124.898
| epoch  23 step     9800 |    208 batches | lr 0.000345 | ms/batch 330.27 | loss  4.86 | ppl   128.762
| epoch  23 step     9850 |    258 batches | lr 0.000345 | ms/batch 329.48 | loss  4.86 | ppl   128.964
| epoch  23 step     9900 |    308 batches | lr 0.000345 | ms/batch 330.16 | loss  4.84 | ppl   126.806
| epoch  23 step     9950 |    358 batches | lr 0.000345 | ms/batch 330.60 | loss  4.77 | ppl   117.947
| epoch  23 step    10000 |    408 batches | lr 0.000345 | ms/batch 329.74 | loss  4.80 | ppl   121.986
----------------------------------------------------------------------------------------------------
| Eval  25 at step    10000 | time: 137.81s | valid loss  4.60 | valid ppl    99.699
----------------------------------------------------------------------------------------------------
| epoch  24 step    10050 |     22 batches | lr 0.000344 | ms/batch 463.77 | loss  4.85 | ppl   127.511
| epoch  24 step    10100 |     72 batches | lr 0.000344 | ms/batch 329.50 | loss  4.78 | ppl   118.575
| epoch  24 step    10150 |    122 batches | lr 0.000344 | ms/batch 330.38 | loss  4.80 | ppl   121.491
| epoch  24 step    10200 |    172 batches | lr 0.000344 | ms/batch 331.15 | loss  4.83 | ppl   124.820
| epoch  24 step    10250 |    222 batches | lr 0.000344 | ms/batch 330.87 | loss  4.81 | ppl   122.186
| epoch  24 step    10300 |    272 batches | lr 0.000344 | ms/batch 329.94 | loss  4.83 | ppl   125.485
| epoch  24 step    10350 |    322 batches | lr 0.000344 | ms/batch 329.67 | loss  4.78 | ppl   119.393
| epoch  24 step    10400 |    372 batches | lr 0.000344 | ms/batch 329.77 | loss  4.76 | ppl   117.276
----------------------------------------------------------------------------------------------------
| Eval  26 at step    10400 | time: 137.06s | valid loss  4.58 | valid ppl    97.153
----------------------------------------------------------------------------------------------------
| epoch  24 step    10450 |    422 batches | lr 0.000344 | ms/batch 482.73 | loss  4.80 | ppl   121.624
| epoch  25 step    10500 |     36 batches | lr 0.000344 | ms/batch 324.13 | loss  4.82 | ppl   123.424
| epoch  25 step    10550 |     86 batches | lr 0.000344 | ms/batch 330.51 | loss  4.77 | ppl   117.469
| epoch  25 step    10600 |    136 batches | lr 0.000344 | ms/batch 333.47 | loss  4.78 | ppl   119.683
| epoch  25 step    10650 |    186 batches | lr 0.000344 | ms/batch 331.29 | loss  4.80 | ppl   121.948
| epoch  25 step    10700 |    236 batches | lr 0.000344 | ms/batch 329.28 | loss  4.80 | ppl   121.368
| epoch  25 step    10750 |    286 batches | lr 0.000344 | ms/batch 331.35 | loss  4.83 | ppl   125.358
| epoch  25 step    10800 |    336 batches | lr 0.000344 | ms/batch 329.71 | loss  4.73 | ppl   112.775
----------------------------------------------------------------------------------------------------
| Eval  27 at step    10800 | time: 137.11s | valid loss  4.57 | valid ppl    96.202
----------------------------------------------------------------------------------------------------
| epoch  25 step    10850 |    386 batches | lr 0.000344 | ms/batch 462.67 | loss  4.79 | ppl   120.716
| epoch  25 step    10900 |    436 batches | lr 0.000343 | ms/batch 322.36 | loss  4.81 | ppl   122.454
| epoch  26 step    10950 |     50 batches | lr 0.000343 | ms/batch 328.60 | loss  4.77 | ppl   117.478
| epoch  26 step    11000 |    100 batches | lr 0.000343 | ms/batch 331.12 | loss  4.76 | ppl   117.130
| epoch  26 step    11050 |    150 batches | lr 0.000343 | ms/batch 331.83 | loss  4.74 | ppl   114.461
| epoch  26 step    11100 |    200 batches | lr 0.000343 | ms/batch 340.94 | loss  4.81 | ppl   122.502
| epoch  26 step    11150 |    250 batches | lr 0.000343 | ms/batch 337.15 | loss  4.81 | ppl   123.048
| epoch  26 step    11200 |    300 batches | lr 0.000343 | ms/batch 330.28 | loss  4.82 | ppl   123.501
----------------------------------------------------------------------------------------------------
| Eval  28 at step    11200 | time: 137.59s | valid loss  4.55 | valid ppl    94.537
----------------------------------------------------------------------------------------------------
| epoch  26 step    11250 |    350 batches | lr 0.000343 | ms/batch 515.21 | loss  4.68 | ppl   108.150
| epoch  26 step    11300 |    400 batches | lr 0.000343 | ms/batch 327.64 | loss  4.76 | ppl   116.965
| epoch  27 step    11350 |     14 batches | lr 0.000343 | ms/batch 322.79 | loss  4.79 | ppl   119.935
| epoch  27 step    11400 |     64 batches | lr 0.000343 | ms/batch 331.04 | loss  4.73 | ppl   113.766
| epoch  27 step    11450 |    114 batches | lr 0.000343 | ms/batch 331.16 | loss  4.76 | ppl   116.254
| epoch  27 step    11500 |    164 batches | lr 0.000343 | ms/batch 331.44 | loss  4.75 | ppl   115.503
| epoch  27 step    11550 |    214 batches | lr 0.000343 | ms/batch 332.16 | loss  4.77 | ppl   117.882
| epoch  27 step    11600 |    264 batches | lr 0.000343 | ms/batch 331.89 | loss  4.76 | ppl   116.309
----------------------------------------------------------------------------------------------------
| Eval  29 at step    11600 | time: 136.99s | valid loss  4.55 | valid ppl    95.061
----------------------------------------------------------------------------------------------------
| epoch  27 step    11650 |    314 batches | lr 0.000343 | ms/batch 431.88 | loss  4.76 | ppl   117.194
| epoch  27 step    11700 |    364 batches | lr 0.000342 | ms/batch 328.82 | loss  4.71 | ppl   110.585
| epoch  27 step    11750 |    414 batches | lr 0.000342 | ms/batch 328.35 | loss  4.72 | ppl   112.063
| epoch  28 step    11800 |     28 batches | lr 0.000342 | ms/batch 324.02 | loss  4.77 | ppl   117.975
| epoch  28 step    11850 |     78 batches | lr 0.000342 | ms/batch 329.91 | loss  4.72 | ppl   112.388
| epoch  28 step    11900 |    128 batches | lr 0.000342 | ms/batch 331.54 | loss  4.73 | ppl   113.766
| epoch  28 step    11950 |    178 batches | lr 0.000342 | ms/batch 332.65 | loss  4.74 | ppl   114.006
| epoch  28 step    12000 |    228 batches | lr 0.000342 | ms/batch 328.92 | loss  4.75 | ppl   115.566
----------------------------------------------------------------------------------------------------
| Eval  30 at step    12000 | time: 136.76s | valid loss  4.54 | valid ppl    93.392
----------------------------------------------------------------------------------------------------
| epoch  28 step    12050 |    278 batches | lr 0.000342 | ms/batch 483.02 | loss  4.78 | ppl   119.337
| epoch  28 step    12100 |    328 batches | lr 0.000342 | ms/batch 326.50 | loss  4.70 | ppl   109.501
| epoch  28 step    12150 |    378 batches | lr 0.000342 | ms/batch 325.83 | loss  4.72 | ppl   111.626
| epoch  28 step    12200 |    428 batches | lr 0.000342 | ms/batch 324.95 | loss  4.75 | ppl   115.188
| epoch  29 step    12250 |     42 batches | lr 0.000342 | ms/batch 321.81 | loss  4.73 | ppl   113.624
| epoch  29 step    12300 |     92 batches | lr 0.000342 | ms/batch 329.93 | loss  4.69 | ppl   108.463
| epoch  29 step    12350 |    142 batches | lr 0.000342 | ms/batch 329.82 | loss  4.70 | ppl   110.291
| epoch  29 step    12400 |    192 batches | lr 0.000342 | ms/batch 329.18 | loss  4.74 | ppl   114.336
----------------------------------------------------------------------------------------------------
| Eval  31 at step    12400 | time: 136.02s | valid loss  4.53 | valid ppl    92.818
----------------------------------------------------------------------------------------------------
| epoch  29 step    12450 |    242 batches | lr 0.000342 | ms/batch 463.54 | loss  4.74 | ppl   114.024
| epoch  29 step    12500 |    292 batches | lr 0.000341 | ms/batch 326.28 | loss  4.78 | ppl   119.030
| epoch  29 step    12550 |    342 batches | lr 0.000341 | ms/batch 325.94 | loss  4.64 | ppl   103.415
| epoch  29 step    12600 |    392 batches | lr 0.000341 | ms/batch 325.12 | loss  4.72 | ppl   112.054
| epoch  30 step    12650 |      6 batches | lr 0.000341 | ms/batch 319.85 | loss  4.74 | ppl   114.425
| epoch  30 step    12700 |     56 batches | lr 0.000341 | ms/batch 335.81 | loss  4.68 | ppl   108.006
| epoch  30 step    12750 |    106 batches | lr 0.000341 | ms/batch 343.39 | loss  4.68 | ppl   107.644
| epoch  30 step    12800 |    156 batches | lr 0.000341 | ms/batch 334.87 | loss  4.72 | ppl   112.212
----------------------------------------------------------------------------------------------------
| Eval  32 at step    12800 | time: 137.04s | valid loss  4.51 | valid ppl    91.088
----------------------------------------------------------------------------------------------------
| epoch  30 step    12850 |    206 batches | lr 0.000341 | ms/batch 477.20 | loss  4.72 | ppl   111.897
| epoch  30 step    12900 |    256 batches | lr 0.000341 | ms/batch 324.99 | loss  4.70 | ppl   110.421
| epoch  30 step    12950 |    306 batches | lr 0.000341 | ms/batch 325.81 | loss  4.75 | ppl   115.711
| epoch  30 step    13000 |    356 batches | lr 0.000341 | ms/batch 325.12 | loss  4.63 | ppl   102.362
| epoch  30 step    13050 |    406 batches | lr 0.000341 | ms/batch 324.50 | loss  4.69 | ppl   108.726
| epoch  31 step    13100 |     20 batches | lr 0.000341 | ms/batch 321.91 | loss  4.73 | ppl   112.942
| epoch  31 step    13150 |     70 batches | lr 0.000341 | ms/batch 331.74 | loss  4.66 | ppl   105.942
| epoch  31 step    13200 |    120 batches | lr 0.00034 | ms/batch 334.56 | loss  4.68 | ppl   108.099
----------------------------------------------------------------------------------------------------
| Eval  33 at step    13200 | time: 136.17s | valid loss  4.51 | valid ppl    91.124
----------------------------------------------------------------------------------------------------
| epoch  31 step    13250 |    170 batches | lr 0.00034 | ms/batch 437.06 | loss  4.69 | ppl   108.700
| epoch  31 step    13300 |    220 batches | lr 0.00034 | ms/batch 324.40 | loss  4.70 | ppl   110.093
| epoch  31 step    13350 |    270 batches | lr 0.00034 | ms/batch 325.46 | loss  4.69 | ppl   109.075
| epoch  31 step    13400 |    320 batches | lr 0.00034 | ms/batch 325.93 | loss  4.69 | ppl   108.412
| epoch  31 step    13450 |    370 batches | lr 0.00034 | ms/batch 325.32 | loss  4.65 | ppl   104.153
| epoch  31 step    13500 |    420 batches | lr 0.00034 | ms/batch 326.93 | loss  4.69 | ppl   108.497
| epoch  32 step    13550 |     34 batches | lr 0.00034 | ms/batch 323.19 | loss  4.70 | ppl   110.102
| epoch  32 step    13600 |     84 batches | lr 0.00034 | ms/batch 331.76 | loss  4.65 | ppl   104.560
----------------------------------------------------------------------------------------------------
| Eval  34 at step    13600 | time: 135.76s | valid loss  4.50 | valid ppl    89.883
----------------------------------------------------------------------------------------------------
| epoch  32 step    13650 |    134 batches | lr 0.00034 | ms/batch 481.15 | loss  4.66 | ppl   105.735
| epoch  32 step    13700 |    184 batches | lr 0.00034 | ms/batch 327.38 | loss  4.67 | ppl   106.990
| epoch  32 step    13750 |    234 batches | lr 0.00034 | ms/batch 326.90 | loss  4.68 | ppl   108.175
| epoch  32 step    13800 |    284 batches | lr 0.00034 | ms/batch 326.57 | loss  4.69 | ppl   109.075
| epoch  32 step    13850 |    334 batches | lr 0.00034 | ms/batch 327.17 | loss  4.61 | ppl   100.571
| epoch  32 step    13900 |    384 batches | lr 0.000339 | ms/batch 326.39 | loss  4.64 | ppl   103.674
| epoch  32 step    13950 |    434 batches | lr 0.000339 | ms/batch 330.75 | loss  4.69 | ppl   109.117
| epoch  33 step    14000 |     48 batches | lr 0.000339 | ms/batch 325.15 | loss  4.65 | ppl   105.011
----------------------------------------------------------------------------------------------------
| Eval  35 at step    14000 | time: 136.03s | valid loss  4.49 | valid ppl    88.709
----------------------------------------------------------------------------------------------------
| epoch  33 step    14050 |     98 batches | lr 0.000339 | ms/batch 478.70 | loss  4.64 | ppl   103.552
| epoch  33 step    14100 |    148 batches | lr 0.000339 | ms/batch 329.07 | loss  4.65 | ppl   104.650
| epoch  33 step    14150 |    198 batches | lr 0.000339 | ms/batch 326.83 | loss  4.67 | ppl   107.140
| epoch  33 step    14200 |    248 batches | lr 0.000339 | ms/batch 328.04 | loss  4.70 | ppl   109.647
| epoch  33 step    14250 |    298 batches | lr 0.000339 | ms/batch 328.39 | loss  4.70 | ppl   110.248
| epoch  33 step    14300 |    348 batches | lr 0.000339 | ms/batch 327.24 | loss  4.58 | ppl    97.096
| epoch  33 step    14350 |    398 batches | lr 0.000339 | ms/batch 327.52 | loss  4.66 | ppl   105.992
| epoch  34 step    14400 |     12 batches | lr 0.000339 | ms/batch 323.39 | loss  4.69 | ppl   108.879
----------------------------------------------------------------------------------------------------
| Eval  36 at step    14400 | time: 136.07s | valid loss  4.47 | valid ppl    87.682
----------------------------------------------------------------------------------------------------
| epoch  34 step    14450 |     62 batches | lr 0.000339 | ms/batch 473.10 | loss  4.61 | ppl   100.272
| epoch  34 step    14500 |    112 batches | lr 0.000339 | ms/batch 325.41 | loss  4.63 | ppl   102.035
| epoch  34 step    14550 |    162 batches | lr 0.000338 | ms/batch 324.70 | loss  4.65 | ppl   104.104
| epoch  34 step    14600 |    212 batches | lr 0.000338 | ms/batch 324.94 | loss  4.66 | ppl   105.397
| epoch  34 step    14650 |    262 batches | lr 0.000338 | ms/batch 326.81 | loss  4.66 | ppl   105.843
| epoch  34 step    14700 |    312 batches | lr 0.000338 | ms/batch 340.02 | loss  4.65 | ppl   104.929
| epoch  34 step    14750 |    362 batches | lr 0.000338 | ms/batch 333.58 | loss  4.59 | ppl    98.072
| epoch  34 step    14800 |    412 batches | lr 0.000338 | ms/batch 326.90 | loss  4.63 | ppl   102.570
----------------------------------------------------------------------------------------------------
| Eval  37 at step    14800 | time: 136.88s | valid loss  4.49 | valid ppl    89.148
----------------------------------------------------------------------------------------------------
| epoch  35 step    14850 |     26 batches | lr 0.000338 | ms/batch 427.21 | loss  4.68 | ppl   107.551
| epoch  35 step    14900 |     76 batches | lr 0.000338 | ms/batch 326.68 | loss  4.60 | ppl    99.765
| epoch  35 step    14950 |    126 batches | lr 0.000338 | ms/batch 326.65 | loss  4.63 | ppl   102.610
| epoch  35 step    15000 |    176 batches | lr 0.000338 | ms/batch 328.47 | loss  4.64 | ppl   103.318
| epoch  35 step    15050 |    226 batches | lr 0.000338 | ms/batch 333.02 | loss  4.66 | ppl   105.200
| epoch  35 step    15100 |    276 batches | lr 0.000338 | ms/batch 326.38 | loss  4.67 | ppl   107.023
| epoch  35 step    15150 |    326 batches | lr 0.000337 | ms/batch 328.68 | loss  4.60 | ppl    99.562
| epoch  35 step    15200 |    376 batches | lr 0.000337 | ms/batch 327.58 | loss  4.62 | ppl   101.851
----------------------------------------------------------------------------------------------------
| Eval  38 at step    15200 | time: 136.23s | valid loss  4.47 | valid ppl    87.427
----------------------------------------------------------------------------------------------------
| epoch  35 step    15250 |    426 batches | lr 0.000337 | ms/batch 466.45 | loss  4.65 | ppl   104.569
| epoch  36 step    15300 |     40 batches | lr 0.000337 | ms/batch 322.82 | loss  4.61 | ppl   100.759
| epoch  36 step    15350 |     90 batches | lr 0.000337 | ms/batch 326.22 | loss  4.57 | ppl    96.884
| epoch  36 step    15400 |    140 batches | lr 0.000337 | ms/batch 325.06 | loss  4.60 | ppl    99.702
| epoch  36 step    15450 |    190 batches | lr 0.000337 | ms/batch 326.67 | loss  4.63 | ppl   103.004
| epoch  36 step    15500 |    240 batches | lr 0.000337 | ms/batch 326.80 | loss  4.66 | ppl   105.479
| epoch  36 step    15550 |    290 batches | lr 0.000337 | ms/batch 324.70 | loss  4.68 | ppl   107.981
| epoch  36 step    15600 |    340 batches | lr 0.000337 | ms/batch 325.91 | loss  4.54 | ppl    93.640
----------------------------------------------------------------------------------------------------
| Eval  39 at step    15600 | time: 135.61s | valid loss  4.46 | valid ppl    86.374
----------------------------------------------------------------------------------------------------
| epoch  36 step    15650 |    390 batches | lr 0.000337 | ms/batch 468.06 | loss  4.62 | ppl   101.947
| epoch  37 step    15700 |      4 batches | lr 0.000337 | ms/batch 338.89 | loss  4.64 | ppl   104.039
| epoch  37 step    15750 |     54 batches | lr 0.000336 | ms/batch 325.68 | loss  4.56 | ppl    95.666
| epoch  37 step    15800 |    104 batches | lr 0.000336 | ms/batch 326.46 | loss  4.58 | ppl    97.865
| epoch  37 step    15850 |    154 batches | lr 0.000336 | ms/batch 327.05 | loss  4.62 | ppl   101.788
| epoch  37 step    15900 |    204 batches | lr 0.000336 | ms/batch 324.11 | loss  4.63 | ppl   102.586
| epoch  37 step    15950 |    254 batches | lr 0.000336 | ms/batch 324.82 | loss  4.62 | ppl   101.613
| epoch  37 step    16000 |    304 batches | lr 0.000336 | ms/batch 323.99 | loss  4.66 | ppl   105.159
----------------------------------------------------------------------------------------------------
| Eval  40 at step    16000 | time: 136.22s | valid loss  4.45 | valid ppl    85.659
----------------------------------------------------------------------------------------------------
| epoch  37 step    16050 |    354 batches | lr 0.000336 | ms/batch 524.16 | loss  4.54 | ppl    94.153
| epoch  37 step    16100 |    404 batches | lr 0.000336 | ms/batch 342.87 | loss  4.60 | ppl    99.866
| epoch  38 step    16150 |     18 batches | lr 0.000336 | ms/batch 327.07 | loss  4.65 | ppl   104.096
| epoch  38 step    16200 |     68 batches | lr 0.000336 | ms/batch 325.03 | loss  4.55 | ppl    94.825
| epoch  38 step    16250 |    118 batches | lr 0.000336 | ms/batch 324.83 | loss  4.58 | ppl    97.203
| epoch  38 step    16300 |    168 batches | lr 0.000336 | ms/batch 324.41 | loss  4.60 | ppl    99.112
| epoch  38 step    16350 |    218 batches | lr 0.000335 | ms/batch 324.96 | loss  4.62 | ppl   101.716
| epoch  38 step    16400 |    268 batches | lr 0.000335 | ms/batch 325.26 | loss  4.61 | ppl   100.343
----------------------------------------------------------------------------------------------------
| Eval  41 at step    16400 | time: 137.19s | valid loss  4.45 | valid ppl    85.792
----------------------------------------------------------------------------------------------------
| epoch  38 step    16450 |    318 batches | lr 0.000335 | ms/batch 431.63 | loss  4.60 | ppl    99.267
| epoch  38 step    16500 |    368 batches | lr 0.000335 | ms/batch 328.57 | loss  4.54 | ppl    93.224
| epoch  38 step    16550 |    418 batches | lr 0.000335 | ms/batch 329.56 | loss  4.59 | ppl    98.533
| epoch  39 step    16600 |     32 batches | lr 0.000335 | ms/batch 320.05 | loss  4.61 | ppl   100.885
| epoch  39 step    16650 |     82 batches | lr 0.000335 | ms/batch 324.46 | loss  4.56 | ppl    95.218
| epoch  39 step    16700 |    132 batches | lr 0.000335 | ms/batch 324.41 | loss  4.58 | ppl    97.453
| epoch  39 step    16750 |    182 batches | lr 0.000335 | ms/batch 325.35 | loss  4.59 | ppl    98.095
| epoch  39 step    16800 |    232 batches | lr 0.000335 | ms/batch 325.20 | loss  4.60 | ppl    99.360
----------------------------------------------------------------------------------------------------
| Eval  42 at step    16800 | time: 135.47s | valid loss  4.44 | valid ppl    85.010
----------------------------------------------------------------------------------------------------
| epoch  39 step    16850 |    282 batches | lr 0.000335 | ms/batch 467.04 | loss  4.60 | ppl    99.547
| epoch  39 step    16900 |    332 batches | lr 0.000334 | ms/batch 329.53 | loss  4.53 | ppl    92.397
| epoch  39 step    16950 |    382 batches | lr 0.000334 | ms/batch 328.86 | loss  4.57 | ppl    96.642
| epoch  39 step    17000 |    432 batches | lr 0.000334 | ms/batch 329.41 | loss  4.58 | ppl    97.743
| epoch  40 step    17050 |     46 batches | lr 0.000334 | ms/batch 319.10 | loss  4.54 | ppl    93.683
| epoch  40 step    17100 |     96 batches | lr 0.000334 | ms/batch 324.36 | loss  4.53 | ppl    92.303
| epoch  40 step    17150 |    146 batches | lr 0.000334 | ms/batch 324.11 | loss  4.58 | ppl    97.980
| epoch  40 step    17200 |    196 batches | lr 0.000334 | ms/batch 324.90 | loss  4.59 | ppl    98.980
----------------------------------------------------------------------------------------------------
| Eval  43 at step    17200 | time: 135.66s | valid loss  4.43 | valid ppl    83.918
----------------------------------------------------------------------------------------------------
| epoch  40 step    17250 |    246 batches | lr 0.000334 | ms/batch 465.32 | loss  4.60 | ppl    99.027
| epoch  40 step    17300 |    296 batches | lr 0.000334 | ms/batch 329.11 | loss  4.62 | ppl   102.003
| epoch  40 step    17350 |    346 batches | lr 0.000334 | ms/batch 328.09 | loss  4.47 | ppl    87.623
| epoch  40 step    17400 |    396 batches | lr 0.000334 | ms/batch 329.61 | loss  4.57 | ppl    96.250
| epoch  41 step    17450 |     10 batches | lr 0.000333 | ms/batch 320.79 | loss  4.61 | ppl   100.359
| epoch  41 step    17500 |     60 batches | lr 0.000333 | ms/batch 323.59 | loss  4.54 | ppl    93.340
| epoch  41 step    17550 |    110 batches | lr 0.000333 | ms/batch 324.35 | loss  4.52 | ppl    92.274
| epoch  41 step    17600 |    160 batches | lr 0.000333 | ms/batch 324.63 | loss  4.54 | ppl    93.735
----------------------------------------------------------------------------------------------------
| Eval  44 at step    17600 | time: 135.61s | valid loss  4.42 | valid ppl    83.345
----------------------------------------------------------------------------------------------------
| epoch  41 step    17650 |    210 batches | lr 0.000333 | ms/batch 463.76 | loss  4.58 | ppl    97.096
| epoch  41 step    17700 |    260 batches | lr 0.000333 | ms/batch 328.54 | loss  4.60 | ppl    99.601
| epoch  41 step    17750 |    310 batches | lr 0.000333 | ms/batch 327.82 | loss  4.58 | ppl    97.233
| epoch  41 step    17800 |    360 batches | lr 0.000333 | ms/batch 329.26 | loss  4.51 | ppl    90.929
| epoch  41 step    17850 |    410 batches | lr 0.000333 | ms/batch 328.51 | loss  4.55 | ppl    94.241
| epoch  42 step    17900 |     24 batches | lr 0.000333 | ms/batch 320.77 | loss  4.60 | ppl    99.127
| epoch  42 step    17950 |     74 batches | lr 0.000332 | ms/batch 324.42 | loss  4.51 | ppl    91.078
| epoch  42 step    18000 |    124 batches | lr 0.000332 | ms/batch 325.12 | loss  4.55 | ppl    94.906
----------------------------------------------------------------------------------------------------
| Eval  45 at step    18000 | time: 135.80s | valid loss  4.42 | valid ppl    82.966
----------------------------------------------------------------------------------------------------
| epoch  42 step    18050 |    174 batches | lr 0.000332 | ms/batch 464.22 | loss  4.55 | ppl    95.010
| epoch  42 step    18100 |    224 batches | lr 0.000332 | ms/batch 329.23 | loss  4.56 | ppl    95.591
| epoch  42 step    18150 |    274 batches | lr 0.000332 | ms/batch 327.69 | loss  4.58 | ppl    97.820
| epoch  42 step    18200 |    324 batches | lr 0.000332 | ms/batch 328.89 | loss  4.54 | ppl    93.757
| epoch  42 step    18250 |    374 batches | lr 0.000332 | ms/batch 328.74 | loss  4.55 | ppl    94.359
| epoch  42 step    18300 |    424 batches | lr 0.000332 | ms/batch 328.89 | loss  4.55 | ppl    95.084
| epoch  43 step    18350 |     38 batches | lr 0.000332 | ms/batch 322.10 | loss  4.56 | ppl    95.442
| epoch  43 step    18400 |     88 batches | lr 0.000332 | ms/batch 324.70 | loss  4.49 | ppl    89.505
----------------------------------------------------------------------------------------------------
| Eval  46 at step    18400 | time: 136.11s | valid loss  4.44 | valid ppl    84.540
----------------------------------------------------------------------------------------------------
| epoch  43 step    18450 |    138 batches | lr 0.000332 | ms/batch 432.88 | loss  4.55 | ppl    94.936
| epoch  43 step    18500 |    188 batches | lr 0.000331 | ms/batch 330.19 | loss  4.54 | ppl    93.662
| epoch  43 step    18550 |    238 batches | lr 0.000331 | ms/batch 330.08 | loss  4.57 | ppl    96.341
| epoch  43 step    18600 |    288 batches | lr 0.000331 | ms/batch 329.52 | loss  4.58 | ppl    97.736
| epoch  43 step    18650 |    338 batches | lr 0.000331 | ms/batch 329.08 | loss  4.47 | ppl    86.962
| epoch  43 step    18700 |    388 batches | lr 0.000331 | ms/batch 329.90 | loss  4.55 | ppl    94.241
| epoch  44 step    18750 |      2 batches | lr 0.000331 | ms/batch 324.42 | loss  4.57 | ppl    96.930
| epoch  44 step    18800 |     52 batches | lr 0.000331 | ms/batch 326.52 | loss  4.49 | ppl    89.505
----------------------------------------------------------------------------------------------------
| Eval  47 at step    18800 | time: 136.63s | valid loss  4.41 | valid ppl    82.658
----------------------------------------------------------------------------------------------------
| epoch  44 step    18850 |    102 batches | lr 0.000331 | ms/batch 468.35 | loss  4.50 | ppl    90.137
| epoch  44 step    18900 |    152 batches | lr 0.000331 | ms/batch 329.38 | loss  4.53 | ppl    92.614
| epoch  44 step    18950 |    202 batches | lr 0.000331 | ms/batch 329.00 | loss  4.53 | ppl    93.158
| epoch  44 step    19000 |    252 batches | lr 0.00033 | ms/batch 328.70 | loss  4.55 | ppl    94.684
| epoch  44 step    19050 |    302 batches | lr 0.00033 | ms/batch 329.72 | loss  4.58 | ppl    97.127
| epoch  44 step    19100 |    352 batches | lr 0.00033 | ms/batch 328.09 | loss  4.46 | ppl    86.373
| epoch  44 step    19150 |    402 batches | lr 0.00033 | ms/batch 328.66 | loss  4.53 | ppl    92.556
| epoch  45 step    19200 |     16 batches | lr 0.00033 | ms/batch 322.69 | loss  4.57 | ppl    96.311
----------------------------------------------------------------------------------------------------
| Eval  48 at step    19200 | time: 136.51s | valid loss  4.41 | valid ppl    82.645
----------------------------------------------------------------------------------------------------
| epoch  45 step    19250 |     66 batches | lr 0.00033 | ms/batch 467.03 | loss  4.49 | ppl    88.836
| epoch  45 step    19300 |    116 batches | lr 0.00033 | ms/batch 331.15 | loss  4.51 | ppl    91.214
| epoch  45 step    19350 |    166 batches | lr 0.00033 | ms/batch 330.31 | loss  4.55 | ppl    94.559
| epoch  45 step    19400 |    216 batches | lr 0.00033 | ms/batch 330.30 | loss  4.54 | ppl    93.479
| epoch  45 step    19450 |    266 batches | lr 0.00033 | ms/batch 330.92 | loss  4.53 | ppl    92.773
| epoch  45 step    19500 |    316 batches | lr 0.000329 | ms/batch 330.31 | loss  4.52 | ppl    91.771
| epoch  45 step    19550 |    366 batches | lr 0.000329 | ms/batch 329.48 | loss  4.48 | ppl    87.870
| epoch  45 step    19600 |    416 batches | lr 0.000329 | ms/batch 328.27 | loss  4.51 | ppl    90.624
----------------------------------------------------------------------------------------------------
| Eval  49 at step    19600 | time: 137.25s | valid loss  4.41 | valid ppl    82.682
----------------------------------------------------------------------------------------------------
| epoch  46 step    19650 |     30 batches | lr 0.000329 | ms/batch 424.68 | loss  4.55 | ppl    94.477
| epoch  46 step    19700 |     80 batches | lr 0.000329 | ms/batch 330.11 | loss  4.48 | ppl    88.386
| epoch  46 step    19750 |    130 batches | lr 0.000329 | ms/batch 328.86 | loss  4.53 | ppl    92.447
| epoch  46 step    19800 |    180 batches | lr 0.000329 | ms/batch 328.39 | loss  4.52 | ppl    92.231
| epoch  46 step    19850 |    230 batches | lr 0.000329 | ms/batch 328.29 | loss  4.53 | ppl    92.404
| epoch  46 step    19900 |    280 batches | lr 0.000329 | ms/batch 329.70 | loss  4.51 | ppl    91.320
| epoch  46 step    19950 |    330 batches | lr 0.000328 | ms/batch 329.35 | loss  4.48 | ppl    87.939
| epoch  46 step    20000 |    380 batches | lr 0.000328 | ms/batch 328.33 | loss  4.50 | ppl    90.080
----------------------------------------------------------------------------------------------------
| Eval  50 at step    20000 | time: 136.41s | valid loss  4.40 | valid ppl    81.583
----------------------------------------------------------------------------------------------------
| epoch  46 step    20050 |    430 batches | lr 0.000328 | ms/batch 461.44 | loss  4.53 | ppl    92.426
| epoch  47 step    20100 |     44 batches | lr 0.000328 | ms/batch 322.86 | loss  4.50 | ppl    89.806
| epoch  47 step    20150 |     94 batches | lr 0.000328 | ms/batch 329.74 | loss  4.46 | ppl    86.840
| epoch  47 step    20200 |    144 batches | lr 0.000328 | ms/batch 330.03 | loss  4.49 | ppl    89.365
| epoch  47 step    20250 |    194 batches | lr 0.000328 | ms/batch 330.34 | loss  4.51 | ppl    90.695
| epoch  47 step    20300 |    244 batches | lr 0.000328 | ms/batch 330.96 | loss  4.53 | ppl    92.419
| epoch  47 step    20350 |    294 batches | lr 0.000328 | ms/batch 330.59 | loss  4.55 | ppl    94.573
| epoch  47 step    20400 |    344 batches | lr 0.000327 | ms/batch 329.81 | loss  4.42 | ppl    83.012
----------------------------------------------------------------------------------------------------
| Eval  51 at step    20400 | time: 136.68s | valid loss  4.40 | valid ppl    81.106
----------------------------------------------------------------------------------------------------
| epoch  47 step    20450 |    394 batches | lr 0.000327 | ms/batch 465.70 | loss  4.52 | ppl    91.879
| epoch  48 step    20500 |      8 batches | lr 0.000327 | ms/batch 320.39 | loss  4.52 | ppl    92.137
| epoch  48 step    20550 |     58 batches | lr 0.000327 | ms/batch 328.87 | loss  4.45 | ppl    86.029
| epoch  48 step    20600 |    108 batches | lr 0.000327 | ms/batch 330.39 | loss  4.46 | ppl    86.772
| epoch  48 step    20650 |    158 batches | lr 0.000327 | ms/batch 329.98 | loss  4.51 | ppl    90.652
| epoch  48 step    20700 |    208 batches | lr 0.000327 | ms/batch 329.46 | loss  4.50 | ppl    90.257
| epoch  48 step    20750 |    258 batches | lr 0.000327 | ms/batch 329.33 | loss  4.51 | ppl    91.036
| epoch  48 step    20800 |    308 batches | lr 0.000327 | ms/batch 330.24 | loss  4.55 | ppl    94.544
----------------------------------------------------------------------------------------------------
| Eval  52 at step    20800 | time: 136.47s | valid loss  4.39 | valid ppl    80.509
----------------------------------------------------------------------------------------------------
| epoch  48 step    20850 |    358 batches | lr 0.000327 | ms/batch 465.44 | loss  4.42 | ppl    83.207
| epoch  48 step    20900 |    408 batches | lr 0.000326 | ms/batch 324.25 | loss  4.48 | ppl    88.069
| epoch  49 step    20950 |     22 batches | lr 0.000326 | ms/batch 320.70 | loss  4.53 | ppl    92.372
| epoch  49 step    21000 |     72 batches | lr 0.000326 | ms/batch 329.79 | loss  4.46 | ppl    86.251
| epoch  49 step    21050 |    122 batches | lr 0.000326 | ms/batch 329.67 | loss  4.49 | ppl    88.892
| epoch  49 step    21100 |    172 batches | lr 0.000326 | ms/batch 330.82 | loss  4.48 | ppl    88.186
| epoch  49 step    21150 |    222 batches | lr 0.000326 | ms/batch 329.95 | loss  4.53 | ppl    92.296
| epoch  49 step    21200 |    272 batches | lr 0.000326 | ms/batch 331.48 | loss  4.51 | ppl    91.363
----------------------------------------------------------------------------------------------------
| Eval  53 at step    21200 | time: 136.36s | valid loss  4.40 | valid ppl    81.347
----------------------------------------------------------------------------------------------------
| epoch  49 step    21250 |    322 batches | lr 0.000326 | ms/batch 430.07 | loss  4.49 | ppl    89.052
| epoch  49 step    21300 |    372 batches | lr 0.000326 | ms/batch 326.08 | loss  4.44 | ppl    84.425
| epoch  49 step    21350 |    422 batches | lr 0.000325 | ms/batch 326.57 | loss  4.47 | ppl    87.740
| epoch  50 step    21400 |     36 batches | lr 0.000325 | ms/batch 325.04 | loss  4.50 | ppl    90.334
| epoch  50 step    21450 |     86 batches | lr 0.000325 | ms/batch 329.19 | loss  4.43 | ppl    83.813
| epoch  50 step    21500 |    136 batches | lr 0.000325 | ms/batch 329.67 | loss  4.50 | ppl    89.856
| epoch  50 step    21550 |    186 batches | lr 0.000325 | ms/batch 331.17 | loss  4.48 | ppl    87.932
| epoch  50 step    21600 |    236 batches | lr 0.000325 | ms/batch 330.47 | loss  4.48 | ppl    88.608
----------------------------------------------------------------------------------------------------
| Eval  54 at step    21600 | time: 136.39s | valid loss  4.37 | valid ppl    79.251
----------------------------------------------------------------------------------------------------
| epoch  50 step    21650 |    286 batches | lr 0.000325 | ms/batch 461.61 | loss  4.52 | ppl    92.051
| epoch  50 step    21700 |    336 batches | lr 0.000325 | ms/batch 325.88 | loss  4.42 | ppl    83.077
| epoch  50 step    21750 |    386 batches | lr 0.000324 | ms/batch 325.03 | loss  4.48 | ppl    88.628
| epoch  50 step    21800 |    436 batches | lr 0.000324 | ms/batch 321.35 | loss  4.50 | ppl    89.905
| epoch  51 step    21850 |     50 batches | lr 0.000324 | ms/batch 329.62 | loss  4.45 | ppl    85.620
| epoch  51 step    21900 |    100 batches | lr 0.000324 | ms/batch 329.94 | loss  4.45 | ppl    85.227
| epoch  51 step    21950 |    150 batches | lr 0.000324 | ms/batch 330.06 | loss  4.48 | ppl    87.877
| epoch  51 step    22000 |    200 batches | lr 0.000324 | ms/batch 331.23 | loss  4.52 | ppl    92.087
----------------------------------------------------------------------------------------------------
| Eval  55 at step    22000 | time: 136.30s | valid loss  4.38 | valid ppl    79.542
----------------------------------------------------------------------------------------------------
| epoch  51 step    22050 |    250 batches | lr 0.000324 | ms/batch 429.26 | loss  4.51 | ppl    90.482
| epoch  51 step    22100 |    300 batches | lr 0.000324 | ms/batch 325.59 | loss  4.51 | ppl    91.171
| epoch  51 step    22150 |    350 batches | lr 0.000324 | ms/batch 325.90 | loss  4.38 | ppl    79.794
| epoch  51 step    22200 |    400 batches | lr 0.000323 | ms/batch 325.17 | loss  4.47 | ppl    87.733
| epoch  52 step    22250 |     14 batches | lr 0.000323 | ms/batch 320.57 | loss  4.50 | ppl    90.105
| epoch  52 step    22300 |     64 batches | lr 0.000323 | ms/batch 330.13 | loss  4.39 | ppl    80.988
| epoch  52 step    22350 |    114 batches | lr 0.000323 | ms/batch 329.57 | loss  4.47 | ppl    87.747
| epoch  52 step    22400 |    164 batches | lr 0.000323 | ms/batch 330.31 | loss  4.46 | ppl    86.609
----------------------------------------------------------------------------------------------------
| Eval  56 at step    22400 | time: 135.82s | valid loss  4.37 | valid ppl    78.817
----------------------------------------------------------------------------------------------------
| epoch  52 step    22450 |    214 batches | lr 0.000323 | ms/batch 462.73 | loss  4.49 | ppl    89.275
| epoch  52 step    22500 |    264 batches | lr 0.000323 | ms/batch 326.03 | loss  4.47 | ppl    87.760
| epoch  52 step    22550 |    314 batches | lr 0.000323 | ms/batch 325.61 | loss  4.47 | ppl    87.760
| epoch  52 step    22600 |    364 batches | lr 0.000323 | ms/batch 324.23 | loss  4.40 | ppl    81.336
| epoch  52 step    22650 |    414 batches | lr 0.000322 | ms/batch 324.96 | loss  4.48 | ppl    87.911
| epoch  53 step    22700 |     28 batches | lr 0.000322 | ms/batch 321.55 | loss  4.49 | ppl    88.996
| epoch  53 step    22750 |     78 batches | lr 0.000322 | ms/batch 330.08 | loss  4.41 | ppl    82.469
| epoch  53 step    22800 |    128 batches | lr 0.000322 | ms/batch 329.91 | loss  4.44 | ppl    85.054
----------------------------------------------------------------------------------------------------
| Eval  57 at step    22800 | time: 135.58s | valid loss  4.36 | valid ppl    78.335
----------------------------------------------------------------------------------------------------
| epoch  53 step    22850 |    178 batches | lr 0.000322 | ms/batch 461.61 | loss  4.47 | ppl    87.603
| epoch  53 step    22900 |    228 batches | lr 0.000322 | ms/batch 324.53 | loss  4.48 | ppl    88.366
| epoch  53 step    22950 |    278 batches | lr 0.000322 | ms/batch 324.10 | loss  4.46 | ppl    86.806
| epoch  53 step    23000 |    328 batches | lr 0.000322 | ms/batch 324.10 | loss  4.42 | ppl    83.395
| epoch  53 step    23050 |    378 batches | lr 0.000321 | ms/batch 324.07 | loss  4.45 | ppl    85.227
| epoch  53 step    23100 |    428 batches | lr 0.000321 | ms/batch 324.68 | loss  4.47 | ppl    87.357
| epoch  54 step    23150 |     42 batches | lr 0.000321 | ms/batch 321.86 | loss  4.46 | ppl    86.440
| epoch  54 step    23200 |     92 batches | lr 0.000321 | ms/batch 328.81 | loss  4.42 | ppl    83.285
----------------------------------------------------------------------------------------------------
| Eval  58 at step    23200 | time: 135.06s | valid loss  4.38 | valid ppl    79.852
----------------------------------------------------------------------------------------------------
| epoch  54 step    23250 |    142 batches | lr 0.000321 | ms/batch 427.90 | loss  4.44 | ppl    84.934
| epoch  54 step    23300 |    192 batches | lr 0.000321 | ms/batch 324.59 | loss  4.48 | ppl    88.511
| epoch  54 step    23350 |    242 batches | lr 0.000321 | ms/batch 324.05 | loss  4.49 | ppl    88.795
| epoch  54 step    23400 |    292 batches | lr 0.000321 | ms/batch 324.22 | loss  4.50 | ppl    90.384
| epoch  54 step    23450 |    342 batches | lr 0.00032 | ms/batch 325.14 | loss  4.38 | ppl    79.496
| epoch  54 step    23500 |    392 batches | lr 0.00032 | ms/batch 323.94 | loss  4.46 | ppl    86.596
| epoch  55 step    23550 |      6 batches | lr 0.00032 | ms/batch 318.46 | loss  4.48 | ppl    87.911
| epoch  55 step    23600 |     56 batches | lr 0.00032 | ms/batch 330.96 | loss  4.45 | ppl    85.220
----------------------------------------------------------------------------------------------------
| Eval  59 at step    23600 | time: 135.01s | valid loss  4.36 | valid ppl    78.112
----------------------------------------------------------------------------------------------------
| epoch  55 step    23650 |    106 batches | lr 0.00032 | ms/batch 464.22 | loss  4.41 | ppl    82.508
| epoch  55 step    23700 |    156 batches | lr 0.00032 | ms/batch 326.16 | loss  4.44 | ppl    84.702
| epoch  55 step    23750 |    206 batches | lr 0.00032 | ms/batch 325.74 | loss  4.46 | ppl    86.272
| epoch  55 step    23800 |    256 batches | lr 0.00032 | ms/batch 325.97 | loss  4.45 | ppl    85.875
| epoch  55 step    23850 |    306 batches | lr 0.000319 | ms/batch 326.74 | loss  4.48 | ppl    87.849
| epoch  55 step    23900 |    356 batches | lr 0.000319 | ms/batch 326.31 | loss  4.38 | ppl    79.869
| epoch  55 step    23950 |    406 batches | lr 0.000319 | ms/batch 326.69 | loss  4.44 | ppl    84.921
| epoch  56 step    24000 |     20 batches | lr 0.000319 | ms/batch 320.74 | loss  4.48 | ppl    88.373
----------------------------------------------------------------------------------------------------
| Eval  60 at step    24000 | time: 135.48s | valid loss  4.37 | valid ppl    78.988
----------------------------------------------------------------------------------------------------
| epoch  56 step    24050 |     70 batches | lr 0.000319 | ms/batch 428.46 | loss  4.39 | ppl    80.263
| epoch  56 step    24100 |    120 batches | lr 0.000319 | ms/batch 325.12 | loss  4.41 | ppl    82.475
| epoch  56 step    24150 |    170 batches | lr 0.000319 | ms/batch 325.86 | loss  4.45 | ppl    85.373
| epoch  56 step    24200 |    220 batches | lr 0.000319 | ms/batch 324.51 | loss  4.47 | ppl    87.037
| epoch  56 step    24250 |    270 batches | lr 0.000318 | ms/batch 325.71 | loss  4.46 | ppl    86.366
| epoch  56 step    24300 |    320 batches | lr 0.000318 | ms/batch 325.48 | loss  4.44 | ppl    84.921
| epoch  56 step    24350 |    370 batches | lr 0.000318 | ms/batch 324.87 | loss  4.41 | ppl    82.334
| epoch  56 step    24400 |    420 batches | lr 0.000318 | ms/batch 325.72 | loss  4.44 | ppl    84.715
----------------------------------------------------------------------------------------------------
| Eval  61 at step    24400 | time: 135.32s | valid loss  4.36 | valid ppl    78.106
----------------------------------------------------------------------------------------------------
| epoch  57 step    24450 |     34 batches | lr 0.000318 | ms/batch 458.01 | loss  4.46 | ppl    86.779
| epoch  57 step    24500 |     84 batches | lr 0.000318 | ms/batch 325.76 | loss  4.39 | ppl    80.584
| epoch  57 step    24550 |    134 batches | lr 0.000318 | ms/batch 325.83 | loss  4.42 | ppl    83.168
| epoch  57 step    24600 |    184 batches | lr 0.000318 | ms/batch 325.00 | loss  4.42 | ppl    83.174
| epoch  57 step    24650 |    234 batches | lr 0.000317 | ms/batch 326.96 | loss  4.47 | ppl    86.948
| epoch  57 step    24700 |    284 batches | lr 0.000317 | ms/batch 326.62 | loss  4.47 | ppl    87.179
| epoch  57 step    24750 |    334 batches | lr 0.000317 | ms/batch 327.18 | loss  4.37 | ppl    78.674
| epoch  57 step    24800 |    384 batches | lr 0.000317 | ms/batch 326.41 | loss  4.42 | ppl    83.187
----------------------------------------------------------------------------------------------------
| Eval  62 at step    24800 | time: 135.38s | valid loss  4.35 | valid ppl    77.709
----------------------------------------------------------------------------------------------------
| epoch  57 step    24850 |    434 batches | lr 0.000317 | ms/batch 467.12 | loss  4.45 | ppl    85.680
| epoch  58 step    24900 |     48 batches | lr 0.000317 | ms/batch 321.34 | loss  4.41 | ppl    82.103
| epoch  58 step    24950 |     98 batches | lr 0.000317 | ms/batch 326.57 | loss  4.39 | ppl    80.383
| epoch  58 step    25000 |    148 batches | lr 0.000317 | ms/batch 327.44 | loss  4.42 | ppl    82.721
| epoch  58 step    25050 |    198 batches | lr 0.000316 | ms/batch 327.06 | loss  4.46 | ppl    86.772
| epoch  58 step    25100 |    248 batches | lr 0.000316 | ms/batch 326.88 | loss  4.43 | ppl    84.319
| epoch  58 step    25150 |    298 batches | lr 0.000316 | ms/batch 327.10 | loss  4.48 | ppl    87.980
| epoch  58 step    25200 |    348 batches | lr 0.000316 | ms/batch 329.26 | loss  4.34 | ppl    76.480
----------------------------------------------------------------------------------------------------
| Eval  63 at step    25200 | time: 136.00s | valid loss  4.36 | valid ppl    78.037
----------------------------------------------------------------------------------------------------
| epoch  58 step    25250 |    398 batches | lr 0.000316 | ms/batch 434.43 | loss  4.43 | ppl    83.552
| epoch  59 step    25300 |     12 batches | lr 0.000316 | ms/batch 324.89 | loss  4.43 | ppl    84.234
| epoch  59 step    25350 |     62 batches | lr 0.000316 | ms/batch 326.92 | loss  4.39 | ppl    80.981
| epoch  59 step    25400 |    112 batches | lr 0.000316 | ms/batch 327.51 | loss  4.43 | ppl    83.781
| epoch  59 step    25450 |    162 batches | lr 0.000315 | ms/batch 326.46 | loss  4.43 | ppl    83.945
| epoch  59 step    25500 |    212 batches | lr 0.000315 | ms/batch 326.03 | loss  4.43 | ppl    83.539
| epoch  59 step    25550 |    262 batches | lr 0.000315 | ms/batch 325.50 | loss  4.44 | ppl    84.881
| epoch  59 step    25600 |    312 batches | lr 0.000315 | ms/batch 325.17 | loss  4.45 | ppl    85.453
----------------------------------------------------------------------------------------------------
| Eval  64 at step    25600 | time: 135.86s | valid loss  4.36 | valid ppl    78.240
----------------------------------------------------------------------------------------------------
| epoch  59 step    25650 |    362 batches | lr 0.000315 | ms/batch 433.05 | loss  4.35 | ppl    77.642
| epoch  59 step    25700 |    412 batches | lr 0.000315 | ms/batch 329.56 | loss  4.42 | ppl    82.889
| epoch  60 step    25750 |     26 batches | lr 0.000315 | ms/batch 322.36 | loss  4.46 | ppl    86.096
| epoch  60 step    25800 |     76 batches | lr 0.000314 | ms/batch 325.53 | loss  4.39 | ppl    80.408
| epoch  60 step    25850 |    126 batches | lr 0.000314 | ms/batch 326.77 | loss  4.41 | ppl    82.064
| epoch  60 step    25900 |    176 batches | lr 0.000314 | ms/batch 326.46 | loss  4.42 | ppl    82.753
| epoch  60 step    25950 |    226 batches | lr 0.000314 | ms/batch 327.78 | loss  4.42 | ppl    82.701
| epoch  60 step    26000 |    276 batches | lr 0.000314 | ms/batch 327.57 | loss  4.46 | ppl    86.575
----------------------------------------------------------------------------------------------------
| Eval  65 at step    26000 | time: 135.95s | valid loss  4.35 | valid ppl    77.570
----------------------------------------------------------------------------------------------------
| epoch  60 step    26050 |    326 batches | lr 0.000314 | ms/batch 467.91 | loss  4.39 | ppl    80.767
| epoch  60 step    26100 |    376 batches | lr 0.000314 | ms/batch 330.09 | loss  4.40 | ppl    81.381
| epoch  60 step    26150 |    426 batches | lr 0.000314 | ms/batch 331.73 | loss  4.41 | ppl    82.424
| epoch  61 step    26200 |     40 batches | lr 0.000313 | ms/batch 322.44 | loss  4.40 | ppl    81.375
| epoch  61 step    26250 |     90 batches | lr 0.000313 | ms/batch 326.93 | loss  4.36 | ppl    78.220
| epoch  61 step    26300 |    140 batches | lr 0.000313 | ms/batch 327.04 | loss  4.41 | ppl    82.482
| epoch  61 step    26350 |    190 batches | lr 0.000313 | ms/batch 327.39 | loss  4.44 | ppl    84.636
| epoch  61 step    26400 |    240 batches | lr 0.000313 | ms/batch 336.70 | loss  4.43 | ppl    83.578
----------------------------------------------------------------------------------------------------
| Eval  66 at step    26400 | time: 136.83s | valid loss  4.34 | valid ppl    76.803
----------------------------------------------------------------------------------------------------
| epoch  61 step    26450 |    290 batches | lr 0.000313 | ms/batch 467.84 | loss  4.46 | ppl    86.765
| epoch  61 step    26500 |    340 batches | lr 0.000313 | ms/batch 330.07 | loss  4.32 | ppl    75.042
| epoch  61 step    26550 |    390 batches | lr 0.000312 | ms/batch 331.48 | loss  4.41 | ppl    82.314
| epoch  62 step    26600 |      4 batches | lr 0.000312 | ms/batch 325.46 | loss  4.42 | ppl    83.005
| epoch  62 step    26650 |     54 batches | lr 0.000312 | ms/batch 325.55 | loss  4.36 | ppl    77.989
| epoch  62 step    26700 |    104 batches | lr 0.000312 | ms/batch 325.12 | loss  4.37 | ppl    78.674
| epoch  62 step    26750 |    154 batches | lr 0.000312 | ms/batch 326.45 | loss  4.41 | ppl    82.579
| epoch  62 step    26800 |    204 batches | lr 0.000312 | ms/batch 325.83 | loss  4.41 | ppl    82.282
----------------------------------------------------------------------------------------------------
| Eval  67 at step    26800 | time: 136.23s | valid loss  4.34 | valid ppl    76.903
----------------------------------------------------------------------------------------------------
| epoch  62 step    26850 |    254 batches | lr 0.000312 | ms/batch 433.10 | loss  4.40 | ppl    81.853
| epoch  62 step    26900 |    304 batches | lr 0.000312 | ms/batch 331.61 | loss  4.43 | ppl    83.532
| epoch  62 step    26950 |    354 batches | lr 0.000311 | ms/batch 329.47 | loss  4.30 | ppl    74.058
| epoch  62 step    27000 |    404 batches | lr 0.000311 | ms/batch 330.21 | loss  4.41 | ppl    82.173
| epoch  63 step    27050 |     18 batches | lr 0.000311 | ms/batch 322.02 | loss  4.43 | ppl    83.614
| epoch  63 step    27100 |     68 batches | lr 0.000311 | ms/batch 324.24 | loss  4.36 | ppl    78.172
| epoch  63 step    27150 |    118 batches | lr 0.000311 | ms/batch 326.69 | loss  4.38 | ppl    79.988
| epoch  63 step    27200 |    168 batches | lr 0.000311 | ms/batch 326.03 | loss  4.40 | ppl    81.731
----------------------------------------------------------------------------------------------------
| Eval  68 at step    27200 | time: 136.14s | valid loss  4.35 | valid ppl    77.314
----------------------------------------------------------------------------------------------------
| epoch  63 step    27250 |    218 batches | lr 0.000311 | ms/batch 433.78 | loss  4.43 | ppl    83.787
| epoch  63 step    27300 |    268 batches | lr 0.00031 | ms/batch 329.34 | loss  4.39 | ppl    80.924
| epoch  63 step    27350 |    318 batches | lr 0.00031 | ms/batch 329.34 | loss  4.41 | ppl    82.019
| epoch  63 step    27400 |    368 batches | lr 0.00031 | ms/batch 330.70 | loss  4.35 | ppl    77.497
| epoch  63 step    27450 |    418 batches | lr 0.00031 | ms/batch 329.79 | loss  4.40 | ppl    81.102
| epoch  64 step    27500 |     32 batches | lr 0.00031 | ms/batch 322.30 | loss  4.43 | ppl    83.591
| epoch  64 step    27550 |     82 batches | lr 0.00031 | ms/batch 326.87 | loss  4.33 | ppl    76.134
| epoch  64 step    27600 |    132 batches | lr 0.00031 | ms/batch 328.42 | loss  4.38 | ppl    79.982
----------------------------------------------------------------------------------------------------
| Eval  69 at step    27600 | time: 136.59s | valid loss  4.34 | valid ppl    76.573
----------------------------------------------------------------------------------------------------
| epoch  64 step    27650 |    182 batches | lr 0.000309 | ms/batch 467.30 | loss  4.38 | ppl    80.019
| epoch  64 step    27700 |    232 batches | lr 0.000309 | ms/batch 329.63 | loss  4.40 | ppl    81.311
| epoch  64 step    27750 |    282 batches | lr 0.000309 | ms/batch 330.24 | loss  4.42 | ppl    82.805
| epoch  64 step    27800 |    332 batches | lr 0.000309 | ms/batch 330.10 | loss  4.32 | ppl    75.448
| epoch  64 step    27850 |    382 batches | lr 0.000309 | ms/batch 330.05 | loss  4.37 | ppl    78.945
| epoch  64 step    27900 |    432 batches | lr 0.000309 | ms/batch 331.30 | loss  4.41 | ppl    81.942
| epoch  65 step    27950 |     46 batches | lr 0.000309 | ms/batch 320.09 | loss  4.38 | ppl    79.819
| epoch  65 step    28000 |     96 batches | lr 0.000308 | ms/batch 326.73 | loss  4.33 | ppl    76.128
----------------------------------------------------------------------------------------------------
| Eval  70 at step    28000 | time: 136.57s | valid loss  4.34 | valid ppl    76.540
----------------------------------------------------------------------------------------------------
| epoch  65 step    28050 |    146 batches | lr 0.000308 | ms/batch 468.99 | loss  4.37 | ppl    79.334
| epoch  65 step    28100 |    196 batches | lr 0.000308 | ms/batch 331.19 | loss  4.40 | ppl    81.642
| epoch  65 step    28150 |    246 batches | lr 0.000308 | ms/batch 330.73 | loss  4.40 | ppl    81.655
| epoch  65 step    28200 |    296 batches | lr 0.000308 | ms/batch 330.93 | loss  4.45 | ppl    86.029
| epoch  65 step    28250 |    346 batches | lr 0.000308 | ms/batch 329.63 | loss  4.30 | ppl    73.367
| epoch  65 step    28300 |    396 batches | lr 0.000308 | ms/batch 332.03 | loss  4.38 | ppl    79.446
| epoch  66 step    28350 |     10 batches | lr 0.000307 | ms/batch 323.42 | loss  4.44 | ppl    84.854
| epoch  66 step    28400 |     60 batches | lr 0.000307 | ms/batch 327.40 | loss  4.34 | ppl    76.504
----------------------------------------------------------------------------------------------------
| Eval  71 at step    28400 | time: 137.21s | valid loss  4.33 | valid ppl    75.779
----------------------------------------------------------------------------------------------------
| epoch  66 step    28450 |    110 batches | lr 0.000307 | ms/batch 473.42 | loss  4.37 | ppl    79.025
| epoch  66 step    28500 |    160 batches | lr 0.000307 | ms/batch 332.68 | loss  4.38 | ppl    80.025
| epoch  66 step    28550 |    210 batches | lr 0.000307 | ms/batch 332.34 | loss  4.39 | ppl    80.905
| epoch  66 step    28600 |    260 batches | lr 0.000307 | ms/batch 332.58 | loss  4.40 | ppl    81.457
| epoch  66 step    28650 |    310 batches | lr 0.000307 | ms/batch 331.91 | loss  4.39 | ppl    80.943
| epoch  66 step    28700 |    360 batches | lr 0.000306 | ms/batch 344.89 | loss  4.31 | ppl    74.081
| epoch  66 step    28750 |    410 batches | lr 0.000306 | ms/batch 331.28 | loss  4.37 | ppl    79.093
| epoch  67 step    28800 |     24 batches | lr 0.000306 | ms/batch 322.78 | loss  4.43 | ppl    83.728
----------------------------------------------------------------------------------------------------
| Eval  72 at step    28800 | time: 138.18s | valid loss  4.34 | valid ppl    76.658
----------------------------------------------------------------------------------------------------
| epoch  67 step    28850 |     74 batches | lr 0.000306 | ms/batch 434.80 | loss  4.30 | ppl    73.507
| epoch  67 step    28900 |    124 batches | lr 0.000306 | ms/batch 329.83 | loss  4.34 | ppl    77.044
| epoch  67 step    28950 |    174 batches | lr 0.000306 | ms/batch 330.20 | loss  4.38 | ppl    79.819
| epoch  67 step    29000 |    224 batches | lr 0.000306 | ms/batch 330.04 | loss  4.40 | ppl    81.203
| epoch  67 step    29050 |    274 batches | lr 0.000305 | ms/batch 329.87 | loss  4.39 | ppl    80.975
| epoch  67 step    29100 |    324 batches | lr 0.000305 | ms/batch 329.90 | loss  4.34 | ppl    76.379
| epoch  67 step    29150 |    374 batches | lr 0.000305 | ms/batch 331.52 | loss  4.35 | ppl    77.424
| epoch  67 step    29200 |    424 batches | lr 0.000305 | ms/batch 330.52 | loss  4.36 | ppl    78.135
----------------------------------------------------------------------------------------------------
| Eval  73 at step    29200 | time: 137.37s | valid loss  4.33 | valid ppl    75.786
----------------------------------------------------------------------------------------------------
| epoch  68 step    29250 |     38 batches | lr 0.000305 | ms/batch 429.46 | loss  4.38 | ppl    79.608
| epoch  68 step    29300 |     88 batches | lr 0.000305 | ms/batch 331.65 | loss  4.33 | ppl    75.719
| epoch  68 step    29350 |    138 batches | lr 0.000305 | ms/batch 330.64 | loss  4.36 | ppl    77.989
| epoch  68 step    29400 |    188 batches | lr 0.000304 | ms/batch 330.99 | loss  4.38 | ppl    79.932
| epoch  68 step    29450 |    238 batches | lr 0.000304 | ms/batch 330.99 | loss  4.39 | ppl    80.370
| epoch  68 step    29500 |    288 batches | lr 0.000304 | ms/batch 341.59 | loss  4.42 | ppl    82.902
| epoch  68 step    29550 |    338 batches | lr 0.000304 | ms/batch 338.35 | loss  4.28 | ppl    72.489
| epoch  68 step    29600 |    388 batches | lr 0.000304 | ms/batch 329.80 | loss  4.39 | ppl    80.565
----------------------------------------------------------------------------------------------------
| Eval  74 at step    29600 | time: 138.15s | valid loss  4.32 | valid ppl    75.321
----------------------------------------------------------------------------------------------------
| epoch  69 step    29650 |      2 batches | lr 0.000304 | ms/batch 456.96 | loss  4.39 | ppl    80.266
| epoch  69 step    29700 |     52 batches | lr 0.000303 | ms/batch 330.04 | loss  4.33 | ppl    76.200
| epoch  69 step    29750 |    102 batches | lr 0.000303 | ms/batch 330.74 | loss  4.33 | ppl    75.897
| epoch  69 step    29800 |    152 batches | lr 0.000303 | ms/batch 329.94 | loss  4.34 | ppl    77.080
| epoch  69 step    29850 |    202 batches | lr 0.000303 | ms/batch 329.52 | loss  4.36 | ppl    77.952
| epoch  69 step    29900 |    252 batches | lr 0.000303 | ms/batch 330.74 | loss  4.41 | ppl    82.347
| epoch  69 step    29950 |    302 batches | lr 0.000303 | ms/batch 330.19 | loss  4.41 | ppl    82.553
| epoch  69 step    30000 |    352 batches | lr 0.000303 | ms/batch 330.81 | loss  4.26 | ppl    71.159
----------------------------------------------------------------------------------------------------
| Eval  75 at step    30000 | time: 136.77s | valid loss  4.32 | valid ppl    74.963
----------------------------------------------------------------------------------------------------
| epoch  69 step    30050 |    402 batches | lr 0.000302 | ms/batch 481.83 | loss  4.36 | ppl    78.123
| epoch  70 step    30100 |     16 batches | lr 0.000302 | ms/batch 321.59 | loss  4.39 | ppl    80.511
| epoch  70 step    30150 |     66 batches | lr 0.000302 | ms/batch 330.90 | loss  4.32 | ppl    74.837
| epoch  70 step    30200 |    116 batches | lr 0.000302 | ms/batch 330.60 | loss  4.34 | ppl    76.642
| epoch  70 step    30250 |    166 batches | lr 0.000302 | ms/batch 329.75 | loss  4.33 | ppl    75.998
| epoch  70 step    30300 |    216 batches | lr 0.000302 | ms/batch 330.40 | loss  4.38 | ppl    79.801
| epoch  70 step    30350 |    266 batches | lr 0.000302 | ms/batch 331.71 | loss  4.38 | ppl    79.701
| epoch  70 step    30400 |    316 batches | lr 0.000301 | ms/batch 331.77 | loss  4.36 | ppl    78.123
----------------------------------------------------------------------------------------------------
| Eval  76 at step    30400 | time: 137.83s | valid loss  4.32 | valid ppl    75.476
----------------------------------------------------------------------------------------------------
| epoch  70 step    30450 |    366 batches | lr 0.000301 | ms/batch 430.08 | loss  4.30 | ppl    73.942
| epoch  70 step    30500 |    416 batches | lr 0.000301 | ms/batch 325.34 | loss  4.36 | ppl    78.643
| epoch  71 step    30550 |     30 batches | lr 0.000301 | ms/batch 323.21 | loss  4.36 | ppl    78.422
| epoch  71 step    30600 |     80 batches | lr 0.000301 | ms/batch 330.62 | loss  4.31 | ppl    74.208
| epoch  71 step    30650 |    130 batches | lr 0.000301 | ms/batch 330.55 | loss  4.33 | ppl    76.271
| epoch  71 step    30700 |    180 batches | lr 0.0003 | ms/batch 330.01 | loss  4.36 | ppl    77.873
| epoch  71 step    30750 |    230 batches | lr 0.0003 | ms/batch 331.89 | loss  4.36 | ppl    78.570
| epoch  71 step    30800 |    280 batches | lr 0.0003 | ms/batch 333.94 | loss  4.39 | ppl    80.559
----------------------------------------------------------------------------------------------------
| Eval  77 at step    30800 | time: 136.78s | valid loss  4.31 | valid ppl    74.576
----------------------------------------------------------------------------------------------------
| epoch  71 step    30850 |    330 batches | lr 0.0003 | ms/batch 464.24 | loss  4.30 | ppl    73.390
| epoch  71 step    30900 |    380 batches | lr 0.0003 | ms/batch 324.56 | loss  4.33 | ppl    75.980
| epoch  71 step    30950 |    430 batches | lr 0.0003 | ms/batch 325.75 | loss  4.37 | ppl    78.828
| epoch  72 step    31000 |     44 batches | lr 0.0003 | ms/batch 322.80 | loss  4.34 | ppl    76.349
| epoch  72 step    31050 |     94 batches | lr 0.000299 | ms/batch 329.68 | loss  4.29 | ppl    72.671
| epoch  72 step    31100 |    144 batches | lr 0.000299 | ms/batch 329.09 | loss  4.35 | ppl    77.842
| epoch  72 step    31150 |    194 batches | lr 0.000299 | ms/batch 329.26 | loss  4.36 | ppl    78.600
| epoch  72 step    31200 |    244 batches | lr 0.000299 | ms/batch 330.08 | loss  4.37 | ppl    78.760
----------------------------------------------------------------------------------------------------
| Eval  78 at step    31200 | time: 136.02s | valid loss  4.34 | valid ppl    76.347
----------------------------------------------------------------------------------------------------
| epoch  72 step    31250 |    294 batches | lr 0.000299 | ms/batch 429.19 | loss  4.38 | ppl    80.169
| epoch  72 step    31300 |    344 batches | lr 0.000299 | ms/batch 324.19 | loss  4.27 | ppl    71.399
| epoch  72 step    31350 |    394 batches | lr 0.000298 | ms/batch 325.57 | loss  4.34 | ppl    76.870
| epoch  73 step    31400 |      8 batches | lr 0.000298 | ms/batch 319.15 | loss  4.38 | ppl    80.000
| epoch  73 step    31450 |     58 batches | lr 0.000298 | ms/batch 330.36 | loss  4.29 | ppl    73.269
| epoch  73 step    31500 |    108 batches | lr 0.000298 | ms/batch 329.74 | loss  4.31 | ppl    74.673
| epoch  73 step    31550 |    158 batches | lr 0.000298 | ms/batch 330.22 | loss  4.33 | ppl    75.998
| epoch  73 step    31600 |    208 batches | lr 0.000298 | ms/batch 330.74 | loss  4.34 | ppl    76.510
----------------------------------------------------------------------------------------------------
| Eval  79 at step    31600 | time: 135.97s | valid loss  4.31 | valid ppl    74.384
----------------------------------------------------------------------------------------------------
| epoch  73 step    31650 |    258 batches | lr 0.000297 | ms/batch 461.52 | loss  4.35 | ppl    77.812
| epoch  73 step    31700 |    308 batches | lr 0.000297 | ms/batch 325.42 | loss  4.36 | ppl    78.404
| epoch  73 step    31750 |    358 batches | lr 0.000297 | ms/batch 324.04 | loss  4.26 | ppl    70.617
| epoch  73 step    31800 |    408 batches | lr 0.000297 | ms/batch 324.35 | loss  4.31 | ppl    74.656
| epoch  74 step    31850 |     22 batches | lr 0.000297 | ms/batch 320.89 | loss  4.37 | ppl    79.180
| epoch  74 step    31900 |     72 batches | lr 0.000297 | ms/batch 330.20 | loss  4.27 | ppl    71.633
| epoch  74 step    31950 |    122 batches | lr 0.000297 | ms/batch 328.86 | loss  4.33 | ppl    75.672
| epoch  74 step    32000 |    172 batches | lr 0.000296 | ms/batch 329.50 | loss  4.32 | ppl    75.324
----------------------------------------------------------------------------------------------------
| Eval  80 at step    32000 | time: 135.60s | valid loss  4.31 | valid ppl    74.501
----------------------------------------------------------------------------------------------------
| epoch  74 step    32050 |    222 batches | lr 0.000296 | ms/batch 428.67 | loss  4.34 | ppl    76.924
| epoch  74 step    32100 |    272 batches | lr 0.000296 | ms/batch 325.77 | loss  4.35 | ppl    77.485
| epoch  74 step    32150 |    322 batches | lr 0.000296 | ms/batch 326.24 | loss  4.32 | ppl    75.294
| epoch  74 step    32200 |    372 batches | lr 0.000296 | ms/batch 325.87 | loss  4.28 | ppl    72.257
| epoch  74 step    32250 |    422 batches | lr 0.000296 | ms/batch 326.41 | loss  4.33 | ppl    76.063
| epoch  75 step    32300 |     36 batches | lr 0.000295 | ms/batch 320.80 | loss  4.36 | ppl    78.104
| epoch  75 step    32350 |     86 batches | lr 0.000295 | ms/batch 329.68 | loss  4.28 | ppl    72.116
| epoch  75 step    32400 |    136 batches | lr 0.000295 | ms/batch 329.55 | loss  4.32 | ppl    75.130
----------------------------------------------------------------------------------------------------
| Eval  81 at step    32400 | time: 135.66s | valid loss  4.31 | valid ppl    74.326
----------------------------------------------------------------------------------------------------
| epoch  75 step    32450 |    186 batches | lr 0.000295 | ms/batch 477.77 | loss  4.33 | ppl    76.081
| epoch  75 step    32500 |    236 batches | lr 0.000295 | ms/batch 326.42 | loss  4.35 | ppl    77.146
| epoch  75 step    32550 |    286 batches | lr 0.000295 | ms/batch 327.04 | loss  4.36 | ppl    78.056
| epoch  75 step    32600 |    336 batches | lr 0.000294 | ms/batch 326.28 | loss  4.26 | ppl    70.973
| epoch  75 step    32650 |    386 batches | lr 0.000294 | ms/batch 326.31 | loss  4.34 | ppl    76.714
| epoch  75 step    32700 |    436 batches | lr 0.000294 | ms/batch 322.07 | loss  4.34 | ppl    76.663
| epoch  76 step    32750 |     50 batches | lr 0.000294 | ms/batch 328.90 | loss  4.30 | ppl    73.936
| epoch  76 step    32800 |    100 batches | lr 0.000294 | ms/batch 332.17 | loss  4.28 | ppl    72.009
----------------------------------------------------------------------------------------------------
| Eval  82 at step    32800 | time: 136.04s | valid loss  4.31 | valid ppl    74.351
----------------------------------------------------------------------------------------------------
| epoch  76 step    32850 |    150 batches | lr 0.000294 | ms/batch 430.17 | loss  4.31 | ppl    74.284
| epoch  76 step    32900 |    200 batches | lr 0.000294 | ms/batch 337.14 | loss  4.31 | ppl    74.650
| epoch  76 step    32950 |    250 batches | lr 0.000293 | ms/batch 340.37 | loss  4.35 | ppl    77.327
| epoch  76 step    33000 |    300 batches | lr 0.000293 | ms/batch 325.41 | loss  4.35 | ppl    77.709
| epoch  76 step    33050 |    350 batches | lr 0.000293 | ms/batch 325.22 | loss  4.22 | ppl    67.943
| epoch  76 step    33100 |    400 batches | lr 0.000293 | ms/batch 327.11 | loss  4.31 | ppl    74.557
| epoch  77 step    33150 |     14 batches | lr 0.000293 | ms/batch 319.61 | loss  4.35 | ppl    77.745
| epoch  77 step    33200 |     64 batches | lr 0.000293 | ms/batch 330.24 | loss  4.25 | ppl    70.451
----------------------------------------------------------------------------------------------------
| Eval  83 at step    33200 | time: 136.73s | valid loss  4.30 | valid ppl    73.776
----------------------------------------------------------------------------------------------------
| epoch  77 step    33250 |    114 batches | lr 0.000292 | ms/batch 461.42 | loss  4.29 | ppl    73.189
| epoch  77 step    33300 |    164 batches | lr 0.000292 | ms/batch 325.43 | loss  4.31 | ppl    74.708
| epoch  77 step    33350 |    214 batches | lr 0.000292 | ms/batch 326.93 | loss  4.33 | ppl    75.796
| epoch  77 step    33400 |    264 batches | lr 0.000292 | ms/batch 326.71 | loss  4.33 | ppl    75.719
| epoch  77 step    33450 |    314 batches | lr 0.000292 | ms/batch 324.29 | loss  4.32 | ppl    75.395
| epoch  77 step    33500 |    364 batches | lr 0.000292 | ms/batch 325.25 | loss  4.24 | ppl    69.416
| epoch  77 step    33550 |    414 batches | lr 0.000291 | ms/batch 324.90 | loss  4.30 | ppl    73.896
| epoch  78 step    33600 |     28 batches | lr 0.000291 | ms/batch 321.32 | loss  4.34 | ppl    76.549
----------------------------------------------------------------------------------------------------
| Eval  84 at step    33600 | time: 135.15s | valid loss  4.30 | valid ppl    73.643
----------------------------------------------------------------------------------------------------
| epoch  78 step    33650 |     78 batches | lr 0.000291 | ms/batch 460.11 | loss  4.27 | ppl    71.633
| epoch  78 step    33700 |    128 batches | lr 0.000291 | ms/batch 325.68 | loss  4.30 | ppl    74.046
| epoch  78 step    33750 |    178 batches | lr 0.000291 | ms/batch 324.46 | loss  4.32 | ppl    75.536
| epoch  78 step    33800 |    228 batches | lr 0.000291 | ms/batch 324.92 | loss  4.33 | ppl    75.962
| epoch  78 step    33850 |    278 batches | lr 0.00029 | ms/batch 325.46 | loss  4.33 | ppl    75.998
| epoch  78 step    33900 |    328 batches | lr 0.00029 | ms/batch 323.98 | loss  4.27 | ppl    71.768
| epoch  78 step    33950 |    378 batches | lr 0.00029 | ms/batch 331.66 | loss  4.29 | ppl    72.676
| epoch  78 step    34000 |    428 batches | lr 0.00029 | ms/batch 341.32 | loss  4.34 | ppl    76.767
----------------------------------------------------------------------------------------------------
| Eval  85 at step    34000 | time: 136.40s | valid loss  4.30 | valid ppl    73.641
----------------------------------------------------------------------------------------------------
| epoch  79 step    34050 |     42 batches | lr 0.00029 | ms/batch 460.93 | loss  4.32 | ppl    75.336
| epoch  79 step    34100 |     92 batches | lr 0.00029 | ms/batch 325.31 | loss  4.25 | ppl    69.791
| epoch  79 step    34150 |    142 batches | lr 0.000289 | ms/batch 324.29 | loss  4.30 | ppl    73.613
| epoch  79 step    34200 |    192 batches | lr 0.000289 | ms/batch 326.18 | loss  4.32 | ppl    75.536
| epoch  79 step    34250 |    242 batches | lr 0.000289 | ms/batch 326.03 | loss  4.30 | ppl    73.977
| epoch  79 step    34300 |    292 batches | lr 0.000289 | ms/batch 324.31 | loss  4.37 | ppl    79.167
| epoch  79 step    34350 |    342 batches | lr 0.000289 | ms/batch 324.99 | loss  4.22 | ppl    67.917
| epoch  79 step    34400 |    392 batches | lr 0.000289 | ms/batch 325.15 | loss  4.30 | ppl    73.775
----------------------------------------------------------------------------------------------------
| Eval  86 at step    34400 | time: 134.91s | valid loss  4.30 | valid ppl    73.495
----------------------------------------------------------------------------------------------------
| epoch  80 step    34450 |      6 batches | lr 0.000288 | ms/batch 460.51 | loss  4.34 | ppl    76.906
| epoch  80 step    34500 |     56 batches | lr 0.000288 | ms/batch 325.04 | loss  4.28 | ppl    71.993
| epoch  80 step    34550 |    106 batches | lr 0.000288 | ms/batch 325.62 | loss  4.29 | ppl    72.688
| epoch  80 step    34600 |    156 batches | lr 0.000288 | ms/batch 325.66 | loss  4.27 | ppl    71.393
| epoch  80 step    34650 |    206 batches | lr 0.000288 | ms/batch 325.30 | loss  4.32 | ppl    75.018
| epoch  80 step    34700 |    256 batches | lr 0.000288 | ms/batch 325.21 | loss  4.29 | ppl    72.835
| epoch  80 step    34750 |    306 batches | lr 0.000287 | ms/batch 325.03 | loss  4.33 | ppl    75.891
| epoch  80 step    34800 |    356 batches | lr 0.000287 | ms/batch 324.71 | loss  4.23 | ppl    68.965
----------------------------------------------------------------------------------------------------
| Eval  87 at step    34800 | time: 135.22s | valid loss  4.29 | valid ppl    72.628
----------------------------------------------------------------------------------------------------
| epoch  80 step    34850 |    406 batches | lr 0.000287 | ms/batch 465.41 | loss  4.28 | ppl    71.886
| epoch  81 step    34900 |     20 batches | lr 0.000287 | ms/batch 322.01 | loss  4.34 | ppl    77.092
| epoch  81 step    34950 |     70 batches | lr 0.000287 | ms/batch 324.99 | loss  4.25 | ppl    70.116
| epoch  81 step    35000 |    120 batches | lr 0.000287 | ms/batch 325.62 | loss  4.29 | ppl    72.648
| epoch  81 step    35050 |    170 batches | lr 0.000286 | ms/batch 325.61 | loss  4.29 | ppl    73.041
| epoch  81 step    35100 |    220 batches | lr 0.000286 | ms/batch 325.21 | loss  4.33 | ppl    75.566
| epoch  81 step    35150 |    270 batches | lr 0.000286 | ms/batch 325.71 | loss  4.31 | ppl    74.260
| epoch  81 step    35200 |    320 batches | lr 0.000286 | ms/batch 326.32 | loss  4.28 | ppl    72.314
----------------------------------------------------------------------------------------------------
| Eval  88 at step    35200 | time: 135.43s | valid loss  4.30 | valid ppl    73.746
----------------------------------------------------------------------------------------------------
| epoch  81 step    35250 |    370 batches | lr 0.000286 | ms/batch 433.82 | loss  4.24 | ppl    69.492
| epoch  81 step    35300 |    420 batches | lr 0.000286 | ms/batch 330.63 | loss  4.28 | ppl    72.560
| epoch  82 step    35350 |     34 batches | lr 0.000285 | ms/batch 320.70 | loss  4.34 | ppl    76.984
| epoch  82 step    35400 |     84 batches | lr 0.000285 | ms/batch 326.24 | loss  4.23 | ppl    68.946
| epoch  82 step    35450 |    134 batches | lr 0.000285 | ms/batch 328.25 | loss  4.29 | ppl    72.995
| epoch  82 step    35500 |    184 batches | lr 0.000285 | ms/batch 326.05 | loss  4.28 | ppl    72.433
| epoch  82 step    35550 |    234 batches | lr 0.000285 | ms/batch 327.76 | loss  4.31 | ppl    74.168
| epoch  82 step    35600 |    284 batches | lr 0.000285 | ms/batch 326.42 | loss  4.32 | ppl    75.501
----------------------------------------------------------------------------------------------------
| Eval  89 at step    35600 | time: 136.01s | valid loss  4.29 | valid ppl    73.290
----------------------------------------------------------------------------------------------------
| epoch  82 step    35650 |    334 batches | lr 0.000284 | ms/batch 435.98 | loss  4.22 | ppl    67.925
| epoch  82 step    35700 |    384 batches | lr 0.000284 | ms/batch 329.63 | loss  4.30 | ppl    73.384
| epoch  82 step    35750 |    434 batches | lr 0.000284 | ms/batch 329.65 | loss  4.34 | ppl    76.966
| epoch  83 step    35800 |     48 batches | lr 0.000284 | ms/batch 319.15 | loss  4.24 | ppl    69.310
| epoch  83 step    35850 |     98 batches | lr 0.000284 | ms/batch 326.00 | loss  4.26 | ppl    70.534
| epoch  83 step    35900 |    148 batches | lr 0.000283 | ms/batch 325.91 | loss  4.28 | ppl    72.489
| epoch  83 step    35950 |    198 batches | lr 0.000283 | ms/batch 326.71 | loss  4.29 | ppl    73.332
| epoch  83 step    36000 |    248 batches | lr 0.000283 | ms/batch 325.03 | loss  4.30 | ppl    73.384
----------------------------------------------------------------------------------------------------
| Eval  90 at step    36000 | time: 135.88s | valid loss  4.30 | valid ppl    73.941
----------------------------------------------------------------------------------------------------
| epoch  83 step    36050 |    298 batches | lr 0.000283 | ms/batch 434.02 | loss  4.33 | ppl    75.684
| epoch  83 step    36100 |    348 batches | lr 0.000283 | ms/batch 329.66 | loss  4.18 | ppl    65.203
| epoch  83 step    36150 |    398 batches | lr 0.000283 | ms/batch 329.33 | loss  4.27 | ppl    71.734
| epoch  84 step    36200 |     12 batches | lr 0.000282 | ms/batch 322.60 | loss  4.34 | ppl    76.486
| epoch  84 step    36250 |     62 batches | lr 0.000282 | ms/batch 325.53 | loss  4.24 | ppl    69.392
| epoch  84 step    36300 |    112 batches | lr 0.000282 | ms/batch 325.50 | loss  4.26 | ppl    70.804
| epoch  84 step    36350 |    162 batches | lr 0.000282 | ms/batch 325.78 | loss  4.27 | ppl    71.858
| epoch  84 step    36400 |    212 batches | lr 0.000282 | ms/batch 325.92 | loss  4.29 | ppl    73.281
----------------------------------------------------------------------------------------------------
| Eval  91 at step    36400 | time: 135.95s | valid loss  4.30 | valid ppl    73.958
----------------------------------------------------------------------------------------------------
| epoch  84 step    36450 |    262 batches | lr 0.000282 | ms/batch 433.21 | loss  4.31 | ppl    74.510
| epoch  84 step    36500 |    312 batches | lr 0.000281 | ms/batch 329.43 | loss  4.31 | ppl    74.173
| epoch  84 step    36550 |    362 batches | lr 0.000281 | ms/batch 329.69 | loss  4.25 | ppl    69.952
| epoch  84 step    36600 |    412 batches | lr 0.000281 | ms/batch 331.68 | loss  4.28 | ppl    72.111
| epoch  85 step    36650 |     26 batches | lr 0.000281 | ms/batch 321.60 | loss  4.31 | ppl    74.098
| epoch  85 step    36700 |     76 batches | lr 0.000281 | ms/batch 325.38 | loss  4.23 | ppl    68.500
| epoch  85 step    36750 |    126 batches | lr 0.000281 | ms/batch 325.83 | loss  4.27 | ppl    71.332
| epoch  85 step    36800 |    176 batches | lr 0.00028 | ms/batch 324.75 | loss  4.27 | ppl    71.589
----------------------------------------------------------------------------------------------------
| Eval  92 at step    36800 | time: 136.05s | valid loss  4.28 | valid ppl    72.103
----------------------------------------------------------------------------------------------------
| epoch  85 step    36850 |    226 batches | lr 0.00028 | ms/batch 466.14 | loss  4.27 | ppl    71.712
| epoch  85 step    36900 |    276 batches | lr 0.00028 | ms/batch 330.57 | loss  4.28 | ppl    72.297
| epoch  85 step    36950 |    326 batches | lr 0.00028 | ms/batch 329.67 | loss  4.23 | ppl    68.932
| epoch  85 step    37000 |    376 batches | lr 0.00028 | ms/batch 329.65 | loss  4.26 | ppl    71.026
| epoch  85 step    37050 |    426 batches | lr 0.000279 | ms/batch 329.08 | loss  4.29 | ppl    72.949
| epoch  86 step    37100 |     40 batches | lr 0.000279 | ms/batch 319.03 | loss  4.26 | ppl    71.007
| epoch  86 step    37150 |     90 batches | lr 0.000279 | ms/batch 325.67 | loss  4.20 | ppl    66.699
| epoch  86 step    37200 |    140 batches | lr 0.000279 | ms/batch 324.55 | loss  4.28 | ppl    72.015
----------------------------------------------------------------------------------------------------
| Eval  93 at step    37200 | time: 136.10s | valid loss  4.28 | valid ppl    72.431
----------------------------------------------------------------------------------------------------
| epoch  86 step    37250 |    190 batches | lr 0.000279 | ms/batch 432.96 | loss  4.28 | ppl    72.450
| epoch  86 step    37300 |    240 batches | lr 0.000279 | ms/batch 330.27 | loss  4.29 | ppl    73.315
| epoch  86 step    37350 |    290 batches | lr 0.000278 | ms/batch 331.61 | loss  4.33 | ppl    76.206
| epoch  86 step    37400 |    340 batches | lr 0.000278 | ms/batch 330.47 | loss  4.20 | ppl    66.629
| epoch  86 step    37450 |    390 batches | lr 0.000278 | ms/batch 331.00 | loss  4.27 | ppl    71.661
| epoch  87 step    37500 |      4 batches | lr 0.000278 | ms/batch 323.45 | loss  4.27 | ppl    71.799
| epoch  87 step    37550 |     54 batches | lr 0.000278 | ms/batch 326.23 | loss  4.22 | ppl    67.922
| epoch  87 step    37600 |    104 batches | lr 0.000278 | ms/batch 324.75 | loss  4.23 | ppl    68.412
----------------------------------------------------------------------------------------------------
| Eval  94 at step    37600 | time: 136.53s | valid loss  4.29 | valid ppl    72.652
----------------------------------------------------------------------------------------------------
| epoch  87 step    37650 |    154 batches | lr 0.000277 | ms/batch 433.66 | loss  4.23 | ppl    68.943
| epoch  87 step    37700 |    204 batches | lr 0.000277 | ms/batch 330.23 | loss  4.27 | ppl    71.639
| epoch  87 step    37750 |    254 batches | lr 0.000277 | ms/batch 330.47 | loss  4.27 | ppl    71.852
| epoch  87 step    37800 |    304 batches | lr 0.000277 | ms/batch 329.62 | loss  4.33 | ppl    75.678
| epoch  87 step    37850 |    354 batches | lr 0.000277 | ms/batch 328.78 | loss  4.19 | ppl    66.341
| epoch  87 step    37900 |    404 batches | lr 0.000276 | ms/batch 330.67 | loss  4.26 | ppl    70.506
| epoch  88 step    37950 |     18 batches | lr 0.000276 | ms/batch 323.09 | loss  4.30 | ppl    73.731
| epoch  88 step    38000 |     68 batches | lr 0.000276 | ms/batch 327.47 | loss  4.21 | ppl    67.196
----------------------------------------------------------------------------------------------------
| Eval  95 at step    38000 | time: 136.72s | valid loss  4.27 | valid ppl    71.664
----------------------------------------------------------------------------------------------------
| epoch  88 step    38050 |    118 batches | lr 0.000276 | ms/batch 470.09 | loss  4.25 | ppl    70.237
| epoch  88 step    38100 |    168 batches | lr 0.000276 | ms/batch 331.16 | loss  4.26 | ppl    70.534
| epoch  88 step    38150 |    218 batches | lr 0.000276 | ms/batch 330.93 | loss  4.28 | ppl    72.376
| epoch  88 step    38200 |    268 batches | lr 0.000275 | ms/batch 332.28 | loss  4.29 | ppl    72.779
| epoch  88 step    38250 |    318 batches | lr 0.000275 | ms/batch 331.04 | loss  4.25 | ppl    69.974
| epoch  88 step    38300 |    368 batches | lr 0.000275 | ms/batch 332.25 | loss  4.22 | ppl    68.289
| epoch  88 step    38350 |    418 batches | lr 0.000275 | ms/batch 331.08 | loss  4.27 | ppl    71.279
| epoch  89 step    38400 |     32 batches | lr 0.000275 | ms/batch 322.86 | loss  4.27 | ppl    71.438
----------------------------------------------------------------------------------------------------
| Eval  96 at step    38400 | time: 137.36s | valid loss  4.27 | valid ppl    71.850
----------------------------------------------------------------------------------------------------
| epoch  89 step    38450 |     82 batches | lr 0.000274 | ms/batch 434.31 | loss  4.22 | ppl    67.721
| epoch  89 step    38500 |    132 batches | lr 0.000274 | ms/batch 330.43 | loss  4.24 | ppl    69.658
| epoch  89 step    38550 |    182 batches | lr 0.000274 | ms/batch 331.73 | loss  4.27 | ppl    71.455
| epoch  89 step    38600 |    232 batches | lr 0.000274 | ms/batch 333.19 | loss  4.28 | ppl    72.263
| epoch  89 step    38650 |    282 batches | lr 0.000274 | ms/batch 331.14 | loss  4.29 | ppl    72.861
| epoch  89 step    38700 |    332 batches | lr 0.000274 | ms/batch 330.13 | loss  4.21 | ppl    67.552
| epoch  89 step    38750 |    382 batches | lr 0.000273 | ms/batch 329.19 | loss  4.24 | ppl    69.186
| epoch  89 step    38800 |    432 batches | lr 0.000273 | ms/batch 329.58 | loss  4.27 | ppl    71.852
----------------------------------------------------------------------------------------------------
| Eval  97 at step    38800 | time: 137.46s | valid loss  4.28 | valid ppl    72.402
----------------------------------------------------------------------------------------------------
| epoch  90 step    38850 |     46 batches | lr 0.000273 | ms/batch 427.29 | loss  4.24 | ppl    69.224
| epoch  90 step    38900 |     96 batches | lr 0.000273 | ms/batch 328.41 | loss  4.23 | ppl    68.594
| epoch  90 step    38950 |    146 batches | lr 0.000273 | ms/batch 330.10 | loss  4.26 | ppl    70.531
| epoch  90 step    39000 |    196 batches | lr 0.000272 | ms/batch 329.45 | loss  4.26 | ppl    70.661
| epoch  90 step    39050 |    246 batches | lr 0.000272 | ms/batch 330.07 | loss  4.27 | ppl    71.807
| epoch  90 step    39100 |    296 batches | lr 0.000272 | ms/batch 329.59 | loss  4.29 | ppl    72.750
| epoch  90 step    39150 |    346 batches | lr 0.000272 | ms/batch 329.54 | loss  4.17 | ppl    64.478
| epoch  90 step    39200 |    396 batches | lr 0.000272 | ms/batch 330.56 | loss  4.26 | ppl    70.926
----------------------------------------------------------------------------------------------------
| Eval  98 at step    39200 | time: 136.73s | valid loss  4.27 | valid ppl    71.751
----------------------------------------------------------------------------------------------------
| epoch  91 step    39250 |     10 batches | lr 0.000272 | ms/batch 422.04 | loss  4.29 | ppl    72.927
| epoch  91 step    39300 |     60 batches | lr 0.000271 | ms/batch 330.31 | loss  4.21 | ppl    67.194
| epoch  91 step    39350 |    110 batches | lr 0.000271 | ms/batch 329.07 | loss  4.20 | ppl    66.885
| epoch  91 step    39400 |    160 batches | lr 0.000271 | ms/batch 329.26 | loss  4.26 | ppl    70.614
| epoch  91 step    39450 |    210 batches | lr 0.000271 | ms/batch 329.88 | loss  4.24 | ppl    69.690
| epoch  91 step    39500 |    260 batches | lr 0.000271 | ms/batch 331.64 | loss  4.27 | ppl    71.869
| epoch  91 step    39550 |    310 batches | lr 0.00027 | ms/batch 330.02 | loss  4.28 | ppl    72.026
| epoch  91 step    39600 |    360 batches | lr 0.00027 | ms/batch 330.67 | loss  4.19 | ppl    66.266
----------------------------------------------------------------------------------------------------
| Eval  99 at step    39600 | time: 136.65s | valid loss  4.27 | valid ppl    71.823
----------------------------------------------------------------------------------------------------
| epoch  91 step    39650 |    410 batches | lr 0.00027 | ms/batch 429.26 | loss  4.24 | ppl    69.105
| epoch  92 step    39700 |     24 batches | lr 0.00027 | ms/batch 321.09 | loss  4.26 | ppl    70.619
| epoch  92 step    39750 |     74 batches | lr 0.00027 | ms/batch 330.37 | loss  4.19 | ppl    65.940
| epoch  92 step    39800 |    124 batches | lr 0.00027 | ms/batch 330.62 | loss  4.24 | ppl    69.603
| epoch  92 step    39850 |    174 batches | lr 0.000269 | ms/batch 331.68 | loss  4.23 | ppl    69.005
| epoch  92 step    39900 |    224 batches | lr 0.000269 | ms/batch 332.45 | loss  4.26 | ppl    70.573
| epoch  92 step    39950 |    274 batches | lr 0.000269 | ms/batch 342.22 | loss  4.29 | ppl    72.841
| epoch  92 step    40000 |    324 batches | lr 0.000269 | ms/batch 331.77 | loss  4.20 | ppl    66.624
----------------------------------------------------------------------------------------------------
| Eval 100 at step    40000 | time: 137.49s | valid loss  4.29 | valid ppl    73.088
----------------------------------------------------------------------------------------------------
| epoch  92 step    40050 |    374 batches | lr 0.000269 | ms/batch 429.65 | loss  4.22 | ppl    67.837
| epoch  92 step    40100 |    424 batches | lr 0.000268 | ms/batch 327.61 | loss  4.22 | ppl    67.954
| epoch  93 step    40150 |     38 batches | lr 0.000268 | ms/batch 323.01 | loss  4.24 | ppl    69.381
| epoch  93 step    40200 |     88 batches | lr 0.000268 | ms/batch 329.49 | loss  4.22 | ppl    68.068
| epoch  93 step    40250 |    138 batches | lr 0.000268 | ms/batch 330.99 | loss  4.25 | ppl    69.870
| epoch  93 step    40300 |    188 batches | lr 0.000268 | ms/batch 329.63 | loss  4.25 | ppl    69.990
| epoch  93 step    40350 |    238 batches | lr 0.000267 | ms/batch 329.26 | loss  4.25 | ppl    69.881
| epoch  93 step    40400 |    288 batches | lr 0.000267 | ms/batch 329.32 | loss  4.26 | ppl    71.115
----------------------------------------------------------------------------------------------------
| Eval 101 at step    40400 | time: 136.47s | valid loss  4.28 | valid ppl    72.040
----------------------------------------------------------------------------------------------------
| epoch  93 step    40450 |    338 batches | lr 0.000267 | ms/batch 427.08 | loss  4.16 | ppl    63.969
| epoch  93 step    40500 |    388 batches | lr 0.000267 | ms/batch 325.44 | loss  4.23 | ppl    68.680
| epoch  94 step    40550 |      2 batches | lr 0.000267 | ms/batch 321.26 | loss  4.26 | ppl    70.564
| epoch  94 step    40600 |     52 batches | lr 0.000267 | ms/batch 329.14 | loss  4.21 | ppl    67.141
| epoch  94 step    40650 |    102 batches | lr 0.000266 | ms/batch 330.62 | loss  4.21 | ppl    67.557
| epoch  94 step    40700 |    152 batches | lr 0.000266 | ms/batch 331.67 | loss  4.25 | ppl    70.073
| epoch  94 step    40750 |    202 batches | lr 0.000266 | ms/batch 330.84 | loss  4.24 | ppl    69.446
| epoch  94 step    40800 |    252 batches | lr 0.000266 | ms/batch 330.73 | loss  4.26 | ppl    71.101
----------------------------------------------------------------------------------------------------
| Eval 102 at step    40800 | time: 136.37s | valid loss  4.29 | valid ppl    73.307
----------------------------------------------------------------------------------------------------
| epoch  94 step    40850 |    302 batches | lr 0.000266 | ms/batch 430.26 | loss  4.29 | ppl    73.223
| epoch  94 step    40900 |    352 batches | lr 0.000265 | ms/batch 327.20 | loss  4.17 | ppl    64.627
| epoch  94 step    40950 |    402 batches | lr 0.000265 | ms/batch 326.54 | loss  4.22 | ppl    68.348
| epoch  95 step    41000 |     16 batches | lr 0.000265 | ms/batch 322.39 | loss  4.25 | ppl    70.314
| epoch  95 step    41050 |     66 batches | lr 0.000265 | ms/batch 332.36 | loss  4.20 | ppl    66.400
| epoch  95 step    41100 |    116 batches | lr 0.000265 | ms/batch 332.45 | loss  4.20 | ppl    66.587
| epoch  95 step    41150 |    166 batches | lr 0.000264 | ms/batch 332.34 | loss  4.23 | ppl    68.645
| epoch  95 step    41200 |    216 batches | lr 0.000264 | ms/batch 330.76 | loss  4.26 | ppl    70.746
----------------------------------------------------------------------------------------------------
| Eval 103 at step    41200 | time: 136.73s | valid loss  4.27 | valid ppl    71.497
----------------------------------------------------------------------------------------------------
| epoch  95 step    41250 |    266 batches | lr 0.000264 | ms/batch 466.57 | loss  4.23 | ppl    68.447
| epoch  95 step    41300 |    316 batches | lr 0.000264 | ms/batch 327.84 | loss  4.23 | ppl    68.997
| epoch  95 step    41350 |    366 batches | lr 0.000264 | ms/batch 327.53 | loss  4.19 | ppl    65.745
| epoch  95 step    41400 |    416 batches | lr 0.000264 | ms/batch 327.39 | loss  4.21 | ppl    67.644
| epoch  96 step    41450 |     30 batches | lr 0.000263 | ms/batch 324.57 | loss  4.25 | ppl    70.267
| epoch  96 step    41500 |     80 batches | lr 0.000263 | ms/batch 332.08 | loss  4.18 | ppl    65.126
| epoch  96 step    41550 |    130 batches | lr 0.000263 | ms/batch 332.12 | loss  4.23 | ppl    68.423
| epoch  96 step    41600 |    180 batches | lr 0.000263 | ms/batch 330.87 | loss  4.22 | ppl    68.023
----------------------------------------------------------------------------------------------------
| Eval 104 at step    41600 | time: 136.73s | valid loss  4.25 | valid ppl    70.385
----------------------------------------------------------------------------------------------------
| epoch  96 step    41650 |    230 batches | lr 0.000263 | ms/batch 470.20 | loss  4.25 | ppl    70.018
| epoch  96 step    41700 |    280 batches | lr 0.000262 | ms/batch 327.05 | loss  4.24 | ppl    69.718
| epoch  96 step    41750 |    330 batches | lr 0.000262 | ms/batch 327.48 | loss  4.18 | ppl    65.358
| epoch  96 step    41800 |    380 batches | lr 0.000262 | ms/batch 328.09 | loss  4.21 | ppl    67.459
| epoch  96 step    41850 |    430 batches | lr 0.000262 | ms/batch 328.05 | loss  4.22 | ppl    68.329
| epoch  97 step    41900 |     44 batches | lr 0.000262 | ms/batch 324.55 | loss  4.21 | ppl    67.362
| epoch  97 step    41950 |     94 batches | lr 0.000261 | ms/batch 332.09 | loss  4.16 | ppl    63.782
| epoch  97 step    42000 |    144 batches | lr 0.000261 | ms/batch 332.82 | loss  4.23 | ppl    68.546
----------------------------------------------------------------------------------------------------
| Eval 105 at step    42000 | time: 136.54s | valid loss  4.27 | valid ppl    71.519
----------------------------------------------------------------------------------------------------
| epoch  97 step    42050 |    194 batches | lr 0.000261 | ms/batch 431.81 | loss  4.23 | ppl    68.911
| epoch  97 step    42100 |    244 batches | lr 0.000261 | ms/batch 329.02 | loss  4.25 | ppl    70.062
| epoch  97 step    42150 |    294 batches | lr 0.000261 | ms/batch 328.38 | loss  4.27 | ppl    71.550
| epoch  97 step    42200 |    344 batches | lr 0.00026 | ms/batch 330.01 | loss  4.12 | ppl    61.384
| epoch  97 step    42250 |    394 batches | lr 0.00026 | ms/batch 327.40 | loss  4.23 | ppl    68.817
| epoch  98 step    42300 |      8 batches | lr 0.00026 | ms/batch 321.24 | loss  4.27 | ppl    71.193
| epoch  98 step    42350 |     58 batches | lr 0.00026 | ms/batch 330.65 | loss  4.16 | ppl    64.046
| epoch  98 step    42400 |    108 batches | lr 0.00026 | ms/batch 330.63 | loss  4.20 | ppl    66.483
----------------------------------------------------------------------------------------------------
| Eval 106 at step    42400 | time: 136.45s | valid loss  4.26 | valid ppl    71.077
----------------------------------------------------------------------------------------------------
| epoch  98 step    42450 |    158 batches | lr 0.000259 | ms/batch 431.10 | loss  4.20 | ppl    66.903
| epoch  98 step    42500 |    208 batches | lr 0.000259 | ms/batch 325.78 | loss  4.23 | ppl    68.562
| epoch  98 step    42550 |    258 batches | lr 0.000259 | ms/batch 326.58 | loss  4.25 | ppl    70.215
| epoch  98 step    42600 |    308 batches | lr 0.000259 | ms/batch 325.64 | loss  4.25 | ppl    69.756
| epoch  98 step    42650 |    358 batches | lr 0.000259 | ms/batch 326.03 | loss  4.18 | ppl    65.231
| epoch  98 step    42700 |    408 batches | lr 0.000259 | ms/batch 326.35 | loss  4.20 | ppl    66.457
| epoch  99 step    42750 |     22 batches | lr 0.000258 | ms/batch 322.31 | loss  4.26 | ppl    70.829
| epoch  99 step    42800 |     72 batches | lr 0.000258 | ms/batch 330.64 | loss  4.16 | ppl    64.147
----------------------------------------------------------------------------------------------------
| Eval 107 at step    42800 | time: 135.73s | valid loss  4.27 | valid ppl    71.588
----------------------------------------------------------------------------------------------------
| epoch  99 step    42850 |    122 batches | lr 0.000258 | ms/batch 430.62 | loss  4.21 | ppl    67.436
| epoch  99 step    42900 |    172 batches | lr 0.000258 | ms/batch 326.07 | loss  4.22 | ppl    67.715
| epoch  99 step    42950 |    222 batches | lr 0.000258 | ms/batch 325.52 | loss  4.24 | ppl    69.332
| epoch  99 step    43000 |    272 batches | lr 0.000257 | ms/batch 325.83 | loss  4.22 | ppl    68.305
| epoch  99 step    43050 |    322 batches | lr 0.000257 | ms/batch 327.28 | loss  4.18 | ppl    65.604
| epoch  99 step    43100 |    372 batches | lr 0.000257 | ms/batch 327.01 | loss  4.19 | ppl    65.817
| epoch  99 step    43150 |    422 batches | lr 0.000257 | ms/batch 326.05 | loss  4.21 | ppl    67.209
| epoch 100 step    43200 |     36 batches | lr 0.000257 | ms/batch 323.63 | loss  4.23 | ppl    68.924
----------------------------------------------------------------------------------------------------
| Eval 108 at step    43200 | time: 135.61s | valid loss  4.26 | valid ppl    70.845
----------------------------------------------------------------------------------------------------
| epoch 100 step    43250 |     86 batches | lr 0.000256 | ms/batch 429.66 | loss  4.14 | ppl    62.568
| epoch 100 step    43300 |    136 batches | lr 0.000256 | ms/batch 326.34 | loss  4.21 | ppl    67.583
| epoch 100 step    43350 |    186 batches | lr 0.000256 | ms/batch 326.78 | loss  4.20 | ppl    66.788
| epoch 100 step    43400 |    236 batches | lr 0.000256 | ms/batch 325.98 | loss  4.22 | ppl    67.980
| epoch 100 step    43450 |    286 batches | lr 0.000256 | ms/batch 325.64 | loss  4.25 | ppl    70.221
| epoch 100 step    43500 |    336 batches | lr 0.000255 | ms/batch 326.02 | loss  4.13 | ppl    62.054
| epoch 100 step    43550 |    386 batches | lr 0.000255 | ms/batch 325.76 | loss  4.19 | ppl    66.193
| epoch 100 step    43600 |    436 batches | lr 0.000255 | ms/batch 320.75 | loss  4.25 | ppl    70.407
----------------------------------------------------------------------------------------------------
| Eval 109 at step    43600 | time: 135.33s | valid loss  4.26 | valid ppl    70.573
----------------------------------------------------------------------------------------------------
| epoch 101 step    43650 |     50 batches | lr 0.000255 | ms/batch 426.51 | loss  4.19 | ppl    66.219
| epoch 101 step    43700 |    100 batches | lr 0.000255 | ms/batch 324.95 | loss  4.18 | ppl    65.619
| epoch 101 step    43750 |    150 batches | lr 0.000254 | ms/batch 325.57 | loss  4.19 | ppl    66.333
| epoch 101 step    43800 |    200 batches | lr 0.000254 | ms/batch 324.90 | loss  4.21 | ppl    67.673
| epoch 101 step    43850 |    250 batches | lr 0.000254 | ms/batch 324.69 | loss  4.23 | ppl    68.720
| epoch 101 step    43900 |    300 batches | lr 0.000254 | ms/batch 326.01 | loss  4.25 | ppl    69.958
| epoch 101 step    43950 |    350 batches | lr 0.000254 | ms/batch 324.62 | loss  4.13 | ppl    61.914
| epoch 101 step    44000 |    400 batches | lr 0.000253 | ms/batch 325.09 | loss  4.20 | ppl    66.496
----------------------------------------------------------------------------------------------------
| Eval 110 at step    44000 | time: 135.10s | valid loss  4.25 | valid ppl    70.371
----------------------------------------------------------------------------------------------------
| epoch 102 step    44050 |     14 batches | lr 0.000253 | ms/batch 456.70 | loss  4.25 | ppl    69.996
| epoch 102 step    44100 |     64 batches | lr 0.000253 | ms/batch 324.84 | loss  4.14 | ppl    63.049
| epoch 102 step    44150 |    114 batches | lr 0.000253 | ms/batch 325.60 | loss  4.18 | ppl    65.139
| epoch 102 step    44200 |    164 batches | lr 0.000253 | ms/batch 324.71 | loss  4.21 | ppl    67.194
| epoch 102 step    44250 |    214 batches | lr 0.000252 | ms/batch 325.98 | loss  4.22 | ppl    68.252
| epoch 102 step    44300 |    264 batches | lr 0.000252 | ms/batch 325.65 | loss  4.22 | ppl    67.901
| epoch 102 step    44350 |    314 batches | lr 0.000252 | ms/batch 326.35 | loss  4.22 | ppl    67.919
| epoch 102 step    44400 |    364 batches | lr 0.000252 | ms/batch 325.87 | loss  4.12 | ppl    61.291
----------------------------------------------------------------------------------------------------
| Eval 111 at step    44400 | time: 135.17s | valid loss  4.25 | valid ppl    70.156
----------------------------------------------------------------------------------------------------
| epoch 102 step    44450 |    414 batches | lr 0.000252 | ms/batch 466.56 | loss  4.18 | ppl    65.379
| epoch 103 step    44500 |     28 batches | lr 0.000251 | ms/batch 322.14 | loss  4.26 | ppl    70.691
| epoch 103 step    44550 |     78 batches | lr 0.000251 | ms/batch 325.63 | loss  4.14 | ppl    62.955
| epoch 103 step    44600 |    128 batches | lr 0.000251 | ms/batch 326.15 | loss  4.18 | ppl    65.053
| epoch 103 step    44650 |    178 batches | lr 0.000251 | ms/batch 325.70 | loss  4.21 | ppl    67.652
| epoch 103 step    44700 |    228 batches | lr 0.000251 | ms/batch 325.85 | loss  4.21 | ppl    67.251
| epoch 103 step    44750 |    278 batches | lr 0.000251 | ms/batch 325.01 | loss  4.23 | ppl    68.749
| epoch 103 step    44800 |    328 batches | lr 0.00025 | ms/batch 325.62 | loss  4.13 | ppl    62.373
----------------------------------------------------------------------------------------------------
| Eval 112 at step    44800 | time: 135.47s | valid loss  4.26 | valid ppl    71.156
----------------------------------------------------------------------------------------------------
| epoch 103 step    44850 |    378 batches | lr 0.00025 | ms/batch 431.45 | loss  4.18 | ppl    65.356
| epoch 103 step    44900 |    428 batches | lr 0.00025 | ms/batch 328.87 | loss  4.21 | ppl    67.136
| epoch 104 step    44950 |     42 batches | lr 0.00025 | ms/batch 320.44 | loss  4.21 | ppl    67.499
| epoch 104 step    45000 |     92 batches | lr 0.00025 | ms/batch 326.86 | loss  4.14 | ppl    62.872
| epoch 104 step    45050 |    142 batches | lr 0.000249 | ms/batch 325.68 | loss  4.20 | ppl    66.856
| epoch 104 step    45100 |    192 batches | lr 0.000249 | ms/batch 326.79 | loss  4.19 | ppl    66.235
| epoch 104 step    45150 |    242 batches | lr 0.000249 | ms/batch 324.49 | loss  4.20 | ppl    66.825
| epoch 104 step    45200 |    292 batches | lr 0.000249 | ms/batch 325.39 | loss  4.22 | ppl    68.230
----------------------------------------------------------------------------------------------------
| Eval 113 at step    45200 | time: 135.51s | valid loss  4.26 | valid ppl    70.608
----------------------------------------------------------------------------------------------------
| epoch 104 step    45250 |    342 batches | lr 0.000249 | ms/batch 431.09 | loss  4.10 | ppl    60.515
| epoch 104 step    45300 |    392 batches | lr 0.000248 | ms/batch 328.78 | loss  4.22 | ppl    67.811
| epoch 105 step    45350 |      6 batches | lr 0.000248 | ms/batch 322.73 | loss  4.23 | ppl    68.946
| epoch 105 step    45400 |     56 batches | lr 0.000248 | ms/batch 323.97 | loss  4.15 | ppl    63.300
| epoch 105 step    45450 |    106 batches | lr 0.000248 | ms/batch 326.91 | loss  4.18 | ppl    65.340
| epoch 105 step    45500 |    156 batches | lr 0.000248 | ms/batch 326.05 | loss  4.19 | ppl    66.217
| epoch 105 step    45550 |    206 batches | lr 0.000247 | ms/batch 326.30 | loss  4.18 | ppl    65.409
| epoch 105 step    45600 |    256 batches | lr 0.000247 | ms/batch 324.97 | loss  4.21 | ppl    67.220
----------------------------------------------------------------------------------------------------
| Eval 114 at step    45600 | time: 135.53s | valid loss  4.28 | valid ppl    71.948
----------------------------------------------------------------------------------------------------
| epoch 105 step    45650 |    306 batches | lr 0.000247 | ms/batch 432.68 | loss  4.24 | ppl    69.308
| epoch 105 step    45700 |    356 batches | lr 0.000247 | ms/batch 329.23 | loss  4.13 | ppl    62.446
| epoch 105 step    45750 |    406 batches | lr 0.000247 | ms/batch 329.48 | loss  4.17 | ppl    64.933
| epoch 106 step    45800 |     20 batches | lr 0.000246 | ms/batch 321.91 | loss  4.23 | ppl    68.817
| epoch 106 step    45850 |     70 batches | lr 0.000246 | ms/batch 324.80 | loss  4.15 | ppl    63.479
| epoch 106 step    45900 |    120 batches | lr 0.000246 | ms/batch 324.37 | loss  4.17 | ppl    64.827
| epoch 106 step    45950 |    170 batches | lr 0.000246 | ms/batch 326.47 | loss  4.18 | ppl    65.307
| epoch 106 step    46000 |    220 batches | lr 0.000246 | ms/batch 327.00 | loss  4.21 | ppl    67.086
----------------------------------------------------------------------------------------------------
| Eval 115 at step    46000 | time: 135.84s | valid loss  4.26 | valid ppl    71.013
----------------------------------------------------------------------------------------------------
| epoch 106 step    46050 |    270 batches | lr 0.000245 | ms/batch 433.48 | loss  4.21 | ppl    67.343
| epoch 106 step    46100 |    320 batches | lr 0.000245 | ms/batch 331.14 | loss  4.18 | ppl    65.047
| epoch 106 step    46150 |    370 batches | lr 0.000245 | ms/batch 331.26 | loss  4.14 | ppl    62.958
| epoch 106 step    46200 |    420 batches | lr 0.000245 | ms/batch 331.23 | loss  4.17 | ppl    64.670
| epoch 107 step    46250 |     34 batches | lr 0.000245 | ms/batch 321.80 | loss  4.20 | ppl    66.798
| epoch 107 step    46300 |     84 batches | lr 0.000244 | ms/batch 325.56 | loss  4.14 | ppl    62.666
| epoch 107 step    46350 |    134 batches | lr 0.000244 | ms/batch 326.03 | loss  4.17 | ppl    64.812
| epoch 107 step    46400 |    184 batches | lr 0.000244 | ms/batch 325.42 | loss  4.19 | ppl    65.783
----------------------------------------------------------------------------------------------------
| Eval 116 at step    46400 | time: 136.28s | valid loss  4.25 | valid ppl    69.996
----------------------------------------------------------------------------------------------------
| epoch 107 step    46450 |    234 batches | lr 0.000244 | ms/batch 466.39 | loss  4.20 | ppl    66.639
| epoch 107 step    46500 |    284 batches | lr 0.000243 | ms/batch 329.35 | loss  4.22 | ppl    67.948
| epoch 107 step    46550 |    334 batches | lr 0.000243 | ms/batch 330.06 | loss  4.15 | ppl    63.123
| epoch 107 step    46600 |    384 batches | lr 0.000243 | ms/batch 329.67 | loss  4.17 | ppl    64.612
| epoch 107 step    46650 |    434 batches | lr 0.000243 | ms/batch 329.42 | loss  4.21 | ppl    67.251
| epoch 108 step    46700 |     48 batches | lr 0.000243 | ms/batch 319.49 | loss  4.16 | ppl    64.335
| epoch 108 step    46750 |     98 batches | lr 0.000242 | ms/batch 324.03 | loss  4.12 | ppl    61.759
| epoch 108 step    46800 |    148 batches | lr 0.000242 | ms/batch 324.48 | loss  4.17 | ppl    64.705
----------------------------------------------------------------------------------------------------
| Eval 117 at step    46800 | time: 136.02s | valid loss  4.25 | valid ppl    70.042
----------------------------------------------------------------------------------------------------
| epoch 108 step    46850 |    198 batches | lr 0.000242 | ms/batch 432.11 | loss  4.20 | ppl    66.489
| epoch 108 step    46900 |    248 batches | lr 0.000242 | ms/batch 328.87 | loss  4.20 | ppl    67.005
| epoch 108 step    46950 |    298 batches | lr 0.000242 | ms/batch 329.45 | loss  4.22 | ppl    68.166
| epoch 108 step    47000 |    348 batches | lr 0.000241 | ms/batch 329.52 | loss  4.08 | ppl    59.419
| epoch 108 step    47050 |    398 batches | lr 0.000241 | ms/batch 329.59 | loss  4.18 | ppl    65.629
| epoch 109 step    47100 |     12 batches | lr 0.000241 | ms/batch 321.82 | loss  4.23 | ppl    68.881
| epoch 109 step    47150 |     62 batches | lr 0.000241 | ms/batch 326.73 | loss  4.10 | ppl    60.640
| epoch 109 step    47200 |    112 batches | lr 0.000241 | ms/batch 325.64 | loss  4.13 | ppl    62.210
----------------------------------------------------------------------------------------------------
| Eval 118 at step    47200 | time: 136.19s | valid loss  4.26 | valid ppl    70.472
----------------------------------------------------------------------------------------------------
| epoch 109 step    47250 |    162 batches | lr 0.00024 | ms/batch 432.96 | loss  4.18 | ppl    65.676
| epoch 109 step    47300 |    212 batches | lr 0.00024 | ms/batch 329.69 | loss  4.20 | ppl    66.692
| epoch 109 step    47350 |    262 batches | lr 0.00024 | ms/batch 329.51 | loss  4.19 | ppl    65.791
| epoch 109 step    47400 |    312 batches | lr 0.00024 | ms/batch 329.40 | loss  4.19 | ppl    65.974
| epoch 109 step    47450 |    362 batches | lr 0.00024 | ms/batch 329.22 | loss  4.12 | ppl    61.271
| epoch 109 step    47500 |    412 batches | lr 0.000239 | ms/batch 329.47 | loss  4.16 | ppl    64.057
| epoch 110 step    47550 |     26 batches | lr 0.000239 | ms/batch 321.50 | loss  4.21 | ppl    67.338
| epoch 110 step    47600 |     76 batches | lr 0.000239 | ms/batch 324.61 | loss  4.12 | ppl    61.391
----------------------------------------------------------------------------------------------------
| Eval 119 at step    47600 | time: 136.31s | valid loss  4.26 | valid ppl    70.988
----------------------------------------------------------------------------------------------------
| epoch 110 step    47650 |    126 batches | lr 0.000239 | ms/batch 432.27 | loss  4.14 | ppl    62.985
| epoch 110 step    47700 |    176 batches | lr 0.000239 | ms/batch 330.70 | loss  4.17 | ppl    64.728
| epoch 110 step    47750 |    226 batches | lr 0.000238 | ms/batch 328.93 | loss  4.18 | ppl    65.532
| epoch 110 step    47800 |    276 batches | lr 0.000238 | ms/batch 329.86 | loss  4.20 | ppl    66.838
| epoch 110 step    47850 |    326 batches | lr 0.000238 | ms/batch 329.62 | loss  4.15 | ppl    63.345
| epoch 110 step    47900 |    376 batches | lr 0.000238 | ms/batch 329.70 | loss  4.14 | ppl    62.913
| epoch 110 step    47950 |    426 batches | lr 0.000238 | ms/batch 329.05 | loss  4.18 | ppl    65.604
| epoch 111 step    48000 |     40 batches | lr 0.000237 | ms/batch 321.06 | loss  4.16 | ppl    64.137
----------------------------------------------------------------------------------------------------
| Eval 120 at step    48000 | time: 136.60s | valid loss  4.25 | valid ppl    70.058
----------------------------------------------------------------------------------------------------
| epoch 111 step    48050 |     90 batches | lr 0.000237 | ms/batch 434.14 | loss  4.12 | ppl    61.312
| epoch 111 step    48100 |    140 batches | lr 0.000237 | ms/batch 330.97 | loss  4.13 | ppl    62.049
| epoch 111 step    48150 |    190 batches | lr 0.000237 | ms/batch 328.70 | loss  4.17 | ppl    64.393
| epoch 111 step    48200 |    240 batches | lr 0.000237 | ms/batch 329.54 | loss  4.18 | ppl    65.496
| epoch 111 step    48250 |    290 batches | lr 0.000236 | ms/batch 329.81 | loss  4.21 | ppl    67.515
| epoch 111 step    48300 |    340 batches | lr 0.000236 | ms/batch 328.59 | loss  4.07 | ppl    58.720
| epoch 111 step    48350 |    390 batches | lr 0.000236 | ms/batch 329.75 | loss  4.18 | ppl    65.356
| epoch 112 step    48400 |      4 batches | lr 0.000236 | ms/batch 323.25 | loss  4.19 | ppl    66.242
----------------------------------------------------------------------------------------------------
| Eval 121 at step    48400 | time: 136.70s | valid loss  4.26 | valid ppl    70.539
----------------------------------------------------------------------------------------------------
| epoch 112 step    48450 |     54 batches | lr 0.000236 | ms/batch 432.26 | loss  4.11 | ppl    60.688
| epoch 112 step    48500 |    104 batches | lr 0.000235 | ms/batch 330.98 | loss  4.15 | ppl    63.580
| epoch 112 step    48550 |    154 batches | lr 0.000235 | ms/batch 328.84 | loss  4.17 | ppl    64.748
| epoch 112 step    48600 |    204 batches | lr 0.000235 | ms/batch 330.06 | loss  4.16 | ppl    64.302
| epoch 112 step    48650 |    254 batches | lr 0.000235 | ms/batch 329.42 | loss  4.19 | ppl    66.211
| epoch 112 step    48700 |    304 batches | lr 0.000234 | ms/batch 329.52 | loss  4.21 | ppl    67.236
| epoch 112 step    48750 |    354 batches | lr 0.000234 | ms/batch 330.53 | loss  4.11 | ppl    61.109
| epoch 112 step    48800 |    404 batches | lr 0.000234 | ms/batch 330.88 | loss  4.17 | ppl    64.415
----------------------------------------------------------------------------------------------------
| Eval 122 at step    48800 | time: 137.18s | valid loss  4.25 | valid ppl    70.029
----------------------------------------------------------------------------------------------------
| epoch 113 step    48850 |     18 batches | lr 0.000234 | ms/batch 424.73 | loss  4.23 | ppl    68.460
| epoch 113 step    48900 |     68 batches | lr 0.000234 | ms/batch 329.13 | loss  4.12 | ppl    61.389
| epoch 113 step    48950 |    118 batches | lr 0.000233 | ms/batch 329.69 | loss  4.14 | ppl    62.732
| epoch 113 step    49000 |    168 batches | lr 0.000233 | ms/batch 328.85 | loss  4.14 | ppl    62.695
| epoch 113 step    49050 |    218 batches | lr 0.000233 | ms/batch 330.05 | loss  4.19 | ppl    66.297
| epoch 113 step    49100 |    268 batches | lr 0.000233 | ms/batch 329.60 | loss  4.18 | ppl    65.437
| epoch 113 step    49150 |    318 batches | lr 0.000233 | ms/batch 329.94 | loss  4.14 | ppl    62.778
| epoch 113 step    49200 |    368 batches | lr 0.000232 | ms/batch 329.99 | loss  4.12 | ppl    61.293
----------------------------------------------------------------------------------------------------
| Eval 123 at step    49200 | time: 136.55s | valid loss  4.24 | valid ppl    69.624
----------------------------------------------------------------------------------------------------
| epoch 113 step    49250 |    418 batches | lr 0.000232 | ms/batch 464.67 | loss  4.13 | ppl    62.032
| epoch 114 step    49300 |     32 batches | lr 0.000232 | ms/batch 323.25 | loss  4.20 | ppl    66.718
| epoch 114 step    49350 |     82 batches | lr 0.000232 | ms/batch 331.50 | loss  4.13 | ppl    61.981
| epoch 114 step    49400 |    132 batches | lr 0.000232 | ms/batch 331.29 | loss  4.14 | ppl    62.975
| epoch 114 step    49450 |    182 batches | lr 0.000231 | ms/batch 330.81 | loss  4.15 | ppl    63.372
| epoch 114 step    49500 |    232 batches | lr 0.000231 | ms/batch 331.72 | loss  4.17 | ppl    64.448
| epoch 114 step    49550 |    282 batches | lr 0.000231 | ms/batch 330.37 | loss  4.20 | ppl    66.543
| epoch 114 step    49600 |    332 batches | lr 0.000231 | ms/batch 329.15 | loss  4.09 | ppl    59.705
----------------------------------------------------------------------------------------------------
| Eval 124 at step    49600 | time: 136.96s | valid loss  4.25 | valid ppl    70.338
----------------------------------------------------------------------------------------------------
| epoch 114 step    49650 |    382 batches | lr 0.000231 | ms/batch 427.69 | loss  4.14 | ppl    62.960
| epoch 114 step    49700 |    432 batches | lr 0.00023 | ms/batch 339.34 | loss  4.19 | ppl    65.940
| epoch 115 step    49750 |     46 batches | lr 0.00023 | ms/batch 337.36 | loss  4.12 | ppl    61.670
| epoch 115 step    49800 |     96 batches | lr 0.00023 | ms/batch 343.99 | loss  4.10 | ppl    60.373
| epoch 115 step    49850 |    146 batches | lr 0.00023 | ms/batch 343.28 | loss  4.14 | ppl    62.958
| epoch 115 step    49900 |    196 batches | lr 0.000229 | ms/batch 335.52 | loss  4.16 | ppl    64.239
| epoch 115 step    49950 |    246 batches | lr 0.000229 | ms/batch 329.67 | loss  4.17 | ppl    64.541
| epoch 115 step    50000 |    296 batches | lr 0.000229 | ms/batch 329.22 | loss  4.21 | ppl    67.573
----------------------------------------------------------------------------------------------------
| Eval 125 at step    50000 | time: 139.34s | valid loss  4.24 | valid ppl    69.712
----------------------------------------------------------------------------------------------------
| epoch 115 step    50050 |    346 batches | lr 0.000229 | ms/batch 429.81 | loss  4.07 | ppl    58.338
| epoch 115 step    50100 |    396 batches | lr 0.000229 | ms/batch 325.42 | loss  4.15 | ppl    63.595
| epoch 116 step    50150 |     10 batches | lr 0.000228 | ms/batch 320.78 | loss  4.20 | ppl    67.013
| epoch 116 step    50200 |     60 batches | lr 0.000228 | ms/batch 330.80 | loss  4.09 | ppl    59.882
| epoch 116 step    50250 |    110 batches | lr 0.000228 | ms/batch 329.62 | loss  4.12 | ppl    61.624
| epoch 116 step    50300 |    160 batches | lr 0.000228 | ms/batch 331.59 | loss  4.14 | ppl    62.551
| epoch 116 step    50350 |    210 batches | lr 0.000228 | ms/batch 330.45 | loss  4.18 | ppl    65.088
| epoch 116 step    50400 |    260 batches | lr 0.000227 | ms/batch 330.47 | loss  4.15 | ppl    63.570
----------------------------------------------------------------------------------------------------
| Eval 126 at step    50400 | time: 136.43s | valid loss  4.26 | valid ppl    70.541
----------------------------------------------------------------------------------------------------
| epoch 116 step    50450 |    310 batches | lr 0.000227 | ms/batch 430.10 | loss  4.15 | ppl    63.350
| epoch 116 step    50500 |    360 batches | lr 0.000227 | ms/batch 326.29 | loss  4.07 | ppl    58.397
| epoch 116 step    50550 |    410 batches | lr 0.000227 | ms/batch 327.25 | loss  4.12 | ppl    61.583
| epoch 117 step    50600 |     24 batches | lr 0.000227 | ms/batch 321.15 | loss  4.17 | ppl    64.731
| epoch 117 step    50650 |     74 batches | lr 0.000226 | ms/batch 328.65 | loss  4.08 | ppl    59.173
| epoch 117 step    50700 |    124 batches | lr 0.000226 | ms/batch 331.97 | loss  4.12 | ppl    61.264
| epoch 117 step    50750 |    174 batches | lr 0.000226 | ms/batch 330.29 | loss  4.14 | ppl    62.631
| epoch 117 step    50800 |    224 batches | lr 0.000226 | ms/batch 330.70 | loss  4.13 | ppl    62.482
----------------------------------------------------------------------------------------------------
| Eval 127 at step    50800 | time: 136.32s | valid loss  4.24 | valid ppl    69.410
----------------------------------------------------------------------------------------------------
| epoch 117 step    50850 |    274 batches | lr 0.000226 | ms/batch 462.19 | loss  4.16 | ppl    64.102
| epoch 117 step    50900 |    324 batches | lr 0.000225 | ms/batch 326.07 | loss  4.11 | ppl    61.071
| epoch 117 step    50950 |    374 batches | lr 0.000225 | ms/batch 327.56 | loss  4.13 | ppl    62.015
| epoch 117 step    51000 |    424 batches | lr 0.000225 | ms/batch 326.30 | loss  4.14 | ppl    63.115
| epoch 118 step    51050 |     38 batches | lr 0.000225 | ms/batch 321.84 | loss  4.15 | ppl    63.276
| epoch 118 step    51100 |     88 batches | lr 0.000224 | ms/batch 332.49 | loss  4.09 | ppl    60.007
| epoch 118 step    51150 |    138 batches | lr 0.000224 | ms/batch 331.20 | loss  4.13 | ppl    62.465
| epoch 118 step    51200 |    188 batches | lr 0.000224 | ms/batch 333.59 | loss  4.16 | ppl    64.062
----------------------------------------------------------------------------------------------------
| Eval 128 at step    51200 | time: 136.70s | valid loss  4.25 | valid ppl    69.951
----------------------------------------------------------------------------------------------------
| epoch 118 step    51250 |    238 batches | lr 0.000224 | ms/batch 450.22 | loss  4.14 | ppl    62.719
| epoch 118 step    51300 |    288 batches | lr 0.000224 | ms/batch 341.38 | loss  4.20 | ppl    66.530
| epoch 118 step    51350 |    338 batches | lr 0.000223 | ms/batch 340.81 | loss  4.07 | ppl    58.765
| epoch 118 step    51400 |    388 batches | lr 0.000223 | ms/batch 329.53 | loss  4.15 | ppl    63.276
| epoch 119 step    51450 |      2 batches | lr 0.000223 | ms/batch 320.15 | loss  4.21 | ppl    67.222
| epoch 119 step    51500 |     52 batches | lr 0.000223 | ms/batch 329.27 | loss  4.10 | ppl    60.319
| epoch 119 step    51550 |    102 batches | lr 0.000223 | ms/batch 328.68 | loss  4.06 | ppl    58.122
| epoch 119 step    51600 |    152 batches | lr 0.000222 | ms/batch 329.89 | loss  4.13 | ppl    62.390
----------------------------------------------------------------------------------------------------
| Eval 129 at step    51600 | time: 138.25s | valid loss  4.24 | valid ppl    69.305
----------------------------------------------------------------------------------------------------
| epoch 119 step    51650 |    202 batches | lr 0.000222 | ms/batch 460.04 | loss  4.15 | ppl    63.446
| epoch 119 step    51700 |    252 batches | lr 0.000222 | ms/batch 324.21 | loss  4.15 | ppl    63.740
| epoch 119 step    51750 |    302 batches | lr 0.000222 | ms/batch 324.50 | loss  4.17 | ppl    64.852
| epoch 119 step    51800 |    352 batches | lr 0.000221 | ms/batch 325.72 | loss  4.05 | ppl    57.404
| epoch 119 step    51850 |    402 batches | lr 0.000221 | ms/batch 325.40 | loss  4.15 | ppl    63.556
| epoch 120 step    51900 |     16 batches | lr 0.000221 | ms/batch 319.84 | loss  4.15 | ppl    63.665
| epoch 120 step    51950 |     66 batches | lr 0.000221 | ms/batch 328.74 | loss  4.07 | ppl    58.616
| epoch 120 step    52000 |    116 batches | lr 0.000221 | ms/batch 330.11 | loss  4.12 | ppl    61.653
----------------------------------------------------------------------------------------------------
| Eval 130 at step    52000 | time: 135.39s | valid loss  4.24 | valid ppl    69.403
----------------------------------------------------------------------------------------------------
| epoch 120 step    52050 |    166 batches | lr 0.00022 | ms/batch 432.79 | loss  4.12 | ppl    61.709
| epoch 120 step    52100 |    216 batches | lr 0.00022 | ms/batch 327.76 | loss  4.15 | ppl    63.528
| epoch 120 step    52150 |    266 batches | lr 0.00022 | ms/batch 327.01 | loss  4.11 | ppl    61.161
| epoch 120 step    52200 |    316 batches | lr 0.00022 | ms/batch 326.17 | loss  4.13 | ppl    62.338
| epoch 120 step    52250 |    366 batches | lr 0.00022 | ms/batch 326.56 | loss  4.06 | ppl    58.176
| epoch 120 step    52300 |    416 batches | lr 0.000219 | ms/batch 326.64 | loss  4.13 | ppl    62.098
| epoch 121 step    52350 |     30 batches | lr 0.000219 | ms/batch 322.20 | loss  4.17 | ppl    64.549
| epoch 121 step    52400 |     80 batches | lr 0.000219 | ms/batch 340.64 | loss  4.09 | ppl    59.612
----------------------------------------------------------------------------------------------------
| Eval 131 at step    52400 | time: 136.61s | valid loss  4.25 | valid ppl    69.837
----------------------------------------------------------------------------------------------------
| epoch 121 step    52450 |    130 batches | lr 0.000219 | ms/batch 447.24 | loss  4.13 | ppl    62.302
| epoch 121 step    52500 |    180 batches | lr 0.000219 | ms/batch 340.76 | loss  4.13 | ppl    62.040
| epoch 121 step    52550 |    230 batches | lr 0.000218 | ms/batch 340.53 | loss  4.15 | ppl    63.662
| epoch 121 step    52600 |    280 batches | lr 0.000218 | ms/batch 339.18 | loss  4.15 | ppl    63.258
| epoch 121 step    52650 |    330 batches | lr 0.000218 | ms/batch 341.59 | loss  4.09 | ppl    59.871
| epoch 121 step    52700 |    380 batches | lr 0.000218 | ms/batch 339.54 | loss  4.11 | ppl    60.932
| epoch 121 step    52750 |    430 batches | lr 0.000217 | ms/batch 339.55 | loss  4.17 | ppl    64.400
| epoch 122 step    52800 |     44 batches | lr 0.000217 | ms/batch 336.53 | loss  4.11 | ppl    60.921
----------------------------------------------------------------------------------------------------
| Eval 132 at step    52800 | time: 141.25s | valid loss  4.24 | valid ppl    69.477
----------------------------------------------------------------------------------------------------
| epoch 122 step    52850 |     94 batches | lr 0.000217 | ms/batch 432.83 | loss  4.08 | ppl    58.894
| epoch 122 step    52900 |    144 batches | lr 0.000217 | ms/batch 326.29 | loss  4.11 | ppl    60.804
| epoch 122 step    52950 |    194 batches | lr 0.000217 | ms/batch 324.96 | loss  4.13 | ppl    62.433
| epoch 122 step    53000 |    244 batches | lr 0.000216 | ms/batch 324.55 | loss  4.14 | ppl    63.100
| epoch 122 step    53050 |    294 batches | lr 0.000216 | ms/batch 325.07 | loss  4.17 | ppl    64.918
| epoch 122 step    53100 |    344 batches | lr 0.000216 | ms/batch 326.07 | loss  4.03 | ppl    56.015
| epoch 122 step    53150 |    394 batches | lr 0.000216 | ms/batch 325.20 | loss  4.14 | ppl    62.683
| epoch 123 step    53200 |      8 batches | lr 0.000216 | ms/batch 319.51 | loss  4.16 | ppl    63.812
----------------------------------------------------------------------------------------------------
| Eval 133 at step    53200 | time: 135.05s | valid loss  4.23 | valid ppl    68.771
----------------------------------------------------------------------------------------------------
| epoch 123 step    53250 |     58 batches | lr 0.000215 | ms/batch 475.47 | loss  4.07 | ppl    58.310
| epoch 123 step    53300 |    108 batches | lr 0.000215 | ms/batch 324.90 | loss  4.08 | ppl    59.319
| epoch 123 step    53350 |    158 batches | lr 0.000215 | ms/batch 324.67 | loss  4.11 | ppl    60.871
| epoch 123 step    53400 |    208 batches | lr 0.000215 | ms/batch 325.11 | loss  4.13 | ppl    62.229
| epoch 123 step    53450 |    258 batches | lr 0.000214 | ms/batch 324.24 | loss  4.13 | ppl    62.482
| epoch 123 step    53500 |    308 batches | lr 0.000214 | ms/batch 325.21 | loss  4.15 | ppl    63.123
| epoch 123 step    53550 |    358 batches | lr 0.000214 | ms/batch 325.58 | loss  4.07 | ppl    58.484
| epoch 123 step    53600 |    408 batches | lr 0.000214 | ms/batch 327.15 | loss  4.11 | ppl    61.004
----------------------------------------------------------------------------------------------------
| Eval 134 at step    53600 | time: 135.18s | valid loss  4.24 | valid ppl    69.495
----------------------------------------------------------------------------------------------------
| epoch 124 step    53650 |     22 batches | lr 0.000214 | ms/batch 426.37 | loss  4.18 | ppl    65.379
| epoch 124 step    53700 |     72 batches | lr 0.000213 | ms/batch 326.33 | loss  4.06 | ppl    57.834
| epoch 124 step    53750 |    122 batches | lr 0.000213 | ms/batch 327.56 | loss  4.10 | ppl    60.281
| epoch 124 step    53800 |    172 batches | lr 0.000213 | ms/batch 326.44 | loss  4.10 | ppl    60.288
| epoch 124 step    53850 |    222 batches | lr 0.000213 | ms/batch 326.55 | loss  4.12 | ppl    61.815
| epoch 124 step    53900 |    272 batches | lr 0.000213 | ms/batch 326.06 | loss  4.13 | ppl    62.402
| epoch 124 step    53950 |    322 batches | lr 0.000212 | ms/batch 325.34 | loss  4.11 | ppl    61.166
| epoch 124 step    54000 |    372 batches | lr 0.000212 | ms/batch 325.83 | loss  4.07 | ppl    58.782
----------------------------------------------------------------------------------------------------
| Eval 135 at step    54000 | time: 135.52s | valid loss  4.23 | valid ppl    68.617
----------------------------------------------------------------------------------------------------
| epoch 124 step    54050 |    422 batches | lr 0.000212 | ms/batch 464.77 | loss  4.11 | ppl    60.785
| epoch 125 step    54100 |     36 batches | lr 0.000212 | ms/batch 320.72 | loss  4.13 | ppl    62.190
| epoch 125 step    54150 |     86 batches | lr 0.000211 | ms/batch 325.97 | loss  4.06 | ppl    58.097
| epoch 125 step    54200 |    136 batches | lr 0.000211 | ms/batch 325.65 | loss  4.13 | ppl    62.035
| epoch 125 step    54250 |    186 batches | lr 0.000211 | ms/batch 325.87 | loss  4.10 | ppl    60.506
| epoch 125 step    54300 |    236 batches | lr 0.000211 | ms/batch 325.96 | loss  4.13 | ppl    62.292
| epoch 125 step    54350 |    286 batches | lr 0.000211 | ms/batch 326.05 | loss  4.15 | ppl    63.692
| epoch 125 step    54400 |    336 batches | lr 0.00021 | ms/batch 325.70 | loss  4.05 | ppl    57.427
----------------------------------------------------------------------------------------------------
| Eval 136 at step    54400 | time: 135.38s | valid loss  4.24 | valid ppl    69.252
----------------------------------------------------------------------------------------------------
| epoch 125 step    54450 |    386 batches | lr 0.00021 | ms/batch 432.95 | loss  4.11 | ppl    61.214
| epoch 125 step    54500 |    436 batches | lr 0.00021 | ms/batch 326.97 | loss  4.13 | ppl    62.224
| epoch 126 step    54550 |     50 batches | lr 0.00021 | ms/batch 324.51 | loss  4.08 | ppl    59.363
| epoch 126 step    54600 |    100 batches | lr 0.00021 | ms/batch 327.57 | loss  4.08 | ppl    59.000
| epoch 126 step    54650 |    150 batches | lr 0.000209 | ms/batch 326.59 | loss  4.09 | ppl    59.700
| epoch 126 step    54700 |    200 batches | lr 0.000209 | ms/batch 324.73 | loss  4.13 | ppl    62.268
| epoch 126 step    54750 |    250 batches | lr 0.000209 | ms/batch 325.54 | loss  4.15 | ppl    63.466
| epoch 126 step    54800 |    300 batches | lr 0.000209 | ms/batch 324.54 | loss  4.15 | ppl    63.303
----------------------------------------------------------------------------------------------------
| Eval 137 at step    54800 | time: 135.69s | valid loss  4.24 | valid ppl    69.401
----------------------------------------------------------------------------------------------------
| epoch 126 step    54850 |    350 batches | lr 0.000208 | ms/batch 432.95 | loss  4.03 | ppl    56.378
| epoch 126 step    54900 |    400 batches | lr 0.000208 | ms/batch 329.23 | loss  4.11 | ppl    61.126
| epoch 127 step    54950 |     14 batches | lr 0.000208 | ms/batch 322.64 | loss  4.14 | ppl    62.673
| epoch 127 step    55000 |     64 batches | lr 0.000208 | ms/batch 327.27 | loss  4.05 | ppl    57.604
| epoch 127 step    55050 |    114 batches | lr 0.000208 | ms/batch 325.99 | loss  4.10 | ppl    60.077
| epoch 127 step    55100 |    164 batches | lr 0.000207 | ms/batch 326.47 | loss  4.08 | ppl    59.377
| epoch 127 step    55150 |    214 batches | lr 0.000207 | ms/batch 327.55 | loss  4.11 | ppl    60.650
| epoch 127 step    55200 |    264 batches | lr 0.000207 | ms/batch 326.71 | loss  4.12 | ppl    61.480
----------------------------------------------------------------------------------------------------
| Eval 138 at step    55200 | time: 135.95s | valid loss  4.25 | valid ppl    69.907
----------------------------------------------------------------------------------------------------
| epoch 127 step    55250 |    314 batches | lr 0.000207 | ms/batch 432.61 | loss  4.09 | ppl    59.698
| epoch 127 step    55300 |    364 batches | lr 0.000206 | ms/batch 329.14 | loss  4.03 | ppl    56.435
| epoch 127 step    55350 |    414 batches | lr 0.000206 | ms/batch 329.63 | loss  4.12 | ppl    61.276
| epoch 128 step    55400 |     28 batches | lr 0.000206 | ms/batch 322.54 | loss  4.14 | ppl    62.987
| epoch 128 step    55450 |     78 batches | lr 0.000206 | ms/batch 325.88 | loss  4.05 | ppl    57.566
| epoch 128 step    55500 |    128 batches | lr 0.000206 | ms/batch 325.83 | loss  4.07 | ppl    58.669
| epoch 128 step    55550 |    178 batches | lr 0.000205 | ms/batch 327.40 | loss  4.09 | ppl    59.777
| epoch 128 step    55600 |    228 batches | lr 0.000205 | ms/batch 326.88 | loss  4.12 | ppl    61.754
----------------------------------------------------------------------------------------------------
| Eval 139 at step    55600 | time: 136.00s | valid loss  4.24 | valid ppl    69.611
----------------------------------------------------------------------------------------------------
| epoch 128 step    55650 |    278 batches | lr 0.000205 | ms/batch 432.97 | loss  4.13 | ppl    62.436
| epoch 128 step    55700 |    328 batches | lr 0.000205 | ms/batch 330.46 | loss  4.06 | ppl    58.260
| epoch 128 step    55750 |    378 batches | lr 0.000205 | ms/batch 329.84 | loss  4.06 | ppl    58.024
| epoch 128 step    55800 |    428 batches | lr 0.000204 | ms/batch 330.07 | loss  4.12 | ppl    61.569
| epoch 129 step    55850 |     42 batches | lr 0.000204 | ms/batch 321.95 | loss  4.09 | ppl    59.570
| epoch 129 step    55900 |     92 batches | lr 0.000204 | ms/batch 327.15 | loss  4.08 | ppl    58.972
| epoch 129 step    55950 |    142 batches | lr 0.000204 | ms/batch 327.91 | loss  4.08 | ppl    59.405
| epoch 129 step    56000 |    192 batches | lr 0.000203 | ms/batch 326.42 | loss  4.12 | ppl    61.660
----------------------------------------------------------------------------------------------------
| Eval 140 at step    56000 | time: 136.36s | valid loss  4.23 | valid ppl    68.840
----------------------------------------------------------------------------------------------------
| epoch 129 step    56050 |    242 batches | lr 0.000203 | ms/batch 435.22 | loss  4.12 | ppl    61.420
| epoch 129 step    56100 |    292 batches | lr 0.000203 | ms/batch 332.20 | loss  4.15 | ppl    63.172
| epoch 129 step    56150 |    342 batches | lr 0.000203 | ms/batch 330.21 | loss  4.00 | ppl    54.392
| epoch 129 step    56200 |    392 batches | lr 0.000203 | ms/batch 332.30 | loss  4.08 | ppl    59.333
| epoch 130 step    56250 |      6 batches | lr 0.000202 | ms/batch 323.98 | loss  4.14 | ppl    63.026
| epoch 130 step    56300 |     56 batches | lr 0.000202 | ms/batch 324.57 | loss  4.05 | ppl    57.382
| epoch 130 step    56350 |    106 batches | lr 0.000202 | ms/batch 326.23 | loss  4.04 | ppl    56.587
| epoch 130 step    56400 |    156 batches | lr 0.000202 | ms/batch 326.41 | loss  4.09 | ppl    59.780
----------------------------------------------------------------------------------------------------
| Eval 141 at step    56400 | time: 136.55s | valid loss  4.22 | valid ppl    68.364
----------------------------------------------------------------------------------------------------
| epoch 130 step    56450 |    206 batches | lr 0.000202 | ms/batch 467.58 | loss  4.08 | ppl    59.072
| epoch 130 step    56500 |    256 batches | lr 0.000201 | ms/batch 331.27 | loss  4.11 | ppl    60.840
| epoch 130 step    56550 |    306 batches | lr 0.000201 | ms/batch 330.85 | loss  4.14 | ppl    62.715
| epoch 130 step    56600 |    356 batches | lr 0.000201 | ms/batch 330.83 | loss  4.03 | ppl    56.219
| epoch 130 step    56650 |    406 batches | lr 0.000201 | ms/batch 330.49 | loss  4.07 | ppl    58.704
| epoch 131 step    56700 |     20 batches | lr 0.0002 | ms/batch 321.79 | loss  4.13 | ppl    62.091
| epoch 131 step    56750 |     70 batches | lr 0.0002 | ms/batch 329.41 | loss  4.01 | ppl    55.026
| epoch 131 step    56800 |    120 batches | lr 0.0002 | ms/batch 327.48 | loss  4.09 | ppl    59.672
----------------------------------------------------------------------------------------------------
| Eval 142 at step    56800 | time: 136.85s | valid loss  4.23 | valid ppl    69.027
----------------------------------------------------------------------------------------------------
| epoch 131 step    56850 |    170 batches | lr 0.0002 | ms/batch 435.53 | loss  4.08 | ppl    58.970
| epoch 131 step    56900 |    220 batches | lr 0.0002 | ms/batch 330.64 | loss  4.11 | ppl    60.647
| epoch 131 step    56950 |    270 batches | lr 0.000199 | ms/batch 330.24 | loss  4.10 | ppl    60.413
| epoch 131 step    57000 |    320 batches | lr 0.000199 | ms/batch 330.20 | loss  4.11 | ppl    60.730
| epoch 131 step    57050 |    370 batches | lr 0.000199 | ms/batch 331.85 | loss  4.04 | ppl    57.076
| epoch 131 step    57100 |    420 batches | lr 0.000199 | ms/batch 329.42 | loss  4.08 | ppl    59.291
| epoch 132 step    57150 |     34 batches | lr 0.000198 | ms/batch 319.80 | loss  4.10 | ppl    60.343
| epoch 132 step    57200 |     84 batches | lr 0.000198 | ms/batch 324.41 | loss  4.02 | ppl    55.836
----------------------------------------------------------------------------------------------------
| Eval 143 at step    57200 | time: 136.55s | valid loss  4.24 | valid ppl    69.437
----------------------------------------------------------------------------------------------------
| epoch 132 step    57250 |    134 batches | lr 0.000198 | ms/batch 431.73 | loss  4.10 | ppl    60.277
| epoch 132 step    57300 |    184 batches | lr 0.000198 | ms/batch 329.54 | loss  4.09 | ppl    59.808
| epoch 132 step    57350 |    234 batches | lr 0.000198 | ms/batch 328.31 | loss  4.11 | ppl    60.657
| epoch 132 step    57400 |    284 batches | lr 0.000197 | ms/batch 328.95 | loss  4.11 | ppl    60.859
| epoch 132 step    57450 |    334 batches | lr 0.000197 | ms/batch 329.89 | loss  4.02 | ppl    55.810
| epoch 132 step    57500 |    384 batches | lr 0.000197 | ms/batch 329.76 | loss  4.08 | ppl    59.351
| epoch 132 step    57550 |    434 batches | lr 0.000197 | ms/batch 328.87 | loss  4.12 | ppl    61.750
| epoch 133 step    57600 |     48 batches | lr 0.000196 | ms/batch 318.76 | loss  4.04 | ppl    56.563
----------------------------------------------------------------------------------------------------
| Eval 144 at step    57600 | time: 136.28s | valid loss  4.23 | valid ppl    68.709
----------------------------------------------------------------------------------------------------
| epoch 133 step    57650 |     98 batches | lr 0.000196 | ms/batch 433.54 | loss  4.04 | ppl    56.709
| epoch 133 step    57700 |    148 batches | lr 0.000196 | ms/batch 328.38 | loss  4.06 | ppl    57.742
| epoch 133 step    57750 |    198 batches | lr 0.000196 | ms/batch 329.94 | loss  4.10 | ppl    60.204
| epoch 133 step    57800 |    248 batches | lr 0.000196 | ms/batch 328.62 | loss  4.11 | ppl    60.795
| epoch 133 step    57850 |    298 batches | lr 0.000195 | ms/batch 330.89 | loss  4.15 | ppl    63.160
| epoch 133 step    57900 |    348 batches | lr 0.000195 | ms/batch 332.44 | loss  4.01 | ppl    54.921
| epoch 133 step    57950 |    398 batches | lr 0.000195 | ms/batch 330.07 | loss  4.08 | ppl    59.354
| epoch 134 step    58000 |     12 batches | lr 0.000195 | ms/batch 323.32 | loss  4.13 | ppl    61.899
----------------------------------------------------------------------------------------------------
| Eval 145 at step    58000 | time: 136.87s | valid loss  4.22 | valid ppl    68.234
----------------------------------------------------------------------------------------------------
| epoch 134 step    58050 |     62 batches | lr 0.000195 | ms/batch 464.69 | loss  4.04 | ppl    56.638
| epoch 134 step    58100 |    112 batches | lr 0.000194 | ms/batch 329.16 | loss  4.05 | ppl    57.129
| epoch 134 step    58150 |    162 batches | lr 0.000194 | ms/batch 328.60 | loss  4.07 | ppl    58.345
| epoch 134 step    58200 |    212 batches | lr 0.000194 | ms/batch 329.42 | loss  4.07 | ppl    58.610
| epoch 134 step    58250 |    262 batches | lr 0.000194 | ms/batch 331.12 | loss  4.10 | ppl    60.326
| epoch 134 step    58300 |    312 batches | lr 0.000193 | ms/batch 330.73 | loss  4.08 | ppl    59.250
| epoch 134 step    58350 |    362 batches | lr 0.000193 | ms/batch 330.58 | loss  4.02 | ppl    55.627
| epoch 134 step    58400 |    412 batches | lr 0.000193 | ms/batch 330.08 | loss  4.06 | ppl    57.961
----------------------------------------------------------------------------------------------------
| Eval 146 at step    58400 | time: 137.11s | valid loss  4.23 | valid ppl    68.549
----------------------------------------------------------------------------------------------------
| epoch 135 step    58450 |     26 batches | lr 0.000193 | ms/batch 427.13 | loss  4.10 | ppl    60.105
| epoch 135 step    58500 |     76 batches | lr 0.000193 | ms/batch 331.30 | loss  4.04 | ppl    57.102
| epoch 135 step    58550 |    126 batches | lr 0.000192 | ms/batch 332.41 | loss  4.06 | ppl    58.151
| epoch 135 step    58600 |    176 batches | lr 0.000192 | ms/batch 330.58 | loss  4.06 | ppl    57.728
| epoch 135 step    58650 |    226 batches | lr 0.000192 | ms/batch 330.17 | loss  4.11 | ppl    60.700
| epoch 135 step    58700 |    276 batches | lr 0.000192 | ms/batch 331.45 | loss  4.10 | ppl    60.579
| epoch 135 step    58750 |    326 batches | lr 0.000191 | ms/batch 331.32 | loss  4.05 | ppl    57.324
| epoch 135 step    58800 |    376 batches | lr 0.000191 | ms/batch 329.81 | loss  4.06 | ppl    57.798
----------------------------------------------------------------------------------------------------
| Eval 147 at step    58800 | time: 137.20s | valid loss  4.23 | valid ppl    68.682
----------------------------------------------------------------------------------------------------
| epoch 135 step    58850 |    426 batches | lr 0.000191 | ms/batch 430.57 | loss  4.06 | ppl    58.020
| epoch 136 step    58900 |     40 batches | lr 0.000191 | ms/batch 323.34 | loss  4.07 | ppl    58.411
| epoch 136 step    58950 |     90 batches | lr 0.000191 | ms/batch 332.60 | loss  4.00 | ppl    54.556
| epoch 136 step    59000 |    140 batches | lr 0.00019 | ms/batch 332.68 | loss  4.09 | ppl    59.467
| epoch 136 step    59050 |    190 batches | lr 0.00019 | ms/batch 330.01 | loss  4.09 | ppl    60.016
| epoch 136 step    59100 |    240 batches | lr 0.00019 | ms/batch 331.92 | loss  4.09 | ppl    59.672
| epoch 136 step    59150 |    290 batches | lr 0.00019 | ms/batch 329.76 | loss  4.13 | ppl    62.355
| epoch 136 step    59200 |    340 batches | lr 0.000189 | ms/batch 329.20 | loss  3.99 | ppl    54.237
----------------------------------------------------------------------------------------------------
| Eval 148 at step    59200 | time: 136.99s | valid loss  4.23 | valid ppl    68.856
----------------------------------------------------------------------------------------------------
| epoch 136 step    59250 |    390 batches | lr 0.000189 | ms/batch 429.76 | loss  4.07 | ppl    58.775
| epoch 137 step    59300 |      4 batches | lr 0.000189 | ms/batch 319.17 | loss  4.11 | ppl    60.650
| epoch 137 step    59350 |     54 batches | lr 0.000189 | ms/batch 329.13 | loss  4.02 | ppl    55.849
| epoch 137 step    59400 |    104 batches | lr 0.000189 | ms/batch 329.32 | loss  4.02 | ppl    55.705
| epoch 137 step    59450 |    154 batches | lr 0.000188 | ms/batch 329.47 | loss  4.06 | ppl    57.990
| epoch 137 step    59500 |    204 batches | lr 0.000188 | ms/batch 329.36 | loss  4.07 | ppl    58.699
| epoch 137 step    59550 |    254 batches | lr 0.000188 | ms/batch 330.42 | loss  4.09 | ppl    59.630
| epoch 137 step    59600 |    304 batches | lr 0.000188 | ms/batch 329.79 | loss  4.10 | ppl    60.472
----------------------------------------------------------------------------------------------------
| Eval 149 at step    59600 | time: 136.32s | valid loss  4.23 | valid ppl    68.713
----------------------------------------------------------------------------------------------------
| epoch 137 step    59650 |    354 batches | lr 0.000188 | ms/batch 428.50 | loss  3.99 | ppl    53.863
| epoch 137 step    59700 |    404 batches | lr 0.000187 | ms/batch 324.18 | loss  4.05 | ppl    57.663
| epoch 138 step    59750 |     18 batches | lr 0.000187 | ms/batch 321.41 | loss  4.10 | ppl    60.548
| epoch 138 step    59800 |     68 batches | lr 0.000187 | ms/batch 330.61 | loss  4.00 | ppl    54.343
| epoch 138 step    59850 |    118 batches | lr 0.000187 | ms/batch 329.94 | loss  4.07 | ppl    58.338
| epoch 138 step    59900 |    168 batches | lr 0.000186 | ms/batch 332.06 | loss  4.06 | ppl    57.780
| epoch 138 step    59950 |    218 batches | lr 0.000186 | ms/batch 330.81 | loss  4.08 | ppl    59.340
| epoch 138 step    60000 |    268 batches | lr 0.000186 | ms/batch 330.49 | loss  4.09 | ppl    59.733
----------------------------------------------------------------------------------------------------
| Eval 150 at step    60000 | time: 136.40s | valid loss  4.23 | valid ppl    68.845
----------------------------------------------------------------------------------------------------
| epoch 138 step    60050 |    318 batches | lr 0.000186 | ms/batch 429.29 | loss  4.05 | ppl    57.196
| epoch 138 step    60100 |    368 batches | lr 0.000186 | ms/batch 325.70 | loss  4.02 | ppl    55.471
| epoch 138 step    60150 |    418 batches | lr 0.000185 | ms/batch 325.82 | loss  4.07 | ppl    58.720
| epoch 139 step    60200 |     32 batches | lr 0.000185 | ms/batch 323.47 | loss  4.09 | ppl    59.777
| epoch 139 step    60250 |     82 batches | lr 0.000185 | ms/batch 330.70 | loss  4.01 | ppl    55.263
| epoch 139 step    60300 |    132 batches | lr 0.000185 | ms/batch 330.09 | loss  4.05 | ppl    57.388
| epoch 139 step    60350 |    182 batches | lr 0.000184 | ms/batch 329.29 | loss  4.07 | ppl    58.267
| epoch 139 step    60400 |    232 batches | lr 0.000184 | ms/batch 330.65 | loss  4.06 | ppl    57.936
----------------------------------------------------------------------------------------------------
| Eval 151 at step    60400 | time: 136.24s | valid loss  4.23 | valid ppl    68.565
----------------------------------------------------------------------------------------------------
| epoch 139 step    60450 |    282 batches | lr 0.000184 | ms/batch 429.65 | loss  4.10 | ppl    60.070
| epoch 139 step    60500 |    332 batches | lr 0.000184 | ms/batch 334.56 | loss  4.00 | ppl    54.872
| epoch 139 step    60550 |    382 batches | lr 0.000184 | ms/batch 340.13 | loss  4.04 | ppl    56.762
| epoch 139 step    60600 |    432 batches | lr 0.000183 | ms/batch 339.84 | loss  4.08 | ppl    59.312
| epoch 140 step    60650 |     46 batches | lr 0.000183 | ms/batch 333.75 | loss  4.05 | ppl    57.669
| epoch 140 step    60700 |     96 batches | lr 0.000183 | ms/batch 329.20 | loss  4.01 | ppl    55.041
| epoch 140 step    60750 |    146 batches | lr 0.000183 | ms/batch 328.29 | loss  4.05 | ppl    57.415
| epoch 140 step    60800 |    196 batches | lr 0.000182 | ms/batch 330.00 | loss  4.06 | ppl    57.762
----------------------------------------------------------------------------------------------------
| Eval 152 at step    60800 | time: 138.26s | valid loss  4.22 | valid ppl    68.219
----------------------------------------------------------------------------------------------------
| epoch 140 step    60850 |    246 batches | lr 0.000182 | ms/batch 460.93 | loss  4.07 | ppl    58.450
| epoch 140 step    60900 |    296 batches | lr 0.000182 | ms/batch 324.00 | loss  4.11 | ppl    60.709
| epoch 140 step    60950 |    346 batches | lr 0.000182 | ms/batch 324.84 | loss  3.96 | ppl    52.494
| epoch 140 step    61000 |    396 batches | lr 0.000182 | ms/batch 324.58 | loss  4.03 | ppl    56.433
| epoch 141 step    61050 |     10 batches | lr 0.000181 | ms/batch 319.75 | loss  4.09 | ppl    60.004
| epoch 141 step    61100 |     60 batches | lr 0.000181 | ms/batch 329.34 | loss  3.98 | ppl    53.785
| epoch 141 step    61150 |    110 batches | lr 0.000181 | ms/batch 328.40 | loss  4.01 | ppl    54.994
| epoch 141 step    61200 |    160 batches | lr 0.000181 | ms/batch 329.35 | loss  4.05 | ppl    57.523
----------------------------------------------------------------------------------------------------
| Eval 153 at step    61200 | time: 135.41s | valid loss  4.22 | valid ppl    67.823
----------------------------------------------------------------------------------------------------
| epoch 141 step    61250 |    210 batches | lr 0.00018 | ms/batch 461.73 | loss  4.06 | ppl    58.233
| epoch 141 step    61300 |    260 batches | lr 0.00018 | ms/batch 324.49 | loss  4.07 | ppl    58.507
| epoch 141 step    61350 |    310 batches | lr 0.00018 | ms/batch 324.08 | loss  4.08 | ppl    59.099
| epoch 141 step    61400 |    360 batches | lr 0.00018 | ms/batch 324.00 | loss  4.01 | ppl    54.925
| epoch 141 step    61450 |    410 batches | lr 0.00018 | ms/batch 325.17 | loss  4.05 | ppl    57.227
| epoch 142 step    61500 |     24 batches | lr 0.000179 | ms/batch 321.35 | loss  4.07 | ppl    58.600
| epoch 142 step    61550 |     74 batches | lr 0.000179 | ms/batch 328.40 | loss  4.02 | ppl    55.766
| epoch 142 step    61600 |    124 batches | lr 0.000179 | ms/batch 330.61 | loss  4.04 | ppl    56.986
----------------------------------------------------------------------------------------------------
| Eval 154 at step    61600 | time: 135.33s | valid loss  4.22 | valid ppl    68.097
----------------------------------------------------------------------------------------------------
| epoch 142 step    61650 |    174 batches | lr 0.000179 | ms/batch 428.54 | loss  4.04 | ppl    56.552
| epoch 142 step    61700 |    224 batches | lr 0.000179 | ms/batch 324.87 | loss  4.07 | ppl    58.383
| epoch 142 step    61750 |    274 batches | lr 0.000178 | ms/batch 325.28 | loss  4.07 | ppl    58.279
| epoch 142 step    61800 |    324 batches | lr 0.000178 | ms/batch 324.49 | loss  4.03 | ppl    56.298
| epoch 142 step    61850 |    374 batches | lr 0.000178 | ms/batch 324.86 | loss  4.02 | ppl    55.725
| epoch 142 step    61900 |    424 batches | lr 0.000178 | ms/batch 324.20 | loss  4.05 | ppl    57.180
| epoch 143 step    61950 |     38 batches | lr 0.000177 | ms/batch 321.59 | loss  4.05 | ppl    57.627
| epoch 143 step    62000 |     88 batches | lr 0.000177 | ms/batch 329.07 | loss  3.98 | ppl    53.699
----------------------------------------------------------------------------------------------------
| Eval 155 at step    62000 | time: 135.13s | valid loss  4.24 | valid ppl    69.151
----------------------------------------------------------------------------------------------------
| epoch 143 step    62050 |    138 batches | lr 0.000177 | ms/batch 428.74 | loss  4.04 | ppl    56.720
| epoch 143 step    62100 |    188 batches | lr 0.000177 | ms/batch 327.02 | loss  4.06 | ppl    57.954
| epoch 143 step    62150 |    238 batches | lr 0.000177 | ms/batch 325.24 | loss  4.05 | ppl    57.514
| epoch 143 step    62200 |    288 batches | lr 0.000176 | ms/batch 323.50 | loss  4.10 | ppl    60.371
| epoch 143 step    62250 |    338 batches | lr 0.000176 | ms/batch 324.14 | loss  3.96 | ppl    52.603
| epoch 143 step    62300 |    388 batches | lr 0.000176 | ms/batch 325.84 | loss  4.04 | ppl    56.631
| epoch 144 step    62350 |      2 batches | lr 0.000176 | ms/batch 317.79 | loss  4.07 | ppl    58.281
| epoch 144 step    62400 |     52 batches | lr 0.000175 | ms/batch 329.62 | loss  4.00 | ppl    54.799
----------------------------------------------------------------------------------------------------
| Eval 156 at step    62400 | time: 135.11s | valid loss  4.23 | valid ppl    68.931
----------------------------------------------------------------------------------------------------
| epoch 144 step    62450 |    102 batches | lr 0.000175 | ms/batch 428.12 | loss  4.01 | ppl    54.994
| epoch 144 step    62500 |    152 batches | lr 0.000175 | ms/batch 326.38 | loss  4.05 | ppl    57.326
| epoch 144 step    62550 |    202 batches | lr 0.000175 | ms/batch 324.98 | loss  4.04 | ppl    56.915
| epoch 144 step    62600 |    252 batches | lr 0.000175 | ms/batch 325.31 | loss  4.04 | ppl    57.031
| epoch 144 step    62650 |    302 batches | lr 0.000174 | ms/batch 325.97 | loss  4.08 | ppl    59.382
| epoch 144 step    62700 |    352 batches | lr 0.000174 | ms/batch 325.41 | loss  3.96 | ppl    52.459
| epoch 144 step    62750 |    402 batches | lr 0.000174 | ms/batch 324.37 | loss  4.04 | ppl    56.647
| epoch 145 step    62800 |     16 batches | lr 0.000174 | ms/batch 320.16 | loss  4.06 | ppl    57.807
----------------------------------------------------------------------------------------------------
| Eval 157 at step    62800 | time: 135.02s | valid loss  4.22 | valid ppl    68.231
----------------------------------------------------------------------------------------------------
| epoch 145 step    62850 |     66 batches | lr 0.000173 | ms/batch 428.01 | loss  4.00 | ppl    54.718
| epoch 145 step    62900 |    116 batches | lr 0.000173 | ms/batch 326.38 | loss  4.03 | ppl    56.463
| epoch 145 step    62950 |    166 batches | lr 0.000173 | ms/batch 327.30 | loss  4.03 | ppl    56.448
| epoch 145 step    63000 |    216 batches | lr 0.000173 | ms/batch 325.72 | loss  4.05 | ppl    57.147
| epoch 145 step    63050 |    266 batches | lr 0.000173 | ms/batch 325.61 | loss  4.06 | ppl    58.140
| epoch 145 step    63100 |    316 batches | lr 0.000172 | ms/batch 325.58 | loss  4.03 | ppl    56.307
| epoch 145 step    63150 |    366 batches | lr 0.000172 | ms/batch 324.51 | loss  4.00 | ppl    54.534
| epoch 145 step    63200 |    416 batches | lr 0.000172 | ms/batch 324.93 | loss  4.00 | ppl    54.468
----------------------------------------------------------------------------------------------------
| Eval 158 at step    63200 | time: 135.41s | valid loss  4.22 | valid ppl    68.291
----------------------------------------------------------------------------------------------------
| epoch 146 step    63250 |     30 batches | lr 0.000172 | ms/batch 425.94 | loss  4.04 | ppl    57.102
| epoch 146 step    63300 |     80 batches | lr 0.000171 | ms/batch 324.91 | loss  3.99 | ppl    53.819
| epoch 146 step    63350 |    130 batches | lr 0.000171 | ms/batch 325.42 | loss  4.03 | ppl    56.287
| epoch 146 step    63400 |    180 batches | lr 0.000171 | ms/batch 324.94 | loss  4.04 | ppl    56.782
| epoch 146 step    63450 |    230 batches | lr 0.000171 | ms/batch 324.38 | loss  4.04 | ppl    56.989
| epoch 146 step    63500 |    280 batches | lr 0.000171 | ms/batch 324.43 | loss  4.06 | ppl    57.694
| epoch 146 step    63550 |    330 batches | lr 0.00017 | ms/batch 323.40 | loss  4.00 | ppl    54.496
| epoch 146 step    63600 |    380 batches | lr 0.00017 | ms/batch 325.60 | loss  4.02 | ppl    55.843
----------------------------------------------------------------------------------------------------
| Eval 159 at step    63600 | time: 134.94s | valid loss  4.22 | valid ppl    67.933
----------------------------------------------------------------------------------------------------
| epoch 146 step    63650 |    430 batches | lr 0.00017 | ms/batch 433.43 | loss  4.06 | ppl    57.798
| epoch 147 step    63700 |     44 batches | lr 0.00017 | ms/batch 321.12 | loss  4.01 | ppl    55.134
| epoch 147 step    63750 |     94 batches | lr 0.00017 | ms/batch 325.26 | loss  3.97 | ppl    52.844
| epoch 147 step    63800 |    144 batches | lr 0.000169 | ms/batch 325.13 | loss  4.02 | ppl    55.921
| epoch 147 step    63850 |    194 batches | lr 0.000169 | ms/batch 325.83 | loss  4.05 | ppl    57.136
| epoch 147 step    63900 |    244 batches | lr 0.000169 | ms/batch 325.67 | loss  4.08 | ppl    59.072
| epoch 147 step    63950 |    294 batches | lr 0.000169 | ms/batch 324.26 | loss  4.08 | ppl    59.428
| epoch 147 step    64000 |    344 batches | lr 0.000168 | ms/batch 326.21 | loss  3.94 | ppl    51.509
----------------------------------------------------------------------------------------------------
| Eval 160 at step    64000 | time: 135.36s | valid loss  4.23 | valid ppl    68.995
----------------------------------------------------------------------------------------------------
| epoch 147 step    64050 |    394 batches | lr 0.000168 | ms/batch 433.43 | loss  4.03 | ppl    56.263
| epoch 148 step    64100 |      8 batches | lr 0.000168 | ms/batch 323.42 | loss  4.06 | ppl    58.181
| epoch 148 step    64150 |     58 batches | lr 0.000168 | ms/batch 326.05 | loss  3.98 | ppl    53.448
| epoch 148 step    64200 |    108 batches | lr 0.000168 | ms/batch 325.16 | loss  3.98 | ppl    53.511
| epoch 148 step    64250 |    158 batches | lr 0.000167 | ms/batch 325.79 | loss  4.01 | ppl    55.166
| epoch 148 step    64300 |    208 batches | lr 0.000167 | ms/batch 326.75 | loss  4.03 | ppl    56.120
| epoch 148 step    64350 |    258 batches | lr 0.000167 | ms/batch 325.22 | loss  4.05 | ppl    57.263
| epoch 148 step    64400 |    308 batches | lr 0.000167 | ms/batch 325.55 | loss  4.05 | ppl    57.404
----------------------------------------------------------------------------------------------------
| Eval 161 at step    64400 | time: 135.58s | valid loss  4.22 | valid ppl    68.136
----------------------------------------------------------------------------------------------------
| epoch 148 step    64450 |    358 batches | lr 0.000166 | ms/batch 433.47 | loss  3.97 | ppl    52.770
| epoch 148 step    64500 |    408 batches | lr 0.000166 | ms/batch 329.63 | loss  4.02 | ppl    55.675
| epoch 149 step    64550 |     22 batches | lr 0.000166 | ms/batch 322.25 | loss  4.08 | ppl    58.857
| epoch 149 step    64600 |     72 batches | lr 0.000166 | ms/batch 325.28 | loss  3.96 | ppl    52.519
| epoch 149 step    64650 |    122 batches | lr 0.000166 | ms/batch 325.93 | loss  4.01 | ppl    55.011
| epoch 149 step    64700 |    172 batches | lr 0.000165 | ms/batch 325.65 | loss  4.03 | ppl    56.048
| epoch 149 step    64750 |    222 batches | lr 0.000165 | ms/batch 325.81 | loss  4.04 | ppl    56.793
| epoch 149 step    64800 |    272 batches | lr 0.000165 | ms/batch 326.10 | loss  4.04 | ppl    56.909
----------------------------------------------------------------------------------------------------
| Eval 162 at step    64800 | time: 135.70s | valid loss  4.22 | valid ppl    67.823
----------------------------------------------------------------------------------------------------
| epoch 149 step    64850 |    322 batches | lr 0.000165 | ms/batch 472.62 | loss  4.00 | ppl    54.351
| epoch 149 step    64900 |    372 batches | lr 0.000164 | ms/batch 329.84 | loss  3.98 | ppl    53.513
| epoch 149 step    64950 |    422 batches | lr 0.000164 | ms/batch 329.29 | loss  4.03 | ppl    56.149
| epoch 150 step    65000 |     36 batches | lr 0.000164 | ms/batch 321.02 | loss  4.04 | ppl    57.015
| epoch 150 step    65050 |     86 batches | lr 0.000164 | ms/batch 325.33 | loss  3.94 | ppl    51.537
| epoch 150 step    65100 |    136 batches | lr 0.000164 | ms/batch 324.83 | loss  4.02 | ppl    55.716
| epoch 150 step    65150 |    186 batches | lr 0.000163 | ms/batch 324.63 | loss  4.01 | ppl    55.402
| epoch 150 step    65200 |    236 batches | lr 0.000163 | ms/batch 325.93 | loss  4.05 | ppl    57.494
----------------------------------------------------------------------------------------------------
| Eval 163 at step    65200 | time: 135.67s | valid loss  4.21 | valid ppl    67.610
----------------------------------------------------------------------------------------------------
| epoch 150 step    65250 |    286 batches | lr 0.000163 | ms/batch 465.52 | loss  4.07 | ppl    58.477
| epoch 150 step    65300 |    336 batches | lr 0.000163 | ms/batch 330.50 | loss  3.93 | ppl    50.921
| epoch 150 step    65350 |    386 batches | lr 0.000162 | ms/batch 332.53 | loss  4.01 | ppl    55.371
| epoch 150 step    65400 |    436 batches | lr 0.000162 | ms/batch 327.58 | loss  4.03 | ppl    56.455
| epoch 151 step    65450 |     50 batches | lr 0.000162 | ms/batch 324.00 | loss  4.00 | ppl    54.470
| epoch 151 step    65500 |    100 batches | lr 0.000162 | ms/batch 325.62 | loss  3.97 | ppl    53.005
| epoch 151 step    65550 |    150 batches | lr 0.000162 | ms/batch 327.22 | loss  3.99 | ppl    54.205
| epoch 151 step    65600 |    200 batches | lr 0.000161 | ms/batch 327.25 | loss  4.03 | ppl    56.395
----------------------------------------------------------------------------------------------------
| Eval 164 at step    65600 | time: 136.38s | valid loss  4.22 | valid ppl    68.099
----------------------------------------------------------------------------------------------------
| epoch 151 step    65650 |    250 batches | lr 0.000161 | ms/batch 433.66 | loss  4.04 | ppl    56.751
| epoch 151 step    65700 |    300 batches | lr 0.000161 | ms/batch 331.57 | loss  4.06 | ppl    58.224
| epoch 151 step    65750 |    350 batches | lr 0.000161 | ms/batch 331.49 | loss  3.94 | ppl    51.642
| epoch 151 step    65800 |    400 batches | lr 0.000161 | ms/batch 330.50 | loss  4.01 | ppl    55.074
| epoch 152 step    65850 |     14 batches | lr 0.00016 | ms/batch 323.79 | loss  4.07 | ppl    58.676
| epoch 152 step    65900 |     64 batches | lr 0.00016 | ms/batch 325.96 | loss  3.96 | ppl    52.296
| epoch 152 step    65950 |    114 batches | lr 0.00016 | ms/batch 326.82 | loss  3.98 | ppl    53.269
| epoch 152 step    66000 |    164 batches | lr 0.00016 | ms/batch 327.54 | loss  3.98 | ppl    53.720
----------------------------------------------------------------------------------------------------
| Eval 165 at step    66000 | time: 136.56s | valid loss  4.21 | valid ppl    67.559
----------------------------------------------------------------------------------------------------
| epoch 152 step    66050 |    214 batches | lr 0.000159 | ms/batch 467.16 | loss  4.01 | ppl    55.309
| epoch 152 step    66100 |    264 batches | lr 0.000159 | ms/batch 330.12 | loss  4.04 | ppl    56.953
| epoch 152 step    66150 |    314 batches | lr 0.000159 | ms/batch 330.48 | loss  4.03 | ppl    56.088
| epoch 152 step    66200 |    364 batches | lr 0.000159 | ms/batch 331.33 | loss  3.96 | ppl    52.236
| epoch 152 step    66250 |    414 batches | lr 0.000159 | ms/batch 331.34 | loss  3.99 | ppl    54.279
| epoch 153 step    66300 |     28 batches | lr 0.000158 | ms/batch 320.66 | loss  4.05 | ppl    57.301
| epoch 153 step    66350 |     78 batches | lr 0.000158 | ms/batch 325.70 | loss  3.95 | ppl    52.008
| epoch 153 step    66400 |    128 batches | lr 0.000158 | ms/batch 326.82 | loss  3.99 | ppl    54.165
----------------------------------------------------------------------------------------------------
| Eval 166 at step    66400 | time: 136.55s | valid loss  4.21 | valid ppl    67.647
----------------------------------------------------------------------------------------------------
| epoch 153 step    66450 |    178 batches | lr 0.000158 | ms/batch 435.65 | loss  4.02 | ppl    55.482
| epoch 153 step    66500 |    228 batches | lr 0.000157 | ms/batch 330.98 | loss  4.02 | ppl    55.573
| epoch 153 step    66550 |    278 batches | lr 0.000157 | ms/batch 330.74 | loss  4.05 | ppl    57.505
| epoch 153 step    66600 |    328 batches | lr 0.000157 | ms/batch 333.45 | loss  3.97 | ppl    53.182
| epoch 153 step    66650 |    378 batches | lr 0.000157 | ms/batch 336.33 | loss  4.00 | ppl    54.466
| epoch 153 step    66700 |    428 batches | lr 0.000157 | ms/batch 329.72 | loss  4.02 | ppl    55.928
| epoch 154 step    66750 |     42 batches | lr 0.000156 | ms/batch 321.05 | loss  4.01 | ppl    55.033
| epoch 154 step    66800 |     92 batches | lr 0.000156 | ms/batch 326.50 | loss  3.95 | ppl    52.027
----------------------------------------------------------------------------------------------------
| Eval 167 at step    66800 | time: 137.21s | valid loss  4.22 | valid ppl    68.326
----------------------------------------------------------------------------------------------------
| epoch 154 step    66850 |    142 batches | lr 0.000156 | ms/batch 433.10 | loss  4.00 | ppl    54.366
| epoch 154 step    66900 |    192 batches | lr 0.000156 | ms/batch 329.92 | loss  4.01 | ppl    54.955
| epoch 154 step    66950 |    242 batches | lr 0.000155 | ms/batch 328.53 | loss  4.02 | ppl    55.675
| epoch 154 step    67000 |    292 batches | lr 0.000155 | ms/batch 329.77 | loss  4.07 | ppl    58.395
| epoch 154 step    67050 |    342 batches | lr 0.000155 | ms/batch 330.61 | loss  3.94 | ppl    51.210
| epoch 154 step    67100 |    392 batches | lr 0.000155 | ms/batch 331.26 | loss  4.01 | ppl    55.048
| epoch 155 step    67150 |      6 batches | lr 0.000155 | ms/batch 322.96 | loss  4.05 | ppl    57.183
| epoch 155 step    67200 |     56 batches | lr 0.000154 | ms/batch 324.78 | loss  3.98 | ppl    53.609
----------------------------------------------------------------------------------------------------
| Eval 168 at step    67200 | time: 136.53s | valid loss  4.22 | valid ppl    67.780
----------------------------------------------------------------------------------------------------
| epoch 155 step    67250 |    106 batches | lr 0.000154 | ms/batch 432.35 | loss  3.97 | ppl    53.140
| epoch 155 step    67300 |    156 batches | lr 0.000154 | ms/batch 329.61 | loss  3.98 | ppl    53.459
| epoch 155 step    67350 |    206 batches | lr 0.000154 | ms/batch 330.35 | loss  4.01 | ppl    55.158
| epoch 155 step    67400 |    256 batches | lr 0.000154 | ms/batch 332.50 | loss  4.02 | ppl    55.449
| epoch 155 step    67450 |    306 batches | lr 0.000153 | ms/batch 330.77 | loss  4.04 | ppl    56.561
| epoch 155 step    67500 |    356 batches | lr 0.000153 | ms/batch 330.25 | loss  3.93 | ppl    50.714
| epoch 155 step    67550 |    406 batches | lr 0.000153 | ms/batch 331.78 | loss  3.99 | ppl    53.800
| epoch 156 step    67600 |     20 batches | lr 0.000153 | ms/batch 322.48 | loss  4.04 | ppl    56.735
----------------------------------------------------------------------------------------------------
| Eval 169 at step    67600 | time: 137.00s | valid loss  4.21 | valid ppl    67.539
----------------------------------------------------------------------------------------------------
| epoch 156 step    67650 |     70 batches | lr 0.000152 | ms/batch 466.69 | loss  3.92 | ppl    50.649
| epoch 156 step    67700 |    120 batches | lr 0.000152 | ms/batch 332.15 | loss  3.98 | ppl    53.668
| epoch 156 step    67750 |    170 batches | lr 0.000152 | ms/batch 331.36 | loss  4.01 | ppl    54.915
| epoch 156 step    67800 |    220 batches | lr 0.000152 | ms/batch 331.14 | loss  4.02 | ppl    55.873
| epoch 156 step    67850 |    270 batches | lr 0.000152 | ms/batch 339.77 | loss  4.01 | ppl    55.076
| epoch 156 step    67900 |    320 batches | lr 0.000151 | ms/batch 338.46 | loss  4.01 | ppl    54.874
| epoch 156 step    67950 |    370 batches | lr 0.000151 | ms/batch 328.94 | loss  3.95 | ppl    52.183
| epoch 156 step    68000 |    420 batches | lr 0.000151 | ms/batch 329.80 | loss  4.00 | ppl    54.564
----------------------------------------------------------------------------------------------------
| Eval 170 at step    68000 | time: 138.33s | valid loss  4.22 | valid ppl    68.006
----------------------------------------------------------------------------------------------------
| epoch 157 step    68050 |     34 batches | lr 0.000151 | ms/batch 425.89 | loss  4.02 | ppl    55.928
| epoch 157 step    68100 |     84 batches | lr 0.00015 | ms/batch 331.96 | loss  3.91 | ppl    49.899
| epoch 157 step    68150 |    134 batches | lr 0.00015 | ms/batch 331.88 | loss  3.99 | ppl    54.222
| epoch 157 step    68200 |    184 batches | lr 0.00015 | ms/batch 331.96 | loss  3.99 | ppl    54.303
| epoch 157 step    68250 |    234 batches | lr 0.00015 | ms/batch 342.41 | loss  4.02 | ppl    55.884
| epoch 157 step    68300 |    284 batches | lr 0.00015 | ms/batch 335.58 | loss  4.02 | ppl    55.961
| epoch 157 step    68350 |    334 batches | lr 0.000149 | ms/batch 330.64 | loss  3.93 | ppl    50.831
| epoch 157 step    68400 |    384 batches | lr 0.000149 | ms/batch 329.22 | loss  3.99 | ppl    53.924
----------------------------------------------------------------------------------------------------
| Eval 171 at step    68400 | time: 137.99s | valid loss  4.22 | valid ppl    68.186
----------------------------------------------------------------------------------------------------
| epoch 157 step    68450 |    434 batches | lr 0.000149 | ms/batch 429.22 | loss  4.02 | ppl    55.886
| epoch 158 step    68500 |     48 batches | lr 0.000149 | ms/batch 323.54 | loss  3.97 | ppl    53.150
| epoch 158 step    68550 |     98 batches | lr 0.000148 | ms/batch 331.30 | loss  3.95 | ppl    52.027
| epoch 158 step    68600 |    148 batches | lr 0.000148 | ms/batch 329.44 | loss  4.03 | ppl    56.083
| epoch 158 step    68650 |    198 batches | lr 0.000148 | ms/batch 330.53 | loss  4.02 | ppl    55.684
| epoch 158 step    68700 |    248 batches | lr 0.000148 | ms/batch 330.23 | loss  3.99 | ppl    54.213
| epoch 158 step    68750 |    298 batches | lr 0.000148 | ms/batch 329.64 | loss  4.03 | ppl    56.309
| epoch 158 step    68800 |    348 batches | lr 0.000147 | ms/batch 329.57 | loss  3.89 | ppl    48.928
----------------------------------------------------------------------------------------------------
| Eval 172 at step    68800 | time: 136.68s | valid loss  4.21 | valid ppl    67.482
----------------------------------------------------------------------------------------------------
| epoch 158 step    68850 |    398 batches | lr 0.000147 | ms/batch 465.67 | loss  3.99 | ppl    53.926
| epoch 159 step    68900 |     12 batches | lr 0.000147 | ms/batch 319.99 | loss  4.02 | ppl    55.542
| epoch 159 step    68950 |     62 batches | lr 0.000147 | ms/batch 330.21 | loss  3.93 | ppl    50.709
| epoch 159 step    69000 |    112 batches | lr 0.000147 | ms/batch 330.41 | loss  3.96 | ppl    52.582
| epoch 159 step    69050 |    162 batches | lr 0.000146 | ms/batch 329.60 | loss  3.99 | ppl    53.853
| epoch 159 step    69100 |    212 batches | lr 0.000146 | ms/batch 330.24 | loss  3.99 | ppl    54.101
| epoch 159 step    69150 |    262 batches | lr 0.000146 | ms/batch 331.60 | loss  4.02 | ppl    55.682
| epoch 159 step    69200 |    312 batches | lr 0.000146 | ms/batch 332.26 | loss  4.00 | ppl    54.481
----------------------------------------------------------------------------------------------------
| Eval 173 at step    69200 | time: 136.74s | valid loss  4.21 | valid ppl    67.424
----------------------------------------------------------------------------------------------------
| epoch 159 step    69250 |    362 batches | lr 0.000145 | ms/batch 465.46 | loss  3.93 | ppl    50.915
| epoch 159 step    69300 |    412 batches | lr 0.000145 | ms/batch 325.72 | loss  3.97 | ppl    52.999
| epoch 160 step    69350 |     26 batches | lr 0.000145 | ms/batch 322.87 | loss  4.00 | ppl    54.765
| epoch 160 step    69400 |     76 batches | lr 0.000145 | ms/batch 330.01 | loss  3.92 | ppl    50.487
| epoch 160 step    69450 |    126 batches | lr 0.000145 | ms/batch 330.38 | loss  3.97 | ppl    52.856
| epoch 160 step    69500 |    176 batches | lr 0.000144 | ms/batch 331.21 | loss  4.00 | ppl    54.605
| epoch 160 step    69550 |    226 batches | lr 0.000144 | ms/batch 333.30 | loss  3.99 | ppl    54.120
| epoch 160 step    69600 |    276 batches | lr 0.000144 | ms/batch 331.41 | loss  4.03 | ppl    56.066
----------------------------------------------------------------------------------------------------
| Eval 174 at step    69600 | time: 136.83s | valid loss  4.20 | valid ppl    66.884
----------------------------------------------------------------------------------------------------
| epoch 160 step    69650 |    326 batches | lr 0.000144 | ms/batch 463.70 | loss  3.92 | ppl    50.531
| epoch 160 step    69700 |    376 batches | lr 0.000144 | ms/batch 326.45 | loss  3.96 | ppl    52.611
| epoch 160 step    69750 |    426 batches | lr 0.000143 | ms/batch 326.44 | loss  3.98 | ppl    53.381
| epoch 161 step    69800 |     40 batches | lr 0.000143 | ms/batch 323.40 | loss  3.98 | ppl    53.285
| epoch 161 step    69850 |     90 batches | lr 0.000143 | ms/batch 331.86 | loss  3.93 | ppl    51.050
| epoch 161 step    69900 |    140 batches | lr 0.000143 | ms/batch 330.60 | loss  3.96 | ppl    52.628
| epoch 161 step    69950 |    190 batches | lr 0.000142 | ms/batch 331.69 | loss  3.99 | ppl    53.876
| epoch 161 step    70000 |    240 batches | lr 0.000142 | ms/batch 331.46 | loss  4.00 | ppl    54.483
----------------------------------------------------------------------------------------------------
| Eval 175 at step    70000 | time: 136.58s | valid loss  4.22 | valid ppl    67.779
----------------------------------------------------------------------------------------------------
| epoch 161 step    70050 |    290 batches | lr 0.000142 | ms/batch 431.66 | loss  4.04 | ppl    56.671
| epoch 161 step    70100 |    340 batches | lr 0.000142 | ms/batch 328.08 | loss  3.90 | ppl    49.401
| epoch 161 step    70150 |    390 batches | lr 0.000142 | ms/batch 327.75 | loss  3.99 | ppl    54.089
| epoch 162 step    70200 |      4 batches | lr 0.000141 | ms/batch 321.53 | loss  4.02 | ppl    55.430
| epoch 162 step    70250 |     54 batches | lr 0.000141 | ms/batch 331.33 | loss  3.92 | ppl    50.471
| epoch 162 step    70300 |    104 batches | lr 0.000141 | ms/batch 330.06 | loss  3.95 | ppl    51.856
| epoch 162 step    70350 |    154 batches | lr 0.000141 | ms/batch 328.75 | loss  3.98 | ppl    53.622
| epoch 162 step    70400 |    204 batches | lr 0.00014 | ms/batch 331.42 | loss  4.00 | ppl    54.398
----------------------------------------------------------------------------------------------------
| Eval 176 at step    70400 | time: 136.53s | valid loss  4.20 | valid ppl    66.582
----------------------------------------------------------------------------------------------------
| epoch 162 step    70450 |    254 batches | lr 0.00014 | ms/batch 464.09 | loss  4.00 | ppl    54.641
| epoch 162 step    70500 |    304 batches | lr 0.00014 | ms/batch 326.97 | loss  4.05 | ppl    57.207
| epoch 162 step    70550 |    354 batches | lr 0.00014 | ms/batch 327.32 | loss  3.90 | ppl    49.235
| epoch 162 step    70600 |    404 batches | lr 0.00014 | ms/batch 332.69 | loss  3.98 | ppl    53.352
| epoch 163 step    70650 |     18 batches | lr 0.000139 | ms/batch 338.15 | loss  4.01 | ppl    55.162
| epoch 163 step    70700 |     68 batches | lr 0.000139 | ms/batch 344.21 | loss  3.91 | ppl    49.796
| epoch 163 step    70750 |    118 batches | lr 0.000139 | ms/batch 343.84 | loss  3.97 | ppl    52.809
| epoch 163 step    70800 |    168 batches | lr 0.000139 | ms/batch 342.91 | loss  3.98 | ppl    53.465
----------------------------------------------------------------------------------------------------
| Eval 177 at step    70800 | time: 139.53s | valid loss  4.20 | valid ppl    66.700
----------------------------------------------------------------------------------------------------
| epoch 163 step    70850 |    218 batches | lr 0.000139 | ms/batch 449.24 | loss  4.01 | ppl    55.237
| epoch 163 step    70900 |    268 batches | lr 0.000138 | ms/batch 332.73 | loss  4.00 | ppl    54.823
| epoch 163 step    70950 |    318 batches | lr 0.000138 | ms/batch 326.23 | loss  3.98 | ppl    53.532
| epoch 163 step    71000 |    368 batches | lr 0.000138 | ms/batch 325.75 | loss  3.93 | ppl    50.659
| epoch 163 step    71050 |    418 batches | lr 0.000138 | ms/batch 324.55 | loss  3.98 | ppl    53.607
| epoch 164 step    71100 |     32 batches | lr 0.000137 | ms/batch 322.91 | loss  4.01 | ppl    54.955
| epoch 164 step    71150 |     82 batches | lr 0.000137 | ms/batch 329.14 | loss  3.90 | ppl    49.586
| epoch 164 step    71200 |    132 batches | lr 0.000137 | ms/batch 328.44 | loss  3.95 | ppl    51.962
----------------------------------------------------------------------------------------------------
| Eval 178 at step    71200 | time: 136.71s | valid loss  4.21 | valid ppl    67.223
----------------------------------------------------------------------------------------------------
| epoch 164 step    71250 |    182 batches | lr 0.000137 | ms/batch 428.36 | loss  3.95 | ppl    51.875
| epoch 164 step    71300 |    232 batches | lr 0.000137 | ms/batch 325.15 | loss  4.00 | ppl    54.590
| epoch 164 step    71350 |    282 batches | lr 0.000136 | ms/batch 326.40 | loss  4.05 | ppl    57.214
| epoch 164 step    71400 |    332 batches | lr 0.000136 | ms/batch 325.41 | loss  3.92 | ppl    50.314
| epoch 164 step    71450 |    382 batches | lr 0.000136 | ms/batch 326.08 | loss  3.96 | ppl    52.412
| epoch 164 step    71500 |    432 batches | lr 0.000136 | ms/batch 325.85 | loss  4.00 | ppl    54.351
| epoch 165 step    71550 |     46 batches | lr 0.000136 | ms/batch 322.64 | loss  3.97 | ppl    52.927
| epoch 165 step    71600 |     96 batches | lr 0.000135 | ms/batch 331.40 | loss  3.88 | ppl    48.377
----------------------------------------------------------------------------------------------------
| Eval 179 at step    71600 | time: 135.60s | valid loss  4.22 | valid ppl    67.945
----------------------------------------------------------------------------------------------------
| epoch 165 step    71650 |    146 batches | lr 0.000135 | ms/batch 428.75 | loss  3.95 | ppl    51.984
| epoch 165 step    71700 |    196 batches | lr 0.000135 | ms/batch 324.67 | loss  3.96 | ppl    52.392
| epoch 165 step    71750 |    246 batches | lr 0.000135 | ms/batch 324.87 | loss  3.98 | ppl    53.678
| epoch 165 step    71800 |    296 batches | lr 0.000134 | ms/batch 325.19 | loss  4.01 | ppl    55.393
| epoch 165 step    71850 |    346 batches | lr 0.000134 | ms/batch 324.07 | loss  3.88 | ppl    48.646
| epoch 165 step    71900 |    396 batches | lr 0.000134 | ms/batch 324.44 | loss  3.97 | ppl    53.078
| epoch 166 step    71950 |     10 batches | lr 0.000134 | ms/batch 319.37 | loss  4.01 | ppl    54.895
| epoch 166 step    72000 |     60 batches | lr 0.000134 | ms/batch 328.79 | loss  3.93 | ppl    50.847
----------------------------------------------------------------------------------------------------
| Eval 180 at step    72000 | time: 135.00s | valid loss  4.22 | valid ppl    67.771
----------------------------------------------------------------------------------------------------
| epoch 166 step    72050 |    110 batches | lr 0.000133 | ms/batch 427.78 | loss  3.96 | ppl    52.339
| epoch 166 step    72100 |    160 batches | lr 0.000133 | ms/batch 325.22 | loss  3.94 | ppl    51.234
| epoch 166 step    72150 |    210 batches | lr 0.000133 | ms/batch 324.72 | loss  3.98 | ppl    53.396
| epoch 166 step    72200 |    260 batches | lr 0.000133 | ms/batch 324.94 | loss  4.03 | ppl    56.085
| epoch 166 step    72250 |    310 batches | lr 0.000133 | ms/batch 340.57 | loss  3.98 | ppl    53.496
| epoch 166 step    72300 |    360 batches | lr 0.000132 | ms/batch 325.44 | loss  3.93 | ppl    51.026
| epoch 166 step    72350 |    410 batches | lr 0.000132 | ms/batch 326.74 | loss  3.97 | ppl    53.194
| epoch 167 step    72400 |     24 batches | lr 0.000132 | ms/batch 320.81 | loss  3.99 | ppl    54.017
----------------------------------------------------------------------------------------------------
| Eval 181 at step    72400 | time: 135.82s | valid loss  4.21 | valid ppl    67.062
----------------------------------------------------------------------------------------------------
| epoch 167 step    72450 |     74 batches | lr 0.000132 | ms/batch 428.73 | loss  3.89 | ppl    49.125
| epoch 167 step    72500 |    124 batches | lr 0.000131 | ms/batch 325.88 | loss  3.95 | ppl    51.881
| epoch 167 step    72550 |    174 batches | lr 0.000131 | ms/batch 326.03 | loss  3.94 | ppl    51.242
| epoch 167 step    72600 |    224 batches | lr 0.000131 | ms/batch 325.91 | loss  3.98 | ppl    53.636
| epoch 167 step    72650 |    274 batches | lr 0.000131 | ms/batch 325.84 | loss  4.00 | ppl    54.419
| epoch 167 step    72700 |    324 batches | lr 0.000131 | ms/batch 326.37 | loss  3.95 | ppl    52.153
| epoch 167 step    72750 |    374 batches | lr 0.00013 | ms/batch 325.83 | loss  3.96 | ppl    52.308
| epoch 167 step    72800 |    424 batches | lr 0.00013 | ms/batch 326.33 | loss  3.95 | ppl    52.165
----------------------------------------------------------------------------------------------------
| Eval 182 at step    72800 | time: 135.54s | valid loss  4.21 | valid ppl    67.503
----------------------------------------------------------------------------------------------------
| epoch 168 step    72850 |     38 batches | lr 0.00013 | ms/batch 423.54 | loss  3.97 | ppl    52.869
| epoch 168 step    72900 |     88 batches | lr 0.00013 | ms/batch 325.11 | loss  3.91 | ppl    49.654
| epoch 168 step    72950 |    138 batches | lr 0.00013 | ms/batch 325.44 | loss  3.96 | ppl    52.476
| epoch 168 step    73000 |    188 batches | lr 0.000129 | ms/batch 324.50 | loss  3.96 | ppl    52.273
| epoch 168 step    73050 |    238 batches | lr 0.000129 | ms/batch 325.66 | loss  3.97 | ppl    53.049
| epoch 168 step    73100 |    288 batches | lr 0.000129 | ms/batch 325.26 | loss  4.01 | ppl    55.317
| epoch 168 step    73150 |    338 batches | lr 0.000129 | ms/batch 326.32 | loss  3.89 | ppl    48.699
| epoch 168 step    73200 |    388 batches | lr 0.000129 | ms/batch 327.07 | loss  3.96 | ppl    52.578
----------------------------------------------------------------------------------------------------
| Eval 183 at step    73200 | time: 135.17s | valid loss  4.22 | valid ppl    67.830
----------------------------------------------------------------------------------------------------
| epoch 169 step    73250 |      2 batches | lr 0.000128 | ms/batch 428.71 | loss  3.99 | ppl    54.101
| epoch 169 step    73300 |     52 batches | lr 0.000128 | ms/batch 326.58 | loss  3.92 | ppl    50.161
| epoch 169 step    73350 |    102 batches | lr 0.000128 | ms/batch 327.68 | loss  3.93 | ppl    50.685
| epoch 169 step    73400 |    152 batches | lr 0.000128 | ms/batch 326.60 | loss  3.95 | ppl    51.948
| epoch 169 step    73450 |    202 batches | lr 0.000127 | ms/batch 327.55 | loss  3.96 | ppl    52.595
| epoch 169 step    73500 |    252 batches | lr 0.000127 | ms/batch 328.43 | loss  3.98 | ppl    53.576
| epoch 169 step    73550 |    302 batches | lr 0.000127 | ms/batch 325.32 | loss  3.99 | ppl    54.146
| epoch 169 step    73600 |    352 batches | lr 0.000127 | ms/batch 327.34 | loss  3.88 | ppl    48.618
----------------------------------------------------------------------------------------------------
| Eval 184 at step    73600 | time: 135.90s | valid loss  4.21 | valid ppl    67.533
----------------------------------------------------------------------------------------------------
| epoch 169 step    73650 |    402 batches | lr 0.000127 | ms/batch 434.70 | loss  3.96 | ppl    52.406
| epoch 170 step    73700 |     16 batches | lr 0.000126 | ms/batch 323.74 | loss  4.01 | ppl    54.949
| epoch 170 step    73750 |     66 batches | lr 0.000126 | ms/batch 326.07 | loss  3.91 | ppl    49.794
| epoch 170 step    73800 |    116 batches | lr 0.000126 | ms/batch 326.36 | loss  3.94 | ppl    51.664
| epoch 170 step    73850 |    166 batches | lr 0.000126 | ms/batch 325.30 | loss  3.94 | ppl    51.499
| epoch 170 step    73900 |    216 batches | lr 0.000126 | ms/batch 324.66 | loss  3.97 | ppl    52.887
| epoch 170 step    73950 |    266 batches | lr 0.000125 | ms/batch 323.54 | loss  3.96 | ppl    52.628
| epoch 170 step    74000 |    316 batches | lr 0.000125 | ms/batch 324.70 | loss  3.99 | ppl    53.876
----------------------------------------------------------------------------------------------------
| Eval 185 at step    74000 | time: 135.43s | valid loss  4.21 | valid ppl    67.208
----------------------------------------------------------------------------------------------------
| epoch 170 step    74050 |    366 batches | lr 0.000125 | ms/batch 433.35 | loss  3.90 | ppl    49.530
| epoch 170 step    74100 |    416 batches | lr 0.000125 | ms/batch 329.87 | loss  3.97 | ppl    53.098
| epoch 171 step    74150 |     30 batches | lr 0.000124 | ms/batch 321.63 | loss  3.97 | ppl    52.846
| epoch 171 step    74200 |     80 batches | lr 0.000124 | ms/batch 324.95 | loss  3.90 | ppl    49.493
| epoch 171 step    74250 |    130 batches | lr 0.000124 | ms/batch 325.55 | loss  3.93 | ppl    50.720
| epoch 171 step    74300 |    180 batches | lr 0.000124 | ms/batch 326.23 | loss  3.94 | ppl    51.531
| epoch 171 step    74350 |    230 batches | lr 0.000124 | ms/batch 326.39 | loss  3.96 | ppl    52.257
| epoch 171 step    74400 |    280 batches | lr 0.000123 | ms/batch 325.73 | loss  4.01 | ppl    55.332
----------------------------------------------------------------------------------------------------
| Eval 186 at step    74400 | time: 135.73s | valid loss  4.21 | valid ppl    67.100
----------------------------------------------------------------------------------------------------
| epoch 171 step    74450 |    330 batches | lr 0.000123 | ms/batch 434.24 | loss  3.90 | ppl    49.530
| epoch 171 step    74500 |    380 batches | lr 0.000123 | ms/batch 331.14 | loss  3.94 | ppl    51.306
| epoch 171 step    74550 |    430 batches | lr 0.000123 | ms/batch 331.14 | loss  3.97 | ppl    53.076
| epoch 172 step    74600 |     44 batches | lr 0.000123 | ms/batch 319.08 | loss  3.93 | ppl    50.973
| epoch 172 step    74650 |     94 batches | lr 0.000122 | ms/batch 324.63 | loss  3.88 | ppl    48.570
| epoch 172 step    74700 |    144 batches | lr 0.000122 | ms/batch 327.25 | loss  3.94 | ppl    51.634
| epoch 172 step    74750 |    194 batches | lr 0.000122 | ms/batch 327.30 | loss  3.94 | ppl    51.487
| epoch 172 step    74800 |    244 batches | lr 0.000122 | ms/batch 326.47 | loss  3.98 | ppl    53.668
----------------------------------------------------------------------------------------------------
| Eval 187 at step    74800 | time: 136.05s | valid loss  4.21 | valid ppl    67.249
----------------------------------------------------------------------------------------------------
| epoch 172 step    74850 |    294 batches | lr 0.000122 | ms/batch 433.24 | loss  4.00 | ppl    54.598
| epoch 172 step    74900 |    344 batches | lr 0.000121 | ms/batch 329.69 | loss  3.85 | ppl    47.021
| epoch 172 step    74950 |    394 batches | lr 0.000121 | ms/batch 331.06 | loss  3.98 | ppl    53.488
| epoch 173 step    75000 |      8 batches | lr 0.000121 | ms/batch 321.70 | loss  3.99 | ppl    54.038
| epoch 173 step    75050 |     58 batches | lr 0.000121 | ms/batch 323.47 | loss  3.89 | ppl    48.827
| epoch 173 step    75100 |    108 batches | lr 0.000121 | ms/batch 323.56 | loss  3.91 | ppl    49.973
| epoch 173 step    75150 |    158 batches | lr 0.00012 | ms/batch 324.24 | loss  3.94 | ppl    51.388
| epoch 173 step    75200 |    208 batches | lr 0.00012 | ms/batch 323.77 | loss  3.98 | ppl    53.273
----------------------------------------------------------------------------------------------------
| Eval 188 at step    75200 | time: 135.51s | valid loss  4.21 | valid ppl    67.497
----------------------------------------------------------------------------------------------------
| epoch 173 step    75250 |    258 batches | lr 0.00012 | ms/batch 432.35 | loss  3.98 | ppl    53.724
| epoch 173 step    75300 |    308 batches | lr 0.00012 | ms/batch 330.46 | loss  3.97 | ppl    52.873
| epoch 173 step    75350 |    358 batches | lr 0.000119 | ms/batch 329.51 | loss  3.87 | ppl    48.053
| epoch 173 step    75400 |    408 batches | lr 0.000119 | ms/batch 327.96 | loss  3.92 | ppl    50.521
| epoch 174 step    75450 |     22 batches | lr 0.000119 | ms/batch 321.05 | loss  3.99 | ppl    53.958
| epoch 174 step    75500 |     72 batches | lr 0.000119 | ms/batch 326.31 | loss  3.88 | ppl    48.627
| epoch 174 step    75550 |    122 batches | lr 0.000119 | ms/batch 325.42 | loss  3.94 | ppl    51.234
| epoch 174 step    75600 |    172 batches | lr 0.000118 | ms/batch 326.30 | loss  3.94 | ppl    51.523
----------------------------------------------------------------------------------------------------
| Eval 189 at step    75600 | time: 135.99s | valid loss  4.19 | valid ppl    66.303
----------------------------------------------------------------------------------------------------
| epoch 174 step    75650 |    222 batches | lr 0.000118 | ms/batch 467.47 | loss  3.97 | ppl    53.163
| epoch 174 step    75700 |    272 batches | lr 0.000118 | ms/batch 330.52 | loss  3.95 | ppl    51.822
| epoch 174 step    75750 |    322 batches | lr 0.000118 | ms/batch 329.82 | loss  3.91 | ppl    49.883
| epoch 174 step    75800 |    372 batches | lr 0.000118 | ms/batch 329.94 | loss  3.91 | ppl    50.006
| epoch 174 step    75850 |    422 batches | lr 0.000117 | ms/batch 328.88 | loss  3.96 | ppl    52.314
| epoch 175 step    75900 |     36 batches | lr 0.000117 | ms/batch 319.56 | loss  3.96 | ppl    52.320
| epoch 175 step    75950 |     86 batches | lr 0.000117 | ms/batch 325.12 | loss  3.88 | ppl    48.477
| epoch 175 step    76000 |    136 batches | lr 0.000117 | ms/batch 324.46 | loss  3.96 | ppl    52.230
----------------------------------------------------------------------------------------------------
| Eval 190 at step    76000 | time: 136.06s | valid loss  4.21 | valid ppl    67.211
----------------------------------------------------------------------------------------------------
| epoch 175 step    76050 |    186 batches | lr 0.000117 | ms/batch 434.24 | loss  3.95 | ppl    52.010
| epoch 175 step    76100 |    236 batches | lr 0.000116 | ms/batch 329.14 | loss  3.94 | ppl    51.188
| epoch 175 step    76150 |    286 batches | lr 0.000116 | ms/batch 329.66 | loss  3.99 | ppl    54.068
| epoch 175 step    76200 |    336 batches | lr 0.000116 | ms/batch 328.33 | loss  3.89 | ppl    48.833
| epoch 175 step    76250 |    386 batches | lr 0.000116 | ms/batch 328.97 | loss  3.94 | ppl    51.515
| epoch 175 step    76300 |    436 batches | lr 0.000116 | ms/batch 326.39 | loss  3.95 | ppl    51.956
| epoch 176 step    76350 |     50 batches | lr 0.000115 | ms/batch 324.68 | loss  3.92 | ppl    50.550
| epoch 176 step    76400 |    100 batches | lr 0.000115 | ms/batch 326.64 | loss  3.92 | ppl    50.163
----------------------------------------------------------------------------------------------------
| Eval 191 at step    76400 | time: 136.44s | valid loss  4.21 | valid ppl    67.575
----------------------------------------------------------------------------------------------------
| epoch 176 step    76450 |    150 batches | lr 0.000115 | ms/batch 433.82 | loss  3.94 | ppl    51.298
| epoch 176 step    76500 |    200 batches | lr 0.000115 | ms/batch 329.97 | loss  3.94 | ppl    51.324
| epoch 176 step    76550 |    250 batches | lr 0.000114 | ms/batch 329.18 | loss  3.94 | ppl    51.642
| epoch 176 step    76600 |    300 batches | lr 0.000114 | ms/batch 330.41 | loss  3.98 | ppl    53.574
| epoch 176 step    76650 |    350 batches | lr 0.000114 | ms/batch 328.69 | loss  3.85 | ppl    46.923
| epoch 176 step    76700 |    400 batches | lr 0.000114 | ms/batch 329.36 | loss  3.93 | ppl    50.959
| epoch 177 step    76750 |     14 batches | lr 0.000114 | ms/batch 322.85 | loss  3.98 | ppl    53.296
| epoch 177 step    76800 |     64 batches | lr 0.000113 | ms/batch 325.89 | loss  3.88 | ppl    48.608
----------------------------------------------------------------------------------------------------
| Eval 192 at step    76800 | time: 136.50s | valid loss  4.20 | valid ppl    66.944
----------------------------------------------------------------------------------------------------
| epoch 177 step    76850 |    114 batches | lr 0.000113 | ms/batch 432.57 | loss  3.93 | ppl    50.849
| epoch 177 step    76900 |    164 batches | lr 0.000113 | ms/batch 329.42 | loss  3.93 | ppl    50.863
| epoch 177 step    76950 |    214 batches | lr 0.000113 | ms/batch 329.78 | loss  3.96 | ppl    52.230
| epoch 177 step    77000 |    264 batches | lr 0.000113 | ms/batch 329.56 | loss  3.97 | ppl    52.879
| epoch 177 step    77050 |    314 batches | lr 0.000112 | ms/batch 329.54 | loss  3.96 | ppl    52.644
| epoch 177 step    77100 |    364 batches | lr 0.000112 | ms/batch 329.00 | loss  3.88 | ppl    48.354
| epoch 177 step    77150 |    414 batches | lr 0.000112 | ms/batch 330.44 | loss  3.93 | ppl    50.925
| epoch 178 step    77200 |     28 batches | lr 0.000112 | ms/batch 320.71 | loss  3.99 | ppl    53.834
----------------------------------------------------------------------------------------------------
| Eval 193 at step    77200 | time: 136.52s | valid loss  4.20 | valid ppl    66.869
----------------------------------------------------------------------------------------------------
| epoch 178 step    77250 |     78 batches | lr 0.000112 | ms/batch 433.73 | loss  3.89 | ppl    48.682
| epoch 178 step    77300 |    128 batches | lr 0.000111 | ms/batch 329.04 | loss  3.92 | ppl    50.458
| epoch 178 step    77350 |    178 batches | lr 0.000111 | ms/batch 328.67 | loss  3.92 | ppl    50.381
| epoch 178 step    77400 |    228 batches | lr 0.000111 | ms/batch 329.23 | loss  3.92 | ppl    50.645
| epoch 178 step    77450 |    278 batches | lr 0.000111 | ms/batch 329.39 | loss  3.98 | ppl    53.712
| epoch 178 step    77500 |    328 batches | lr 0.000111 | ms/batch 330.81 | loss  3.89 | ppl    49.095
| epoch 178 step    77550 |    378 batches | lr 0.00011 | ms/batch 331.13 | loss  3.92 | ppl    50.159
| epoch 178 step    77600 |    428 batches | lr 0.00011 | ms/batch 329.89 | loss  3.96 | ppl    52.451
----------------------------------------------------------------------------------------------------
| Eval 194 at step    77600 | time: 137.15s | valid loss  4.21 | valid ppl    67.109
----------------------------------------------------------------------------------------------------
| epoch 179 step    77650 |     42 batches | lr 0.00011 | ms/batch 428.73 | loss  3.90 | ppl    49.279
| epoch 179 step    77700 |     92 batches | lr 0.00011 | ms/batch 331.03 | loss  3.85 | ppl    47.138
| epoch 179 step    77750 |    142 batches | lr 0.00011 | ms/batch 330.42 | loss  3.94 | ppl    51.443
| epoch 179 step    77800 |    192 batches | lr 0.000109 | ms/batch 329.66 | loss  3.94 | ppl    51.344
| epoch 179 step    77850 |    242 batches | lr 0.000109 | ms/batch 330.17 | loss  3.96 | ppl    52.218
| epoch 179 step    77900 |    292 batches | lr 0.000109 | ms/batch 330.36 | loss  3.97 | ppl    52.908
| epoch 179 step    77950 |    342 batches | lr 0.000109 | ms/batch 331.29 | loss  3.84 | ppl    46.482
| epoch 179 step    78000 |    392 batches | lr 0.000109 | ms/batch 332.20 | loss  3.91 | ppl    49.998
----------------------------------------------------------------------------------------------------
| Eval 195 at step    78000 | time: 137.17s | valid loss  4.20 | valid ppl    66.856
----------------------------------------------------------------------------------------------------
| epoch 180 step    78050 |      6 batches | lr 0.000108 | ms/batch 425.71 | loss  3.97 | ppl    52.830
| epoch 180 step    78100 |     56 batches | lr 0.000108 | ms/batch 331.71 | loss  3.87 | ppl    48.044
| epoch 180 step    78150 |    106 batches | lr 0.000108 | ms/batch 329.08 | loss  3.91 | ppl    49.739
| epoch 180 step    78200 |    156 batches | lr 0.000108 | ms/batch 329.82 | loss  3.93 | ppl    51.104
| epoch 180 step    78250 |    206 batches | lr 0.000108 | ms/batch 331.68 | loss  3.97 | ppl    52.902
| epoch 180 step    78300 |    256 batches | lr 0.000107 | ms/batch 331.79 | loss  3.96 | ppl    52.650
| epoch 180 step    78350 |    306 batches | lr 0.000107 | ms/batch 329.72 | loss  3.98 | ppl    53.571
| epoch 180 step    78400 |    356 batches | lr 0.000107 | ms/batch 329.69 | loss  3.87 | ppl    47.995
----------------------------------------------------------------------------------------------------
| Eval 196 at step    78400 | time: 136.97s | valid loss  4.21 | valid ppl    67.023
----------------------------------------------------------------------------------------------------
| epoch 180 step    78450 |    406 batches | lr 0.000107 | ms/batch 429.90 | loss  3.91 | ppl    49.844
| epoch 181 step    78500 |     20 batches | lr 0.000107 | ms/batch 321.81 | loss  3.97 | ppl    53.032
| epoch 181 step    78550 |     70 batches | lr 0.000106 | ms/batch 329.51 | loss  3.86 | ppl    47.545
| epoch 181 step    78600 |    120 batches | lr 0.000106 | ms/batch 330.22 | loss  3.90 | ppl    49.540
| epoch 181 step    78650 |    170 batches | lr 0.000106 | ms/batch 330.30 | loss  3.92 | ppl    50.334
| epoch 181 step    78700 |    220 batches | lr 0.000106 | ms/batch 329.47 | loss  3.95 | ppl    51.974
| epoch 181 step    78750 |    270 batches | lr 0.000105 | ms/batch 331.12 | loss  3.95 | ppl    51.692
| epoch 181 step    78800 |    320 batches | lr 0.000105 | ms/batch 329.12 | loss  3.92 | ppl    50.641
----------------------------------------------------------------------------------------------------
| Eval 197 at step    78800 | time: 136.54s | valid loss  4.21 | valid ppl    67.284
----------------------------------------------------------------------------------------------------
| epoch 181 step    78850 |    370 batches | lr 0.000105 | ms/batch 428.72 | loss  3.89 | ppl    49.081
| epoch 181 step    78900 |    420 batches | lr 0.000105 | ms/batch 327.10 | loss  3.92 | ppl    50.169
| epoch 182 step    78950 |     34 batches | lr 0.000105 | ms/batch 322.72 | loss  3.92 | ppl    50.302
| epoch 182 step    79000 |     84 batches | lr 0.000104 | ms/batch 330.08 | loss  3.87 | ppl    47.754
| epoch 182 step    79050 |    134 batches | lr 0.000104 | ms/batch 330.76 | loss  3.94 | ppl    51.370
| epoch 182 step    79100 |    184 batches | lr 0.000104 | ms/batch 331.02 | loss  3.92 | ppl    50.517
| epoch 182 step    79150 |    234 batches | lr 0.000104 | ms/batch 331.10 | loss  3.92 | ppl    50.226
| epoch 182 step    79200 |    284 batches | lr 0.000104 | ms/batch 329.86 | loss  3.96 | ppl    52.558
----------------------------------------------------------------------------------------------------
| Eval 198 at step    79200 | time: 136.59s | valid loss  4.21 | valid ppl    67.572
----------------------------------------------------------------------------------------------------
| epoch 182 step    79250 |    334 batches | lr 0.000103 | ms/batch 429.59 | loss  3.85 | ppl    46.839
| epoch 182 step    79300 |    384 batches | lr 0.000103 | ms/batch 325.40 | loss  3.89 | ppl    48.989
| epoch 182 step    79350 |    434 batches | lr 0.000103 | ms/batch 324.16 | loss  3.95 | ppl    51.927
| epoch 183 step    79400 |     48 batches | lr 0.000103 | ms/batch 322.66 | loss  3.89 | ppl    48.997
| epoch 183 step    79450 |     98 batches | lr 0.000103 | ms/batch 330.66 | loss  3.88 | ppl    48.538
| epoch 183 step    79500 |    148 batches | lr 0.000102 | ms/batch 330.16 | loss  3.92 | ppl    50.627
| epoch 183 step    79550 |    198 batches | lr 0.000102 | ms/batch 332.11 | loss  3.92 | ppl    50.614
| epoch 183 step    79600 |    248 batches | lr 0.000102 | ms/batch 330.10 | loss  3.92 | ppl    50.627
----------------------------------------------------------------------------------------------------
| Eval 199 at step    79600 | time: 136.26s | valid loss  4.21 | valid ppl    67.145
----------------------------------------------------------------------------------------------------
| epoch 183 step    79650 |    298 batches | lr 0.000102 | ms/batch 431.14 | loss  3.97 | ppl    52.726
| epoch 183 step    79700 |    348 batches | lr 0.000102 | ms/batch 326.61 | loss  3.84 | ppl    46.380
| epoch 183 step    79750 |    398 batches | lr 0.000101 | ms/batch 327.43 | loss  3.92 | ppl    50.511
| epoch 184 step    79800 |     12 batches | lr 0.000101 | ms/batch 321.60 | loss  3.95 | ppl    51.879
| epoch 184 step    79850 |     62 batches | lr 0.000101 | ms/batch 329.96 | loss  3.84 | ppl    46.704
| epoch 184 step    79900 |    112 batches | lr 0.000101 | ms/batch 330.67 | loss  3.88 | ppl    48.574
| epoch 184 step    79950 |    162 batches | lr 0.000101 | ms/batch 332.36 | loss  3.93 | ppl    51.074
| epoch 184 step    80000 |    212 batches | lr 0.0001 | ms/batch 329.44 | loss  3.89 | ppl    48.877
----------------------------------------------------------------------------------------------------
| Eval 200 at step    80000 | time: 136.63s | valid loss  4.20 | valid ppl    66.840
----------------------------------------------------------------------------------------------------
| epoch 184 step    80050 |    262 batches | lr 0.0001 | ms/batch 429.61 | loss  3.92 | ppl    50.473
| epoch 184 step    80100 |    312 batches | lr 0.0001 | ms/batch 326.38 | loss  3.92 | ppl    50.639
| epoch 184 step    80150 |    362 batches | lr 9.99e-05 | ms/batch 326.13 | loss  3.86 | ppl    47.493
| epoch 184 step    80200 |    412 batches | lr 9.97e-05 | ms/batch 325.95 | loss  3.90 | ppl    49.327
| epoch 185 step    80250 |     26 batches | lr 9.95e-05 | ms/batch 321.53 | loss  3.94 | ppl    51.348
| epoch 185 step    80300 |     76 batches | lr 9.93e-05 | ms/batch 329.66 | loss  3.88 | ppl    48.224
| epoch 185 step    80350 |    126 batches | lr 9.91e-05 | ms/batch 330.61 | loss  3.92 | ppl    50.324
| epoch 185 step    80400 |    176 batches | lr 9.89e-05 | ms/batch 329.78 | loss  3.89 | ppl    49.077
----------------------------------------------------------------------------------------------------
| Eval 201 at step    80400 | time: 135.99s | valid loss  4.20 | valid ppl    66.820
----------------------------------------------------------------------------------------------------
| epoch 185 step    80450 |    226 batches | lr 9.87e-05 | ms/batch 429.41 | loss  3.93 | ppl    51.128
| epoch 185 step    80500 |    276 batches | lr 9.85e-05 | ms/batch 328.02 | loss  3.95 | ppl    51.978
| epoch 185 step    80550 |    326 batches | lr 9.83e-05 | ms/batch 326.33 | loss  3.88 | ppl    48.441
| epoch 185 step    80600 |    376 batches | lr 9.81e-05 | ms/batch 325.63 | loss  3.87 | ppl    47.875
| epoch 185 step    80650 |    426 batches | lr 9.79e-05 | ms/batch 326.17 | loss  3.92 | ppl    50.475
| epoch 186 step    80700 |     40 batches | lr 9.77e-05 | ms/batch 323.95 | loss  3.88 | ppl    48.305
| epoch 186 step    80750 |     90 batches | lr 9.75e-05 | ms/batch 331.40 | loss  3.88 | ppl    48.599
| epoch 186 step    80800 |    140 batches | lr 9.73e-05 | ms/batch 330.43 | loss  3.90 | ppl    49.171
----------------------------------------------------------------------------------------------------
| Eval 202 at step    80800 | time: 136.08s | valid loss  4.21 | valid ppl    67.140
----------------------------------------------------------------------------------------------------
| epoch 186 step    80850 |    190 batches | lr 9.71e-05 | ms/batch 433.04 | loss  3.93 | ppl    50.726
| epoch 186 step    80900 |    240 batches | lr 9.69e-05 | ms/batch 328.19 | loss  3.91 | ppl    50.057
| epoch 186 step    80950 |    290 batches | lr 9.67e-05 | ms/batch 327.54 | loss  3.96 | ppl    52.500
| epoch 186 step    81000 |    340 batches | lr 9.65e-05 | ms/batch 325.65 | loss  3.83 | ppl    46.261
| epoch 186 step    81050 |    390 batches | lr 9.63e-05 | ms/batch 324.77 | loss  3.91 | ppl    50.069
| epoch 187 step    81100 |      4 batches | lr 9.61e-05 | ms/batch 322.07 | loss  3.95 | ppl    51.860
| epoch 187 step    81150 |     54 batches | lr 9.59e-05 | ms/batch 332.39 | loss  3.86 | ppl    47.579
| epoch 187 step    81200 |    104 batches | lr 9.57e-05 | ms/batch 331.41 | loss  3.88 | ppl    48.213
----------------------------------------------------------------------------------------------------
| Eval 203 at step    81200 | time: 136.24s | valid loss  4.21 | valid ppl    67.593
----------------------------------------------------------------------------------------------------
| epoch 187 step    81250 |    154 batches | lr 9.56e-05 | ms/batch 428.42 | loss  3.91 | ppl    50.032
| epoch 187 step    81300 |    204 batches | lr 9.54e-05 | ms/batch 324.93 | loss  3.89 | ppl    49.148
| epoch 187 step    81350 |    254 batches | lr 9.52e-05 | ms/batch 325.25 | loss  3.93 | ppl    51.036
| epoch 187 step    81400 |    304 batches | lr 9.5e-05 | ms/batch 325.47 | loss  3.94 | ppl    51.324
| epoch 187 step    81450 |    354 batches | lr 9.48e-05 | ms/batch 324.87 | loss  3.85 | ppl    46.845
| epoch 187 step    81500 |    404 batches | lr 9.46e-05 | ms/batch 325.94 | loss  3.89 | ppl    48.859
| epoch 188 step    81550 |     18 batches | lr 9.44e-05 | ms/batch 320.84 | loss  3.95 | ppl    52.092
| epoch 188 step    81600 |     68 batches | lr 9.42e-05 | ms/batch 329.14 | loss  3.86 | ppl    47.451
----------------------------------------------------------------------------------------------------
| Eval 204 at step    81600 | time: 135.21s | valid loss  4.20 | valid ppl    67.006
----------------------------------------------------------------------------------------------------
| epoch 188 step    81650 |    118 batches | lr 9.4e-05 | ms/batch 429.16 | loss  3.89 | ppl    48.695
| epoch 188 step    81700 |    168 batches | lr 9.38e-05 | ms/batch 324.95 | loss  3.90 | ppl    49.277
| epoch 188 step    81750 |    218 batches | lr 9.36e-05 | ms/batch 326.24 | loss  3.92 | ppl    50.265
| epoch 188 step    81800 |    268 batches | lr 9.34e-05 | ms/batch 325.43 | loss  3.96 | ppl    52.564
| epoch 188 step    81850 |    318 batches | lr 9.32e-05 | ms/batch 324.97 | loss  3.92 | ppl    50.540
| epoch 188 step    81900 |    368 batches | lr 9.3e-05 | ms/batch 325.82 | loss  3.84 | ppl    46.722
| epoch 188 step    81950 |    418 batches | lr 9.28e-05 | ms/batch 325.67 | loss  3.88 | ppl    48.432
| epoch 189 step    82000 |     32 batches | lr 9.26e-05 | ms/batch 321.07 | loss  3.92 | ppl    50.237
----------------------------------------------------------------------------------------------------
| Eval 205 at step    82000 | time: 135.18s | valid loss  4.21 | valid ppl    67.124
----------------------------------------------------------------------------------------------------
| epoch 189 step    82050 |     82 batches | lr 9.24e-05 | ms/batch 428.79 | loss  3.83 | ppl    46.286
| epoch 189 step    82100 |    132 batches | lr 9.22e-05 | ms/batch 324.57 | loss  3.89 | ppl    49.141
| epoch 189 step    82150 |    182 batches | lr 9.2e-05 | ms/batch 325.51 | loss  3.89 | ppl    49.133
| epoch 189 step    82200 |    232 batches | lr 9.19e-05 | ms/batch 324.16 | loss  3.91 | ppl    49.794
| epoch 189 step    82250 |    282 batches | lr 9.17e-05 | ms/batch 325.10 | loss  3.94 | ppl    51.517
| epoch 189 step    82300 |    332 batches | lr 9.15e-05 | ms/batch 325.73 | loss  3.82 | ppl    45.826
| epoch 189 step    82350 |    382 batches | lr 9.13e-05 | ms/batch 325.87 | loss  3.90 | ppl    49.480
| epoch 189 step    82400 |    432 batches | lr 9.11e-05 | ms/batch 326.26 | loss  3.92 | ppl    50.387
----------------------------------------------------------------------------------------------------
| Eval 206 at step    82400 | time: 135.34s | valid loss  4.20 | valid ppl    66.815
----------------------------------------------------------------------------------------------------
| epoch 190 step    82450 |     46 batches | lr 9.09e-05 | ms/batch 424.71 | loss  3.89 | ppl    48.671
| epoch 190 step    82500 |     96 batches | lr 9.07e-05 | ms/batch 327.28 | loss  3.87 | ppl    48.179
| epoch 190 step    82550 |    146 batches | lr 9.05e-05 | ms/batch 326.20 | loss  3.90 | ppl    49.368
| epoch 190 step    82600 |    196 batches | lr 9.03e-05 | ms/batch 326.26 | loss  3.90 | ppl    49.435
| epoch 190 step    82650 |    246 batches | lr 9.01e-05 | ms/batch 326.58 | loss  3.93 | ppl    50.877
| epoch 190 step    82700 |    296 batches | lr 8.99e-05 | ms/batch 325.82 | loss  3.95 | ppl    51.688
| epoch 190 step    82750 |    346 batches | lr 8.97e-05 | ms/batch 326.62 | loss  3.81 | ppl    45.184
| epoch 190 step    82800 |    396 batches | lr 8.95e-05 | ms/batch 326.89 | loss  3.90 | ppl    49.356
----------------------------------------------------------------------------------------------------
| Eval 207 at step    82800 | time: 135.50s | valid loss  4.20 | valid ppl    66.937
----------------------------------------------------------------------------------------------------
| epoch 191 step    82850 |     10 batches | lr 8.93e-05 | ms/batch 427.54 | loss  3.93 | ppl    51.062
| epoch 191 step    82900 |     60 batches | lr 8.92e-05 | ms/batch 326.07 | loss  3.85 | ppl    47.125
| epoch 191 step    82950 |    110 batches | lr 8.9e-05 | ms/batch 326.48 | loss  3.87 | ppl    47.929
| epoch 191 step    83000 |    160 batches | lr 8.88e-05 | ms/batch 326.90 | loss  3.89 | ppl    48.690
| epoch 191 step    83050 |    210 batches | lr 8.86e-05 | ms/batch 325.44 | loss  3.93 | ppl    50.752
| epoch 191 step    83100 |    260 batches | lr 8.84e-05 | ms/batch 324.88 | loss  3.95 | ppl    51.911
| epoch 191 step    83150 |    310 batches | lr 8.82e-05 | ms/batch 327.32 | loss  3.91 | ppl    49.699
| epoch 191 step    83200 |    360 batches | lr 8.8e-05 | ms/batch 327.04 | loss  3.85 | ppl    46.821
----------------------------------------------------------------------------------------------------
| Eval 208 at step    83200 | time: 135.61s | valid loss  4.20 | valid ppl    66.915
----------------------------------------------------------------------------------------------------
| epoch 191 step    83250 |    410 batches | lr 8.78e-05 | ms/batch 436.43 | loss  3.89 | ppl    48.800
| epoch 192 step    83300 |     24 batches | lr 8.76e-05 | ms/batch 322.27 | loss  3.92 | ppl    50.544
| epoch 192 step    83350 |     74 batches | lr 8.74e-05 | ms/batch 327.72 | loss  3.84 | ppl    46.391
| epoch 192 step    83400 |    124 batches | lr 8.72e-05 | ms/batch 325.93 | loss  3.88 | ppl    48.532
| epoch 192 step    83450 |    174 batches | lr 8.71e-05 | ms/batch 326.03 | loss  3.92 | ppl    50.151
| epoch 192 step    83500 |    224 batches | lr 8.69e-05 | ms/batch 326.91 | loss  3.90 | ppl    49.445
| epoch 192 step    83550 |    274 batches | lr 8.67e-05 | ms/batch 326.74 | loss  3.94 | ppl    51.455
| epoch 192 step    83600 |    324 batches | lr 8.65e-05 | ms/batch 326.50 | loss  3.85 | ppl    47.092
----------------------------------------------------------------------------------------------------
| Eval 209 at step    83600 | time: 135.91s | valid loss  4.21 | valid ppl    67.029
----------------------------------------------------------------------------------------------------
| epoch 192 step    83650 |    374 batches | lr 8.63e-05 | ms/batch 434.86 | loss  3.89 | ppl    48.844
| epoch 192 step    83700 |    424 batches | lr 8.61e-05 | ms/batch 329.97 | loss  3.88 | ppl    48.536
| epoch 193 step    83750 |     38 batches | lr 8.59e-05 | ms/batch 321.09 | loss  3.89 | ppl    48.806
| epoch 193 step    83800 |     88 batches | lr 8.57e-05 | ms/batch 327.42 | loss  3.83 | ppl    46.081
| epoch 193 step    83850 |    138 batches | lr 8.55e-05 | ms/batch 326.53 | loss  3.89 | ppl    49.152
| epoch 193 step    83900 |    188 batches | lr 8.54e-05 | ms/batch 325.55 | loss  3.89 | ppl    48.713
| epoch 193 step    83950 |    238 batches | lr 8.52e-05 | ms/batch 326.28 | loss  3.91 | ppl    49.720
| epoch 193 step    84000 |    288 batches | lr 8.5e-05 | ms/batch 326.03 | loss  3.95 | ppl    51.948
----------------------------------------------------------------------------------------------------
| Eval 210 at step    84000 | time: 135.90s | valid loss  4.20 | valid ppl    66.794
----------------------------------------------------------------------------------------------------
| epoch 193 step    84050 |    338 batches | lr 8.48e-05 | ms/batch 434.01 | loss  3.80 | ppl    44.918
| epoch 193 step    84100 |    388 batches | lr 8.46e-05 | ms/batch 329.17 | loss  3.92 | ppl    50.371
| epoch 194 step    84150 |      2 batches | lr 8.44e-05 | ms/batch 323.81 | loss  3.93 | ppl    50.865
| epoch 194 step    84200 |     52 batches | lr 8.42e-05 | ms/batch 325.92 | loss  3.86 | ppl    47.473
| epoch 194 step    84250 |    102 batches | lr 8.4e-05 | ms/batch 327.57 | loss  3.85 | ppl    47.056
| epoch 194 step    84300 |    152 batches | lr 8.38e-05 | ms/batch 325.87 | loss  3.89 | ppl    49.022
| epoch 194 step    84350 |    202 batches | lr 8.37e-05 | ms/batch 326.73 | loss  3.87 | ppl    48.175
| epoch 194 step    84400 |    252 batches | lr 8.35e-05 | ms/batch 325.81 | loss  3.91 | ppl    50.030
----------------------------------------------------------------------------------------------------
| Eval 211 at step    84400 | time: 135.93s | valid loss  4.21 | valid ppl    67.223
----------------------------------------------------------------------------------------------------
| epoch 194 step    84450 |    302 batches | lr 8.33e-05 | ms/batch 433.29 | loss  3.95 | ppl    51.747
| epoch 194 step    84500 |    352 batches | lr 8.31e-05 | ms/batch 328.95 | loss  3.80 | ppl    44.504
| epoch 194 step    84550 |    402 batches | lr 8.29e-05 | ms/batch 329.09 | loss  3.90 | ppl    49.555
| epoch 195 step    84600 |     16 batches | lr 8.27e-05 | ms/batch 323.15 | loss  3.90 | ppl    49.619
| epoch 195 step    84650 |     66 batches | lr 8.25e-05 | ms/batch 327.17 | loss  3.85 | ppl    46.991
| epoch 195 step    84700 |    116 batches | lr 8.23e-05 | ms/batch 325.21 | loss  3.88 | ppl    48.220
| epoch 195 step    84750 |    166 batches | lr 8.22e-05 | ms/batch 325.63 | loss  3.86 | ppl    47.506
| epoch 195 step    84800 |    216 batches | lr 8.2e-05 | ms/batch 325.35 | loss  3.89 | ppl    49.131
----------------------------------------------------------------------------------------------------
| Eval 212 at step    84800 | time: 135.95s | valid loss  4.20 | valid ppl    66.688
----------------------------------------------------------------------------------------------------
| epoch 195 step    84850 |    266 batches | lr 8.18e-05 | ms/batch 435.89 | loss  3.91 | ppl    49.741
| epoch 195 step    84900 |    316 batches | lr 8.16e-05 | ms/batch 330.49 | loss  3.88 | ppl    48.585
| epoch 195 step    84950 |    366 batches | lr 8.14e-05 | ms/batch 330.80 | loss  3.84 | ppl    46.418
| epoch 195 step    85000 |    416 batches | lr 8.12e-05 | ms/batch 330.25 | loss  3.90 | ppl    49.402
| epoch 196 step    85050 |     30 batches | lr 8.1e-05 | ms/batch 332.00 | loss  3.91 | ppl    49.689
| epoch 196 step    85100 |     80 batches | lr 8.09e-05 | ms/batch 341.88 | loss  3.82 | ppl    45.572
| epoch 196 step    85150 |    130 batches | lr 8.07e-05 | ms/batch 332.60 | loss  3.87 | ppl    47.811
| epoch 196 step    85200 |    180 batches | lr 8.05e-05 | ms/batch 325.64 | loss  3.89 | ppl    49.135
----------------------------------------------------------------------------------------------------
| Eval 213 at step    85200 | time: 137.95s | valid loss  4.20 | valid ppl    66.504
----------------------------------------------------------------------------------------------------
| epoch 196 step    85250 |    230 batches | lr 8.03e-05 | ms/batch 433.17 | loss  3.88 | ppl    48.183
| epoch 196 step    85300 |    280 batches | lr 8.01e-05 | ms/batch 330.46 | loss  3.93 | ppl    50.923
| epoch 196 step    85350 |    330 batches | lr 7.99e-05 | ms/batch 330.81 | loss  3.84 | ppl    46.540
| epoch 196 step    85400 |    380 batches | lr 7.97e-05 | ms/batch 332.42 | loss  3.87 | ppl    47.873
| epoch 196 step    85450 |    430 batches | lr 7.96e-05 | ms/batch 331.51 | loss  3.89 | ppl    49.062
| epoch 197 step    85500 |     44 batches | lr 7.94e-05 | ms/batch 321.53 | loss  3.87 | ppl    47.741
| epoch 197 step    85550 |     94 batches | lr 7.92e-05 | ms/batch 326.15 | loss  3.84 | ppl    46.731
| epoch 197 step    85600 |    144 batches | lr 7.9e-05 | ms/batch 327.13 | loss  3.89 | ppl    48.675
----------------------------------------------------------------------------------------------------
| Eval 214 at step    85600 | time: 136.65s | valid loss  4.20 | valid ppl    66.680
----------------------------------------------------------------------------------------------------
| epoch 197 step    85650 |    194 batches | lr 7.88e-05 | ms/batch 433.84 | loss  3.90 | ppl    49.166
| epoch 197 step    85700 |    244 batches | lr 7.86e-05 | ms/batch 331.86 | loss  3.88 | ppl    48.544
| epoch 197 step    85750 |    294 batches | lr 7.85e-05 | ms/batch 331.62 | loss  3.94 | ppl    51.596
| epoch 197 step    85800 |    344 batches | lr 7.83e-05 | ms/batch 330.73 | loss  3.79 | ppl    44.274
| epoch 197 step    85850 |    394 batches | lr 7.81e-05 | ms/batch 338.58 | loss  3.88 | ppl    48.311
| epoch 198 step    85900 |      8 batches | lr 7.79e-05 | ms/batch 337.16 | loss  3.90 | ppl    49.633
| epoch 198 step    85950 |     58 batches | lr 7.77e-05 | ms/batch 340.51 | loss  3.82 | ppl    45.432
| epoch 198 step    86000 |    108 batches | lr 7.75e-05 | ms/batch 342.49 | loss  3.85 | ppl    46.955
----------------------------------------------------------------------------------------------------
| Eval 215 at step    86000 | time: 139.50s | valid loss  4.21 | valid ppl    67.173
----------------------------------------------------------------------------------------------------
| epoch 198 step    86050 |    158 batches | lr 7.74e-05 | ms/batch 450.94 | loss  3.89 | ppl    48.848
| epoch 198 step    86100 |    208 batches | lr 7.72e-05 | ms/batch 343.16 | loss  3.87 | ppl    48.171
| epoch 198 step    86150 |    258 batches | lr 7.7e-05 | ms/batch 332.44 | loss  3.90 | ppl    49.428
| epoch 198 step    86200 |    308 batches | lr 7.68e-05 | ms/batch 330.12 | loss  3.89 | ppl    48.684
| epoch 198 step    86250 |    358 batches | lr 7.66e-05 | ms/batch 328.17 | loss  3.83 | ppl    46.090
| epoch 198 step    86300 |    408 batches | lr 7.65e-05 | ms/batch 329.97 | loss  3.85 | ppl    46.909
| epoch 199 step    86350 |     22 batches | lr 7.63e-05 | ms/batch 321.32 | loss  3.95 | ppl    51.846
| epoch 199 step    86400 |     72 batches | lr 7.61e-05 | ms/batch 326.57 | loss  3.82 | ppl    45.391
----------------------------------------------------------------------------------------------------
| Eval 216 at step    86400 | time: 137.97s | valid loss  4.20 | valid ppl    66.515
----------------------------------------------------------------------------------------------------
| epoch 199 step    86450 |    122 batches | lr 7.59e-05 | ms/batch 433.32 | loss  3.89 | ppl    48.966
| epoch 199 step    86500 |    172 batches | lr 7.57e-05 | ms/batch 330.03 | loss  3.88 | ppl    48.245
| epoch 199 step    86550 |    222 batches | lr 7.55e-05 | ms/batch 329.82 | loss  3.89 | ppl    48.711
| epoch 199 step    86600 |    272 batches | lr 7.54e-05 | ms/batch 329.73 | loss  3.91 | ppl    49.844
| epoch 199 step    86650 |    322 batches | lr 7.52e-05 | ms/batch 329.46 | loss  3.85 | ppl    46.768
| epoch 199 step    86700 |    372 batches | lr 7.5e-05 | ms/batch 330.41 | loss  3.86 | ppl    47.593
| epoch 199 step    86750 |    422 batches | lr 7.48e-05 | ms/batch 329.60 | loss  3.89 | ppl    48.695
| epoch 200 step    86800 |     36 batches | lr 7.46e-05 | ms/batch 320.88 | loss  3.90 | ppl    49.339
----------------------------------------------------------------------------------------------------
| Eval 217 at step    86800 | time: 136.68s | valid loss  4.20 | valid ppl    66.576
----------------------------------------------------------------------------------------------------
| epoch 200 step    86850 |     86 batches | lr 7.45e-05 | ms/batch 434.86 | loss  3.80 | ppl    44.745
| epoch 200 step    86900 |    136 batches | lr 7.43e-05 | ms/batch 329.92 | loss  3.88 | ppl    48.371
| epoch 200 step    86950 |    186 batches | lr 7.41e-05 | ms/batch 329.96 | loss  3.87 | ppl    47.834
| epoch 200 step    87000 |    236 batches | lr 7.39e-05 | ms/batch 329.99 | loss  3.87 | ppl    47.787
| epoch 200 step    87050 |    286 batches | lr 7.37e-05 | ms/batch 329.57 | loss  3.95 | ppl    51.808
| epoch 200 step    87100 |    336 batches | lr 7.36e-05 | ms/batch 330.08 | loss  3.81 | ppl    45.140
| epoch 200 step    87150 |    386 batches | lr 7.34e-05 | ms/batch 329.32 | loss  3.86 | ppl    47.685
| epoch 200 step    87200 |    436 batches | lr 7.32e-05 | ms/batch 326.23 | loss  3.89 | ppl    48.692
----------------------------------------------------------------------------------------------------
| Eval 218 at step    87200 | time: 136.97s | valid loss  4.20 | valid ppl    66.357
----------------------------------------------------------------------------------------------------
| epoch 201 step    87250 |     50 batches | lr 7.3e-05 | ms/batch 431.53 | loss  3.85 | ppl    46.830
| epoch 201 step    87300 |    100 batches | lr 7.29e-05 | ms/batch 331.67 | loss  3.85 | ppl    46.761
| epoch 201 step    87350 |    150 batches | lr 7.27e-05 | ms/batch 331.37 | loss  3.86 | ppl    47.497
| epoch 201 step    87400 |    200 batches | lr 7.25e-05 | ms/batch 330.78 | loss  3.86 | ppl    47.326
| epoch 201 step    87450 |    250 batches | lr 7.23e-05 | ms/batch 329.52 | loss  3.89 | ppl    48.766
| epoch 201 step    87500 |    300 batches | lr 7.21e-05 | ms/batch 330.68 | loss  3.91 | ppl    49.691
| epoch 201 step    87550 |    350 batches | lr 7.2e-05 | ms/batch 335.39 | loss  3.79 | ppl    44.459
| epoch 201 step    87600 |    400 batches | lr 7.18e-05 | ms/batch 328.61 | loss  3.86 | ppl    47.284
----------------------------------------------------------------------------------------------------
| Eval 219 at step    87600 | time: 137.49s | valid loss  4.20 | valid ppl    66.503
----------------------------------------------------------------------------------------------------
| epoch 202 step    87650 |     14 batches | lr 7.16e-05 | ms/batch 423.85 | loss  3.89 | ppl    48.945
| epoch 202 step    87700 |     64 batches | lr 7.14e-05 | ms/batch 330.34 | loss  3.82 | ppl    45.734
| epoch 202 step    87750 |    114 batches | lr 7.13e-05 | ms/batch 329.96 | loss  3.84 | ppl    46.417
| epoch 202 step    87800 |    164 batches | lr 7.11e-05 | ms/batch 329.83 | loss  3.87 | ppl    47.787
| epoch 202 step    87850 |    214 batches | lr 7.09e-05 | ms/batch 329.73 | loss  3.88 | ppl    48.498
| epoch 202 step    87900 |    264 batches | lr 7.07e-05 | ms/batch 329.17 | loss  3.88 | ppl    48.205
| epoch 202 step    87950 |    314 batches | lr 7.05e-05 | ms/batch 330.23 | loss  3.86 | ppl    47.428
| epoch 202 step    88000 |    364 batches | lr 7.04e-05 | ms/batch 329.67 | loss  3.81 | ppl    45.309
----------------------------------------------------------------------------------------------------
| Eval 220 at step    88000 | time: 136.61s | valid loss  4.20 | valid ppl    66.619
----------------------------------------------------------------------------------------------------
| epoch 202 step    88050 |    414 batches | lr 7.02e-05 | ms/batch 427.58 | loss  3.85 | ppl    47.078
| epoch 203 step    88100 |     28 batches | lr 7e-05 | ms/batch 320.91 | loss  3.89 | ppl    48.846
| epoch 203 step    88150 |     78 batches | lr 6.98e-05 | ms/batch 330.06 | loss  3.82 | ppl    45.727
| epoch 203 step    88200 |    128 batches | lr 6.97e-05 | ms/batch 329.25 | loss  3.87 | ppl    47.901
| epoch 203 step    88250 |    178 batches | lr 6.95e-05 | ms/batch 329.16 | loss  3.86 | ppl    47.514
| epoch 203 step    88300 |    228 batches | lr 6.93e-05 | ms/batch 328.47 | loss  3.86 | ppl    47.238
| epoch 203 step    88350 |    278 batches | lr 6.91e-05 | ms/batch 330.09 | loss  3.92 | ppl    50.157
| epoch 203 step    88400 |    328 batches | lr 6.9e-05 | ms/batch 328.45 | loss  3.84 | ppl    46.720
----------------------------------------------------------------------------------------------------
| Eval 221 at step    88400 | time: 136.19s | valid loss  4.21 | valid ppl    67.194
----------------------------------------------------------------------------------------------------
| epoch 203 step    88450 |    378 batches | lr 6.88e-05 | ms/batch 426.84 | loss  3.84 | ppl    46.447
| epoch 203 step    88500 |    428 batches | lr 6.86e-05 | ms/batch 324.36 | loss  3.88 | ppl    48.637
| epoch 204 step    88550 |     42 batches | lr 6.84e-05 | ms/batch 322.20 | loss  3.86 | ppl    47.266
| epoch 204 step    88600 |     92 batches | lr 6.83e-05 | ms/batch 328.88 | loss  3.77 | ppl    43.395
| epoch 204 step    88650 |    142 batches | lr 6.81e-05 | ms/batch 329.22 | loss  3.86 | ppl    47.375
| epoch 204 step    88700 |    192 batches | lr 6.79e-05 | ms/batch 328.25 | loss  3.87 | ppl    47.894
| epoch 204 step    88750 |    242 batches | lr 6.77e-05 | ms/batch 329.61 | loss  3.88 | ppl    48.638
| epoch 204 step    88800 |    292 batches | lr 6.76e-05 | ms/batch 329.23 | loss  3.91 | ppl    49.704
----------------------------------------------------------------------------------------------------
| Eval 222 at step    88800 | time: 135.94s | valid loss  4.20 | valid ppl    66.519
----------------------------------------------------------------------------------------------------
| epoch 204 step    88850 |    342 batches | lr 6.74e-05 | ms/batch 427.26 | loss  3.77 | ppl    43.451
| epoch 204 step    88900 |    392 batches | lr 6.72e-05 | ms/batch 325.00 | loss  3.87 | ppl    47.836
| epoch 205 step    88950 |      6 batches | lr 6.7e-05 | ms/batch 319.85 | loss  3.90 | ppl    49.401
| epoch 205 step    89000 |     56 batches | lr 6.69e-05 | ms/batch 330.74 | loss  3.84 | ppl    46.353
| epoch 205 step    89050 |    106 batches | lr 6.67e-05 | ms/batch 329.66 | loss  3.80 | ppl    44.569
| epoch 205 step    89100 |    156 batches | lr 6.65e-05 | ms/batch 328.96 | loss  3.87 | ppl    47.907
| epoch 205 step    89150 |    206 batches | lr 6.64e-05 | ms/batch 329.87 | loss  3.86 | ppl    47.681
| epoch 205 step    89200 |    256 batches | lr 6.62e-05 | ms/batch 329.91 | loss  3.89 | ppl    48.754
----------------------------------------------------------------------------------------------------
| Eval 223 at step    89200 | time: 136.07s | valid loss  4.21 | valid ppl    67.055
----------------------------------------------------------------------------------------------------
| epoch 205 step    89250 |    306 batches | lr 6.6e-05 | ms/batch 428.03 | loss  3.90 | ppl    49.162
| epoch 205 step    89300 |    356 batches | lr 6.58e-05 | ms/batch 325.42 | loss  3.81 | ppl    45.128
| epoch 205 step    89350 |    406 batches | lr 6.57e-05 | ms/batch 324.29 | loss  3.87 | ppl    47.899
| epoch 206 step    89400 |     20 batches | lr 6.55e-05 | ms/batch 320.14 | loss  3.90 | ppl    49.586
| epoch 206 step    89450 |     70 batches | lr 6.53e-05 | ms/batch 329.70 | loss  3.78 | ppl    43.734
| epoch 206 step    89500 |    120 batches | lr 6.52e-05 | ms/batch 327.90 | loss  3.84 | ppl    46.728
| epoch 206 step    89550 |    170 batches | lr 6.5e-05 | ms/batch 332.19 | loss  3.86 | ppl    47.290
| epoch 206 step    89600 |    220 batches | lr 6.48e-05 | ms/batch 329.77 | loss  3.87 | ppl    48.128
----------------------------------------------------------------------------------------------------
| Eval 224 at step    89600 | time: 135.91s | valid loss  4.20 | valid ppl    66.891
----------------------------------------------------------------------------------------------------
| epoch 206 step    89650 |    270 batches | lr 6.46e-05 | ms/batch 429.54 | loss  3.89 | ppl    48.878
| epoch 206 step    89700 |    320 batches | lr 6.45e-05 | ms/batch 326.62 | loss  3.85 | ppl    46.879
| epoch 206 step    89750 |    370 batches | lr 6.43e-05 | ms/batch 323.60 | loss  3.82 | ppl    45.398
| epoch 206 step    89800 |    420 batches | lr 6.41e-05 | ms/batch 325.42 | loss  3.85 | ppl    47.068
| epoch 207 step    89850 |     34 batches | lr 6.4e-05 | ms/batch 320.85 | loss  3.87 | ppl    47.791
| epoch 207 step    89900 |     84 batches | lr 6.38e-05 | ms/batch 327.43 | loss  3.78 | ppl    43.842
| epoch 207 step    89950 |    134 batches | lr 6.36e-05 | ms/batch 331.86 | loss  3.88 | ppl    48.235
| epoch 207 step    90000 |    184 batches | lr 6.35e-05 | ms/batch 330.40 | loss  3.84 | ppl    46.398
----------------------------------------------------------------------------------------------------
| Eval 225 at step    90000 | time: 135.76s | valid loss  4.20 | valid ppl    66.388
----------------------------------------------------------------------------------------------------
| epoch 207 step    90050 |    234 batches | lr 6.33e-05 | ms/batch 428.59 | loss  3.86 | ppl    47.234
| epoch 207 step    90100 |    284 batches | lr 6.31e-05 | ms/batch 326.56 | loss  3.89 | ppl    49.093
| epoch 207 step    90150 |    334 batches | lr 6.29e-05 | ms/batch 325.23 | loss  3.78 | ppl    43.773
| epoch 207 step    90200 |    384 batches | lr 6.28e-05 | ms/batch 324.81 | loss  3.84 | ppl    46.711
| epoch 207 step    90250 |    434 batches | lr 6.26e-05 | ms/batch 325.28 | loss  3.87 | ppl    47.830
| epoch 208 step    90300 |     48 batches | lr 6.24e-05 | ms/batch 322.96 | loss  3.82 | ppl    45.451
| epoch 208 step    90350 |     98 batches | lr 6.23e-05 | ms/batch 329.41 | loss  3.82 | ppl    45.394
| epoch 208 step    90400 |    148 batches | lr 6.21e-05 | ms/batch 329.47 | loss  3.85 | ppl    47.157
----------------------------------------------------------------------------------------------------
| Eval 226 at step    90400 | time: 135.63s | valid loss  4.20 | valid ppl    66.761
----------------------------------------------------------------------------------------------------
| epoch 208 step    90450 |    198 batches | lr 6.19e-05 | ms/batch 429.35 | loss  3.88 | ppl    48.247
| epoch 208 step    90500 |    248 batches | lr 6.18e-05 | ms/batch 325.58 | loss  3.88 | ppl    48.606
| epoch 208 step    90550 |    298 batches | lr 6.16e-05 | ms/batch 324.51 | loss  3.90 | ppl    49.609
| epoch 208 step    90600 |    348 batches | lr 6.14e-05 | ms/batch 324.89 | loss  3.77 | ppl    43.511
| epoch 208 step    90650 |    398 batches | lr 6.13e-05 | ms/batch 325.21 | loss  3.86 | ppl    47.321
| epoch 209 step    90700 |     12 batches | lr 6.11e-05 | ms/batch 318.98 | loss  3.89 | ppl    48.890
| epoch 209 step    90750 |     62 batches | lr 6.09e-05 | ms/batch 331.73 | loss  3.81 | ppl    45.375
| epoch 209 step    90800 |    112 batches | lr 6.08e-05 | ms/batch 331.28 | loss  3.86 | ppl    47.312
----------------------------------------------------------------------------------------------------
| Eval 227 at step    90800 | time: 135.56s | valid loss  4.21 | valid ppl    67.206
----------------------------------------------------------------------------------------------------
| epoch 209 step    90850 |    162 batches | lr 6.06e-05 | ms/batch 430.41 | loss  3.87 | ppl    47.888
| epoch 209 step    90900 |    212 batches | lr 6.04e-05 | ms/batch 326.65 | loss  3.85 | ppl    47.006
| epoch 209 step    90950 |    262 batches | lr 6.03e-05 | ms/batch 325.08 | loss  3.87 | ppl    48.153
| epoch 209 step    91000 |    312 batches | lr 6.01e-05 | ms/batch 325.14 | loss  3.87 | ppl    47.965
| epoch 209 step    91050 |    362 batches | lr 5.99e-05 | ms/batch 324.39 | loss  3.81 | ppl    45.196
| epoch 209 step    91100 |    412 batches | lr 5.98e-05 | ms/batch 324.01 | loss  3.85 | ppl    46.782
| epoch 210 step    91150 |     26 batches | lr 5.96e-05 | ms/batch 320.48 | loss  3.87 | ppl    47.810
| epoch 210 step    91200 |     76 batches | lr 5.94e-05 | ms/batch 328.18 | loss  3.78 | ppl    44.013
----------------------------------------------------------------------------------------------------
| Eval 228 at step    91200 | time: 135.20s | valid loss  4.20 | valid ppl    66.778
----------------------------------------------------------------------------------------------------
| epoch 210 step    91250 |    126 batches | lr 5.93e-05 | ms/batch 427.31 | loss  3.84 | ppl    46.331
| epoch 210 step    91300 |    176 batches | lr 5.91e-05 | ms/batch 323.49 | loss  3.84 | ppl    46.348
| epoch 210 step    91350 |    226 batches | lr 5.89e-05 | ms/batch 324.50 | loss  3.89 | ppl    48.684
| epoch 210 step    91400 |    276 batches | lr 5.88e-05 | ms/batch 325.92 | loss  3.90 | ppl    49.194
| epoch 210 step    91450 |    326 batches | lr 5.86e-05 | ms/batch 325.73 | loss  3.85 | ppl    46.841
| epoch 210 step    91500 |    376 batches | lr 5.84e-05 | ms/batch 326.36 | loss  3.84 | ppl    46.312
| epoch 210 step    91550 |    426 batches | lr 5.83e-05 | ms/batch 327.42 | loss  3.86 | ppl    47.362
| epoch 211 step    91600 |     40 batches | lr 5.81e-05 | ms/batch 324.95 | loss  3.84 | ppl    46.413
----------------------------------------------------------------------------------------------------
| Eval 229 at step    91600 | time: 135.32s | valid loss  4.20 | valid ppl    66.906
----------------------------------------------------------------------------------------------------
| epoch 211 step    91650 |     90 batches | lr 5.8e-05 | ms/batch 430.23 | loss  3.78 | ppl    43.729
| epoch 211 step    91700 |    140 batches | lr 5.78e-05 | ms/batch 325.70 | loss  3.83 | ppl    46.265
| epoch 211 step    91750 |    190 batches | lr 5.76e-05 | ms/batch 323.85 | loss  3.85 | ppl    46.922
| epoch 211 step    91800 |    240 batches | lr 5.75e-05 | ms/batch 325.33 | loss  3.86 | ppl    47.514
| epoch 211 step    91850 |    290 batches | lr 5.73e-05 | ms/batch 324.92 | loss  3.91 | ppl    49.741
| epoch 211 step    91900 |    340 batches | lr 5.71e-05 | ms/batch 324.85 | loss  3.76 | ppl    42.786
| epoch 211 step    91950 |    390 batches | lr 5.7e-05 | ms/batch 324.83 | loss  3.87 | ppl    47.770
| epoch 212 step    92000 |      4 batches | lr 5.68e-05 | ms/batch 318.42 | loss  3.87 | ppl    47.819
----------------------------------------------------------------------------------------------------
| Eval 230 at step    92000 | time: 134.89s | valid loss  4.20 | valid ppl    66.539
----------------------------------------------------------------------------------------------------
| epoch 212 step    92050 |     54 batches | lr 5.67e-05 | ms/batch 428.92 | loss  3.80 | ppl    44.834
| epoch 212 step    92100 |    104 batches | lr 5.65e-05 | ms/batch 326.91 | loss  3.80 | ppl    44.876
| epoch 212 step    92150 |    154 batches | lr 5.63e-05 | ms/batch 324.33 | loss  3.87 | ppl    48.064
| epoch 212 step    92200 |    204 batches | lr 5.62e-05 | ms/batch 325.07 | loss  3.86 | ppl    47.282
| epoch 212 step    92250 |    254 batches | lr 5.6e-05 | ms/batch 325.51 | loss  3.88 | ppl    48.260
| epoch 212 step    92300 |    304 batches | lr 5.58e-05 | ms/batch 325.75 | loss  3.90 | ppl    49.516
| epoch 212 step    92350 |    354 batches | lr 5.57e-05 | ms/batch 326.16 | loss  3.77 | ppl    43.546
| epoch 212 step    92400 |    404 batches | lr 5.55e-05 | ms/batch 327.83 | loss  3.85 | ppl    47.035
----------------------------------------------------------------------------------------------------
| Eval 231 at step    92400 | time: 135.53s | valid loss  4.20 | valid ppl    66.631
----------------------------------------------------------------------------------------------------
| epoch 213 step    92450 |     18 batches | lr 5.54e-05 | ms/batch 427.96 | loss  3.90 | ppl    49.229
| epoch 213 step    92500 |     68 batches | lr 5.52e-05 | ms/batch 327.32 | loss  3.79 | ppl    44.416
| epoch 213 step    92550 |    118 batches | lr 5.5e-05 | ms/batch 325.96 | loss  3.84 | ppl    46.321
| epoch 213 step    92600 |    168 batches | lr 5.49e-05 | ms/batch 326.26 | loss  3.83 | ppl    45.957
| epoch 213 step    92650 |    218 batches | lr 5.47e-05 | ms/batch 326.23 | loss  3.87 | ppl    47.905
| epoch 213 step    92700 |    268 batches | lr 5.46e-05 | ms/batch 325.98 | loss  3.84 | ppl    46.724
| epoch 213 step    92750 |    318 batches | lr 5.44e-05 | ms/batch 326.14 | loss  3.85 | ppl    47.039
| epoch 213 step    92800 |    368 batches | lr 5.42e-05 | ms/batch 324.55 | loss  3.81 | ppl    45.352
----------------------------------------------------------------------------------------------------
| Eval 232 at step    92800 | time: 135.52s | valid loss  4.20 | valid ppl    66.455
----------------------------------------------------------------------------------------------------
| epoch 213 step    92850 |    418 batches | lr 5.41e-05 | ms/batch 432.61 | loss  3.83 | ppl    46.252
| epoch 214 step    92900 |     32 batches | lr 5.39e-05 | ms/batch 323.35 | loss  3.86 | ppl    47.672
| epoch 214 step    92950 |     82 batches | lr 5.38e-05 | ms/batch 341.13 | loss  3.79 | ppl    44.117
| epoch 214 step    93000 |    132 batches | lr 5.36e-05 | ms/batch 340.66 | loss  3.85 | ppl    46.801
| epoch 214 step    93050 |    182 batches | lr 5.35e-05 | ms/batch 330.74 | loss  3.82 | ppl    45.717
| epoch 214 step    93100 |    232 batches | lr 5.33e-05 | ms/batch 327.76 | loss  3.87 | ppl    47.776
| epoch 214 step    93150 |    282 batches | lr 5.31e-05 | ms/batch 326.30 | loss  3.87 | ppl    47.722
| epoch 214 step    93200 |    332 batches | lr 5.3e-05 | ms/batch 326.28 | loss  3.79 | ppl    44.468
----------------------------------------------------------------------------------------------------
| Eval 233 at step    93200 | time: 137.46s | valid loss  4.20 | valid ppl    66.889
----------------------------------------------------------------------------------------------------
| epoch 214 step    93250 |    382 batches | lr 5.28e-05 | ms/batch 435.16 | loss  3.84 | ppl    46.339
| epoch 214 step    93300 |    432 batches | lr 5.27e-05 | ms/batch 331.01 | loss  3.85 | ppl    46.978
| epoch 215 step    93350 |     46 batches | lr 5.25e-05 | ms/batch 319.38 | loss  3.81 | ppl    45.322
| epoch 215 step    93400 |     96 batches | lr 5.23e-05 | ms/batch 324.52 | loss  3.79 | ppl    44.279
| epoch 215 step    93450 |    146 batches | lr 5.22e-05 | ms/batch 324.86 | loss  3.84 | ppl    46.362
| epoch 215 step    93500 |    196 batches | lr 5.2e-05 | ms/batch 326.05 | loss  3.86 | ppl    47.288
| epoch 215 step    93550 |    246 batches | lr 5.19e-05 | ms/batch 324.75 | loss  3.85 | ppl    47.138
| epoch 215 step    93600 |    296 batches | lr 5.17e-05 | ms/batch 325.06 | loss  3.91 | ppl    50.008
----------------------------------------------------------------------------------------------------
| Eval 234 at step    93600 | time: 135.55s | valid loss  4.19 | valid ppl    66.336
----------------------------------------------------------------------------------------------------
| epoch 215 step    93650 |    346 batches | lr 5.16e-05 | ms/batch 433.84 | loss  3.75 | ppl    42.708
| epoch 215 step    93700 |    396 batches | lr 5.14e-05 | ms/batch 332.24 | loss  3.83 | ppl    46.151
| epoch 216 step    93750 |     10 batches | lr 5.13e-05 | ms/batch 322.69 | loss  3.89 | ppl    48.796
| epoch 216 step    93800 |     60 batches | lr 5.11e-05 | ms/batch 325.79 | loss  3.78 | ppl    43.929
| epoch 216 step    93850 |    110 batches | lr 5.09e-05 | ms/batch 325.84 | loss  3.81 | ppl    45.334
| epoch 216 step    93900 |    160 batches | lr 5.08e-05 | ms/batch 325.28 | loss  3.82 | ppl    45.681
| epoch 216 step    93950 |    210 batches | lr 5.06e-05 | ms/batch 324.81 | loss  3.82 | ppl    45.677
| epoch 216 step    94000 |    260 batches | lr 5.05e-05 | ms/batch 324.82 | loss  3.89 | ppl    48.680
----------------------------------------------------------------------------------------------------
| Eval 235 at step    94000 | time: 135.75s | valid loss  4.20 | valid ppl    66.898
----------------------------------------------------------------------------------------------------
| epoch 216 step    94050 |    310 batches | lr 5.03e-05 | ms/batch 435.23 | loss  3.85 | ppl    46.881
| epoch 216 step    94100 |    360 batches | lr 5.02e-05 | ms/batch 329.09 | loss  3.78 | ppl    43.871
| epoch 216 step    94150 |    410 batches | lr 5e-05 | ms/batch 327.75 | loss  3.85 | ppl    46.967
| epoch 217 step    94200 |     24 batches | lr 4.99e-05 | ms/batch 321.03 | loss  3.87 | ppl    48.117
| epoch 217 step    94250 |     74 batches | lr 4.97e-05 | ms/batch 326.43 | loss  3.78 | ppl    43.778
| epoch 217 step    94300 |    124 batches | lr 4.96e-05 | ms/batch 324.52 | loss  3.84 | ppl    46.302
| epoch 217 step    94350 |    174 batches | lr 4.94e-05 | ms/batch 324.37 | loss  3.85 | ppl    46.823
| epoch 217 step    94400 |    224 batches | lr 4.93e-05 | ms/batch 326.11 | loss  3.86 | ppl    47.358
----------------------------------------------------------------------------------------------------
| Eval 236 at step    94400 | time: 135.73s | valid loss  4.20 | valid ppl    66.726
----------------------------------------------------------------------------------------------------
| epoch 217 step    94450 |    274 batches | lr 4.91e-05 | ms/batch 433.96 | loss  3.88 | ppl    48.385
| epoch 217 step    94500 |    324 batches | lr 4.89e-05 | ms/batch 329.08 | loss  3.81 | ppl    45.048
| epoch 217 step    94550 |    374 batches | lr 4.88e-05 | ms/batch 329.15 | loss  3.81 | ppl    44.945
| epoch 217 step    94600 |    424 batches | lr 4.86e-05 | ms/batch 328.86 | loss  3.82 | ppl    45.419
| epoch 218 step    94650 |     38 batches | lr 4.85e-05 | ms/batch 318.86 | loss  3.86 | ppl    47.378
| epoch 218 step    94700 |     88 batches | lr 4.83e-05 | ms/batch 324.97 | loss  3.78 | ppl    43.991
| epoch 218 step    94750 |    138 batches | lr 4.82e-05 | ms/batch 324.26 | loss  3.86 | ppl    47.295
| epoch 218 step    94800 |    188 batches | lr 4.8e-05 | ms/batch 324.08 | loss  3.81 | ppl    45.138
----------------------------------------------------------------------------------------------------
| Eval 237 at step    94800 | time: 135.63s | valid loss  4.20 | valid ppl    66.404
----------------------------------------------------------------------------------------------------
| epoch 218 step    94850 |    238 batches | lr 4.79e-05 | ms/batch 434.78 | loss  3.83 | ppl    46.286
| epoch 218 step    94900 |    288 batches | lr 4.77e-05 | ms/batch 343.65 | loss  3.89 | ppl    48.989
| epoch 218 step    94950 |    338 batches | lr 4.76e-05 | ms/batch 343.22 | loss  3.75 | ppl    42.377
| epoch 218 step    95000 |    388 batches | lr 4.74e-05 | ms/batch 329.27 | loss  3.86 | ppl    47.258
| epoch 219 step    95050 |      2 batches | lr 4.73e-05 | ms/batch 328.96 | loss  3.85 | ppl    47.192
| epoch 219 step    95100 |     52 batches | lr 4.71e-05 | ms/batch 341.62 | loss  3.80 | ppl    44.853
| epoch 219 step    95150 |    102 batches | lr 4.7e-05 | ms/batch 339.79 | loss  3.80 | ppl    44.762
| epoch 219 step    95200 |    152 batches | lr 4.68e-05 | ms/batch 340.42 | loss  3.84 | ppl    46.312
----------------------------------------------------------------------------------------------------
| Eval 238 at step    95200 | time: 140.35s | valid loss  4.20 | valid ppl    66.690
----------------------------------------------------------------------------------------------------
| epoch 219 step    95250 |    202 batches | lr 4.67e-05 | ms/batch 450.40 | loss  3.84 | ppl    46.507
| epoch 219 step    95300 |    252 batches | lr 4.65e-05 | ms/batch 330.05 | loss  3.86 | ppl    47.310
| epoch 219 step    95350 |    302 batches | lr 4.64e-05 | ms/batch 330.91 | loss  3.88 | ppl    48.230
| epoch 219 step    95400 |    352 batches | lr 4.62e-05 | ms/batch 331.61 | loss  3.77 | ppl    43.199
| epoch 219 step    95450 |    402 batches | lr 4.61e-05 | ms/batch 329.69 | loss  3.84 | ppl    46.382
| epoch 220 step    95500 |     16 batches | lr 4.59e-05 | ms/batch 322.87 | loss  3.88 | ppl    48.356
| epoch 220 step    95550 |     66 batches | lr 4.58e-05 | ms/batch 331.08 | loss  3.77 | ppl    43.587
| epoch 220 step    95600 |    116 batches | lr 4.56e-05 | ms/batch 326.55 | loss  3.82 | ppl    45.602
----------------------------------------------------------------------------------------------------
| Eval 239 at step    95600 | time: 137.42s | valid loss  4.21 | valid ppl    67.137
----------------------------------------------------------------------------------------------------
| epoch 220 step    95650 |    166 batches | lr 4.55e-05 | ms/batch 432.56 | loss  3.83 | ppl    46.138
| epoch 220 step    95700 |    216 batches | lr 4.53e-05 | ms/batch 330.18 | loss  3.85 | ppl    46.857
| epoch 220 step    95750 |    266 batches | lr 4.52e-05 | ms/batch 328.78 | loss  3.86 | ppl    47.605
| epoch 220 step    95800 |    316 batches | lr 4.5e-05 | ms/batch 329.19 | loss  3.83 | ppl    45.874
| epoch 220 step    95850 |    366 batches | lr 4.49e-05 | ms/batch 329.13 | loss  3.78 | ppl    43.765
| epoch 220 step    95900 |    416 batches | lr 4.48e-05 | ms/batch 329.18 | loss  3.83 | ppl    45.940
| epoch 221 step    95950 |     30 batches | lr 4.46e-05 | ms/batch 320.41 | loss  3.85 | ppl    47.197
| epoch 221 step    96000 |     80 batches | lr 4.45e-05 | ms/batch 324.35 | loss  3.77 | ppl    43.405
----------------------------------------------------------------------------------------------------
| Eval 240 at step    96000 | time: 136.21s | valid loss  4.20 | valid ppl    66.434
----------------------------------------------------------------------------------------------------
| epoch 221 step    96050 |    130 batches | lr 4.43e-05 | ms/batch 432.72 | loss  3.83 | ppl    45.971
| epoch 221 step    96100 |    180 batches | lr 4.42e-05 | ms/batch 329.21 | loss  3.84 | ppl    46.398
| epoch 221 step    96150 |    230 batches | lr 4.4e-05 | ms/batch 330.81 | loss  3.86 | ppl    47.295
| epoch 221 step    96200 |    280 batches | lr 4.39e-05 | ms/batch 328.83 | loss  3.87 | ppl    48.092
| epoch 221 step    96250 |    330 batches | lr 4.37e-05 | ms/batch 328.21 | loss  3.77 | ppl    43.416
| epoch 221 step    96300 |    380 batches | lr 4.36e-05 | ms/batch 329.82 | loss  3.82 | ppl    45.421
| epoch 221 step    96350 |    430 batches | lr 4.34e-05 | ms/batch 329.75 | loss  3.83 | ppl    46.167
| epoch 222 step    96400 |     44 batches | lr 4.33e-05 | ms/batch 319.41 | loss  3.83 | ppl    46.201
----------------------------------------------------------------------------------------------------
| Eval 241 at step    96400 | time: 136.44s | valid loss  4.19 | valid ppl    66.171
----------------------------------------------------------------------------------------------------
| epoch 222 step    96450 |     94 batches | lr 4.32e-05 | ms/batch 480.72 | loss  3.76 | ppl    42.957
| epoch 222 step    96500 |    144 batches | lr 4.3e-05 | ms/batch 328.47 | loss  3.81 | ppl    45.271
| epoch 222 step    96550 |    194 batches | lr 4.29e-05 | ms/batch 328.76 | loss  3.82 | ppl    45.715
| epoch 222 step    96600 |    244 batches | lr 4.27e-05 | ms/batch 329.73 | loss  3.84 | ppl    46.404
| epoch 222 step    96650 |    294 batches | lr 4.26e-05 | ms/batch 328.76 | loss  3.88 | ppl    48.597
| epoch 222 step    96700 |    344 batches | lr 4.24e-05 | ms/batch 328.04 | loss  3.74 | ppl    41.936
| epoch 222 step    96750 |    394 batches | lr 4.23e-05 | ms/batch 328.78 | loss  3.81 | ppl    45.355
| epoch 223 step    96800 |      8 batches | lr 4.21e-05 | ms/batch 322.83 | loss  3.84 | ppl    46.658
----------------------------------------------------------------------------------------------------
| Eval 242 at step    96800 | time: 136.38s | valid loss  4.19 | valid ppl    66.339
----------------------------------------------------------------------------------------------------
| epoch 223 step    96850 |     58 batches | lr 4.2e-05 | ms/batch 431.49 | loss  3.77 | ppl    43.331
| epoch 223 step    96900 |    108 batches | lr 4.19e-05 | ms/batch 328.39 | loss  3.78 | ppl    43.883
| epoch 223 step    96950 |    158 batches | lr 4.17e-05 | ms/batch 329.09 | loss  3.83 | ppl    46.117
| epoch 223 step    97000 |    208 batches | lr 4.16e-05 | ms/batch 330.06 | loss  3.86 | ppl    47.410
| epoch 223 step    97050 |    258 batches | lr 4.14e-05 | ms/batch 329.15 | loss  3.86 | ppl    47.254
| epoch 223 step    97100 |    308 batches | lr 4.13e-05 | ms/batch 331.85 | loss  3.84 | ppl    46.618
| epoch 223 step    97150 |    358 batches | lr 4.11e-05 | ms/batch 330.33 | loss  3.78 | ppl    43.849
| epoch 223 step    97200 |    408 batches | lr 4.1e-05 | ms/batch 331.19 | loss  3.81 | ppl    44.974
----------------------------------------------------------------------------------------------------
| Eval 243 at step    97200 | time: 137.08s | valid loss  4.19 | valid ppl    66.313
----------------------------------------------------------------------------------------------------
| epoch 224 step    97250 |     22 batches | lr 4.09e-05 | ms/batch 425.75 | loss  3.83 | ppl    45.869
| epoch 224 step    97300 |     72 batches | lr 4.07e-05 | ms/batch 329.66 | loss  3.75 | ppl    42.576
| epoch 224 step    97350 |    122 batches | lr 4.06e-05 | ms/batch 336.25 | loss  3.82 | ppl    45.829
| epoch 224 step    97400 |    172 batches | lr 4.04e-05 | ms/batch 344.54 | loss  3.83 | ppl    46.263
| epoch 224 step    97450 |    222 batches | lr 4.03e-05 | ms/batch 344.00 | loss  3.83 | ppl    45.976
| epoch 224 step    97500 |    272 batches | lr 4.02e-05 | ms/batch 343.47 | loss  3.87 | ppl    47.795
| epoch 224 step    97550 |    322 batches | lr 4e-05 | ms/batch 343.04 | loss  3.79 | ppl    44.179
| epoch 224 step    97600 |    372 batches | lr 3.99e-05 | ms/batch 343.48 | loss  3.82 | ppl    45.829
----------------------------------------------------------------------------------------------------
| Eval 244 at step    97600 | time: 140.74s | valid loss  4.19 | valid ppl    66.265
----------------------------------------------------------------------------------------------------
| epoch 224 step    97650 |    422 batches | lr 3.97e-05 | ms/batch 448.38 | loss  3.84 | ppl    46.739
| epoch 225 step    97700 |     36 batches | lr 3.96e-05 | ms/batch 336.79 | loss  3.83 | ppl    45.930
| epoch 225 step    97750 |     86 batches | lr 3.95e-05 | ms/batch 344.74 | loss  3.76 | ppl    42.967
| epoch 225 step    97800 |    136 batches | lr 3.93e-05 | ms/batch 343.22 | loss  3.83 | ppl    45.845
| epoch 225 step    97850 |    186 batches | lr 3.92e-05 | ms/batch 340.78 | loss  3.83 | ppl    46.066
| epoch 225 step    97900 |    236 batches | lr 3.9e-05 | ms/batch 330.17 | loss  3.85 | ppl    46.995
| epoch 225 step    97950 |    286 batches | lr 3.89e-05 | ms/batch 330.81 | loss  3.88 | ppl    48.430
| epoch 225 step    98000 |    336 batches | lr 3.88e-05 | ms/batch 329.70 | loss  3.74 | ppl    41.978
----------------------------------------------------------------------------------------------------
| Eval 245 at step    98000 | time: 140.01s | valid loss  4.20 | valid ppl    66.423
----------------------------------------------------------------------------------------------------
| epoch 225 step    98050 |    386 batches | lr 3.86e-05 | ms/batch 428.60 | loss  3.81 | ppl    44.981
| epoch 225 step    98100 |    436 batches | lr 3.85e-05 | ms/batch 322.15 | loss  3.83 | ppl    45.940
| epoch 226 step    98150 |     50 batches | lr 3.84e-05 | ms/batch 330.25 | loss  3.79 | ppl    44.385
| epoch 226 step    98200 |    100 batches | lr 3.82e-05 | ms/batch 329.92 | loss  3.77 | ppl    43.596
| epoch 226 step    98250 |    150 batches | lr 3.81e-05 | ms/batch 330.72 | loss  3.82 | ppl    45.729
| epoch 226 step    98300 |    200 batches | lr 3.79e-05 | ms/batch 329.80 | loss  3.84 | ppl    46.473
| epoch 226 step    98350 |    250 batches | lr 3.78e-05 | ms/batch 330.16 | loss  3.84 | ppl    46.511
| epoch 226 step    98400 |    300 batches | lr 3.77e-05 | ms/batch 331.70 | loss  3.86 | ppl    47.488
----------------------------------------------------------------------------------------------------
| Eval 246 at step    98400 | time: 136.69s | valid loss  4.19 | valid ppl    66.100
----------------------------------------------------------------------------------------------------
| epoch 226 step    98450 |    350 batches | lr 3.75e-05 | ms/batch 464.74 | loss  3.75 | ppl    42.408
| epoch 226 step    98500 |    400 batches | lr 3.74e-05 | ms/batch 324.89 | loss  3.82 | ppl    45.553
| epoch 227 step    98550 |     14 batches | lr 3.73e-05 | ms/batch 322.41 | loss  3.84 | ppl    46.553
| epoch 227 step    98600 |     64 batches | lr 3.71e-05 | ms/batch 330.10 | loss  3.77 | ppl    43.412
| epoch 227 step    98650 |    114 batches | lr 3.7e-05 | ms/batch 331.07 | loss  3.78 | ppl    44.006
| epoch 227 step    98700 |    164 batches | lr 3.69e-05 | ms/batch 329.15 | loss  3.82 | ppl    45.711
| epoch 227 step    98750 |    214 batches | lr 3.67e-05 | ms/batch 330.33 | loss  3.82 | ppl    45.531
| epoch 227 step    98800 |    264 batches | lr 3.66e-05 | ms/batch 329.00 | loss  3.87 | ppl    47.718
----------------------------------------------------------------------------------------------------
| Eval 247 at step    98800 | time: 136.34s | valid loss  4.20 | valid ppl    66.723
----------------------------------------------------------------------------------------------------
| epoch 227 step    98850 |    314 batches | lr 3.65e-05 | ms/batch 428.05 | loss  3.84 | ppl    46.571
| epoch 227 step    98900 |    364 batches | lr 3.63e-05 | ms/batch 325.61 | loss  3.76 | ppl    42.801
| epoch 227 step    98950 |    414 batches | lr 3.62e-05 | ms/batch 325.33 | loss  3.81 | ppl    45.085
| epoch 228 step    99000 |     28 batches | lr 3.61e-05 | ms/batch 322.21 | loss  3.84 | ppl    46.629
| epoch 228 step    99050 |     78 batches | lr 3.59e-05 | ms/batch 330.11 | loss  3.77 | ppl    43.355
| epoch 228 step    99100 |    128 batches | lr 3.58e-05 | ms/batch 330.30 | loss  3.81 | ppl    45.207
| epoch 228 step    99150 |    178 batches | lr 3.57e-05 | ms/batch 330.65 | loss  3.84 | ppl    46.542
| epoch 228 step    99200 |    228 batches | lr 3.55e-05 | ms/batch 329.22 | loss  3.83 | ppl    46.212
----------------------------------------------------------------------------------------------------
| Eval 248 at step    99200 | time: 136.09s | valid loss  4.20 | valid ppl    66.565
----------------------------------------------------------------------------------------------------
| epoch 228 step    99250 |    278 batches | lr 3.54e-05 | ms/batch 427.96 | loss  3.87 | ppl    47.853
| epoch 228 step    99300 |    328 batches | lr 3.53e-05 | ms/batch 324.95 | loss  3.78 | ppl    43.688
| epoch 228 step    99350 |    378 batches | lr 3.51e-05 | ms/batch 326.02 | loss  3.81 | ppl    45.163
| epoch 228 step    99400 |    428 batches | lr 3.5e-05 | ms/batch 327.61 | loss  3.85 | ppl    46.881
| epoch 229 step    99450 |     42 batches | lr 3.49e-05 | ms/batch 323.35 | loss  3.80 | ppl    44.525
| epoch 229 step    99500 |     92 batches | lr 3.47e-05 | ms/batch 329.78 | loss  3.74 | ppl    42.021
| epoch 229 step    99550 |    142 batches | lr 3.46e-05 | ms/batch 330.42 | loss  3.83 | ppl    46.163
| epoch 229 step    99600 |    192 batches | lr 3.45e-05 | ms/batch 329.49 | loss  3.82 | ppl    45.496
----------------------------------------------------------------------------------------------------
| Eval 249 at step    99600 | time: 135.97s | valid loss  4.19 | valid ppl    66.206
----------------------------------------------------------------------------------------------------
| epoch 229 step    99650 |    242 batches | lr 3.43e-05 | ms/batch 428.38 | loss  3.83 | ppl    46.145
| epoch 229 step    99700 |    292 batches | lr 3.42e-05 | ms/batch 325.80 | loss  3.88 | ppl    48.277
| epoch 229 step    99750 |    342 batches | lr 3.41e-05 | ms/batch 325.13 | loss  3.73 | ppl    41.821
| epoch 229 step    99800 |    392 batches | lr 3.39e-05 | ms/batch 324.89 | loss  3.81 | ppl    45.290
| epoch 230 step    99850 |      6 batches | lr 3.38e-05 | ms/batch 318.84 | loss  3.84 | ppl    46.740
| epoch 230 step    99900 |     56 batches | lr 3.37e-05 | ms/batch 328.59 | loss  3.79 | ppl    44.343
| epoch 230 step    99950 |    106 batches | lr 3.36e-05 | ms/batch 329.85 | loss  3.77 | ppl    43.302
| epoch 230 step   100000 |    156 batches | lr 3.34e-05 | ms/batch 329.93 | loss  3.83 | ppl    45.885
----------------------------------------------------------------------------------------------------
| Eval 250 at step   100000 | time: 135.60s | valid loss  4.20 | valid ppl    66.484
----------------------------------------------------------------------------------------------------
| epoch 230 step   100050 |    206 batches | lr 3.33e-05 | ms/batch 429.61 | loss  3.81 | ppl    45.371
| epoch 230 step   100100 |    256 batches | lr 3.32e-05 | ms/batch 326.49 | loss  3.84 | ppl    46.520
| epoch 230 step   100150 |    306 batches | lr 3.3e-05 | ms/batch 324.65 | loss  3.83 | ppl    46.108
| epoch 230 step   100200 |    356 batches | lr 3.29e-05 | ms/batch 326.04 | loss  3.78 | ppl    43.700
| epoch 230 step   100250 |    406 batches | lr 3.28e-05 | ms/batch 326.05 | loss  3.81 | ppl    45.216
| epoch 231 step   100300 |     20 batches | lr 3.27e-05 | ms/batch 320.81 | loss  3.83 | ppl    46.210
| epoch 231 step   100350 |     70 batches | lr 3.25e-05 | ms/batch 329.75 | loss  3.76 | ppl    42.801
| epoch 231 step   100400 |    120 batches | lr 3.24e-05 | ms/batch 329.11 | loss  3.82 | ppl    45.473
----------------------------------------------------------------------------------------------------
| Eval 251 at step   100400 | time: 135.59s | valid loss  4.20 | valid ppl    66.628
----------------------------------------------------------------------------------------------------
| epoch 231 step   100450 |    170 batches | lr 3.23e-05 | ms/batch 428.44 | loss  3.83 | ppl    46.095
| epoch 231 step   100500 |    220 batches | lr 3.21e-05 | ms/batch 325.92 | loss  3.81 | ppl    45.068
| epoch 231 step   100550 |    270 batches | lr 3.2e-05 | ms/batch 325.27 | loss  3.84 | ppl    46.575
| epoch 231 step   100600 |    320 batches | lr 3.19e-05 | ms/batch 326.28 | loss  3.79 | ppl    44.165
| epoch 231 step   100650 |    370 batches | lr 3.18e-05 | ms/batch 326.30 | loss  3.76 | ppl    43.117
| epoch 231 step   100700 |    420 batches | lr 3.16e-05 | ms/batch 323.82 | loss  3.83 | ppl    45.980
| epoch 232 step   100750 |     34 batches | lr 3.15e-05 | ms/batch 321.73 | loss  3.84 | ppl    46.313
| epoch 232 step   100800 |     84 batches | lr 3.14e-05 | ms/batch 328.55 | loss  3.75 | ppl    42.731
----------------------------------------------------------------------------------------------------
| Eval 252 at step   100800 | time: 135.32s | valid loss  4.19 | valid ppl    66.159
----------------------------------------------------------------------------------------------------
| epoch 232 step   100850 |    134 batches | lr 3.13e-05 | ms/batch 427.85 | loss  3.81 | ppl    45.230
| epoch 232 step   100900 |    184 batches | lr 3.11e-05 | ms/batch 324.18 | loss  3.81 | ppl    45.032
| epoch 232 step   100950 |    234 batches | lr 3.1e-05 | ms/batch 323.54 | loss  3.84 | ppl    46.326
| epoch 232 step   101000 |    284 batches | lr 3.09e-05 | ms/batch 325.32 | loss  3.85 | ppl    46.969
| epoch 232 step   101050 |    334 batches | lr 3.08e-05 | ms/batch 324.34 | loss  3.76 | ppl    43.048
| epoch 232 step   101100 |    384 batches | lr 3.06e-05 | ms/batch 324.49 | loss  3.79 | ppl    44.141
| epoch 232 step   101150 |    434 batches | lr 3.05e-05 | ms/batch 325.14 | loss  3.86 | ppl    47.245
| epoch 233 step   101200 |     48 batches | lr 3.04e-05 | ms/batch 322.10 | loss  3.78 | ppl    43.655
----------------------------------------------------------------------------------------------------
| Eval 253 at step   101200 | time: 134.85s | valid loss  4.20 | valid ppl    66.396
----------------------------------------------------------------------------------------------------
| epoch 233 step   101250 |     98 batches | lr 3.03e-05 | ms/batch 428.37 | loss  3.76 | ppl    43.093
| epoch 233 step   101300 |    148 batches | lr 3.01e-05 | ms/batch 324.62 | loss  3.80 | ppl    44.485
| epoch 233 step   101350 |    198 batches | lr 3e-05 | ms/batch 325.12 | loss  3.83 | ppl    46.189
| epoch 233 step   101400 |    248 batches | lr 2.99e-05 | ms/batch 324.37 | loss  3.85 | ppl    46.953
| epoch 233 step   101450 |    298 batches | lr 2.98e-05 | ms/batch 324.00 | loss  3.87 | ppl    47.967
| epoch 233 step   101500 |    348 batches | lr 2.96e-05 | ms/batch 324.37 | loss  3.72 | ppl    41.065
| epoch 233 step   101550 |    398 batches | lr 2.95e-05 | ms/batch 325.93 | loss  3.82 | ppl    45.579
| epoch 234 step   101600 |     12 batches | lr 2.94e-05 | ms/batch 319.85 | loss  3.83 | ppl    46.171
----------------------------------------------------------------------------------------------------
| Eval 254 at step   101600 | time: 134.86s | valid loss  4.19 | valid ppl    66.222
----------------------------------------------------------------------------------------------------
| epoch 234 step   101650 |     62 batches | lr 2.93e-05 | ms/batch 427.82 | loss  3.76 | ppl    42.953
| epoch 234 step   101700 |    112 batches | lr 2.92e-05 | ms/batch 330.02 | loss  3.80 | ppl    44.494
| epoch 234 step   101750 |    162 batches | lr 2.9e-05 | ms/batch 340.70 | loss  3.83 | ppl    46.172
| epoch 234 step   101800 |    212 batches | lr 2.89e-05 | ms/batch 341.44 | loss  3.81 | ppl    45.230
| epoch 234 step   101850 |    262 batches | lr 2.88e-05 | ms/batch 340.74 | loss  3.85 | ppl    47.052
| epoch 234 step   101900 |    312 batches | lr 2.87e-05 | ms/batch 341.10 | loss  3.82 | ppl    45.387
| epoch 234 step   101950 |    362 batches | lr 2.86e-05 | ms/batch 331.58 | loss  3.77 | ppl    43.322
| epoch 234 step   102000 |    412 batches | lr 2.84e-05 | ms/batch 324.90 | loss  3.78 | ppl    44.027
----------------------------------------------------------------------------------------------------
| Eval 255 at step   102000 | time: 138.90s | valid loss  4.20 | valid ppl    66.457
----------------------------------------------------------------------------------------------------
| epoch 235 step   102050 |     26 batches | lr 2.83e-05 | ms/batch 426.02 | loss  3.82 | ppl    45.829
| epoch 235 step   102100 |     76 batches | lr 2.82e-05 | ms/batch 325.23 | loss  3.76 | ppl    42.878
| epoch 235 step   102150 |    126 batches | lr 2.81e-05 | ms/batch 325.22 | loss  3.79 | ppl    44.369
| epoch 235 step   102200 |    176 batches | lr 2.8e-05 | ms/batch 325.26 | loss  3.78 | ppl    43.811
| epoch 235 step   102250 |    226 batches | lr 2.78e-05 | ms/batch 325.92 | loss  3.81 | ppl    45.246
| epoch 235 step   102300 |    276 batches | lr 2.77e-05 | ms/batch 326.25 | loss  3.84 | ppl    46.627
| epoch 235 step   102350 |    326 batches | lr 2.76e-05 | ms/batch 325.26 | loss  3.78 | ppl    43.773
| epoch 235 step   102400 |    376 batches | lr 2.75e-05 | ms/batch 324.46 | loss  3.80 | ppl    44.567
----------------------------------------------------------------------------------------------------
| Eval 256 at step   102400 | time: 135.18s | valid loss  4.19 | valid ppl    66.272
----------------------------------------------------------------------------------------------------
| epoch 235 step   102450 |    426 batches | lr 2.74e-05 | ms/batch 432.26 | loss  3.79 | ppl    44.262
| epoch 236 step   102500 |     40 batches | lr 2.72e-05 | ms/batch 318.06 | loss  3.80 | ppl    44.670
| epoch 236 step   102550 |     90 batches | lr 2.71e-05 | ms/batch 324.47 | loss  3.75 | ppl    42.450
| epoch 236 step   102600 |    140 batches | lr 2.7e-05 | ms/batch 324.82 | loss  3.81 | ppl    45.324
| epoch 236 step   102650 |    190 batches | lr 2.69e-05 | ms/batch 324.78 | loss  3.80 | ppl    44.824
| epoch 236 step   102700 |    240 batches | lr 2.68e-05 | ms/batch 324.19 | loss  3.82 | ppl    45.586
| epoch 236 step   102750 |    290 batches | lr 2.67e-05 | ms/batch 324.00 | loss  3.86 | ppl    47.382
| epoch 236 step   102800 |    340 batches | lr 2.65e-05 | ms/batch 324.18 | loss  3.73 | ppl    41.606
----------------------------------------------------------------------------------------------------
| Eval 257 at step   102800 | time: 134.81s | valid loss  4.20 | valid ppl    66.453
----------------------------------------------------------------------------------------------------
| epoch 236 step   102850 |    390 batches | lr 2.64e-05 | ms/batch 431.35 | loss  3.81 | ppl    44.936
| epoch 237 step   102900 |      4 batches | lr 2.63e-05 | ms/batch 322.64 | loss  3.84 | ppl    46.493
| epoch 237 step   102950 |     54 batches | lr 2.62e-05 | ms/batch 323.05 | loss  3.77 | ppl    43.451
| epoch 237 step   103000 |    104 batches | lr 2.61e-05 | ms/batch 324.31 | loss  3.75 | ppl    42.503
| epoch 237 step   103050 |    154 batches | lr 2.6e-05 | ms/batch 323.61 | loss  3.79 | ppl    44.298
| epoch 237 step   103100 |    204 batches | lr 2.58e-05 | ms/batch 324.32 | loss  3.79 | ppl    44.412
| epoch 237 step   103150 |    254 batches | lr 2.57e-05 | ms/batch 324.98 | loss  3.84 | ppl    46.651
| epoch 237 step   103200 |    304 batches | lr 2.56e-05 | ms/batch 323.60 | loss  3.84 | ppl    46.397
----------------------------------------------------------------------------------------------------
| Eval 258 at step   103200 | time: 134.90s | valid loss  4.19 | valid ppl    66.205
----------------------------------------------------------------------------------------------------
| epoch 237 step   103250 |    354 batches | lr 2.55e-05 | ms/batch 433.60 | loss  3.74 | ppl    42.086
| epoch 237 step   103300 |    404 batches | lr 2.54e-05 | ms/batch 330.26 | loss  3.81 | ppl    45.168
| epoch 238 step   103350 |     18 batches | lr 2.53e-05 | ms/batch 321.88 | loss  3.84 | ppl    46.371
| epoch 238 step   103400 |     68 batches | lr 2.52e-05 | ms/batch 325.72 | loss  3.74 | ppl    41.934
| epoch 238 step   103450 |    118 batches | lr 2.5e-05 | ms/batch 326.05 | loss  3.79 | ppl    44.156
| epoch 238 step   103500 |    168 batches | lr 2.49e-05 | ms/batch 328.98 | loss  3.81 | ppl    45.029
| epoch 238 step   103550 |    218 batches | lr 2.48e-05 | ms/batch 336.85 | loss  3.84 | ppl    46.565
| epoch 238 step   103600 |    268 batches | lr 2.47e-05 | ms/batch 323.73 | loss  3.82 | ppl    45.526
----------------------------------------------------------------------------------------------------
| Eval 259 at step   103600 | time: 136.37s | valid loss  4.20 | valid ppl    66.566
----------------------------------------------------------------------------------------------------
| epoch 238 step   103650 |    318 batches | lr 2.46e-05 | ms/batch 434.10 | loss  3.81 | ppl    44.974
| epoch 238 step   103700 |    368 batches | lr 2.45e-05 | ms/batch 330.80 | loss  3.75 | ppl    42.319
| epoch 238 step   103750 |    418 batches | lr 2.44e-05 | ms/batch 329.42 | loss  3.80 | ppl    44.796
| epoch 239 step   103800 |     32 batches | lr 2.43e-05 | ms/batch 320.14 | loss  3.82 | ppl    45.763
| epoch 239 step   103850 |     82 batches | lr 2.41e-05 | ms/batch 325.31 | loss  3.76 | ppl    43.031
| epoch 239 step   103900 |    132 batches | lr 2.4e-05 | ms/batch 326.10 | loss  3.82 | ppl    45.515
| epoch 239 step   103950 |    182 batches | lr 2.39e-05 | ms/batch 325.69 | loss  3.80 | ppl    44.850
| epoch 239 step   104000 |    232 batches | lr 2.38e-05 | ms/batch 324.75 | loss  3.83 | ppl    45.948
----------------------------------------------------------------------------------------------------
| Eval 260 at step   104000 | time: 135.84s | valid loss  4.19 | valid ppl    66.265
----------------------------------------------------------------------------------------------------
| epoch 239 step   104050 |    282 batches | lr 2.37e-05 | ms/batch 433.69 | loss  3.82 | ppl    45.686
| epoch 239 step   104100 |    332 batches | lr 2.36e-05 | ms/batch 329.69 | loss  3.75 | ppl    42.430
| epoch 239 step   104150 |    382 batches | lr 2.35e-05 | ms/batch 328.71 | loss  3.79 | ppl    44.075
| epoch 239 step   104200 |    432 batches | lr 2.34e-05 | ms/batch 328.98 | loss  3.81 | ppl    45.168
| epoch 240 step   104250 |     46 batches | lr 2.33e-05 | ms/batch 319.47 | loss  3.77 | ppl    43.473
| epoch 240 step   104300 |     96 batches | lr 2.32e-05 | ms/batch 328.17 | loss  3.75 | ppl    42.495
| epoch 240 step   104350 |    146 batches | lr 2.3e-05 | ms/batch 340.82 | loss  3.81 | ppl    45.173
| epoch 240 step   104400 |    196 batches | lr 2.29e-05 | ms/batch 340.82 | loss  3.83 | ppl    46.104
----------------------------------------------------------------------------------------------------
| Eval 261 at step   104400 | time: 137.69s | valid loss  4.19 | valid ppl    66.100
----------------------------------------------------------------------------------------------------
| epoch 240 step   104450 |    246 batches | lr 2.28e-05 | ms/batch 450.21 | loss  3.83 | ppl    45.998
| epoch 240 step   104500 |    296 batches | lr 2.27e-05 | ms/batch 342.57 | loss  3.86 | ppl    47.319
| epoch 240 step   104550 |    346 batches | lr 2.26e-05 | ms/batch 328.90 | loss  3.72 | ppl    41.145
| epoch 240 step   104600 |    396 batches | lr 2.25e-05 | ms/batch 329.24 | loss  3.80 | ppl    44.490
| epoch 241 step   104650 |     10 batches | lr 2.24e-05 | ms/batch 321.09 | loss  3.83 | ppl    46.070
| epoch 241 step   104700 |     60 batches | lr 2.23e-05 | ms/batch 325.86 | loss  3.75 | ppl    42.529
| epoch 241 step   104750 |    110 batches | lr 2.22e-05 | ms/batch 325.29 | loss  3.76 | ppl    43.132
| epoch 241 step   104800 |    160 batches | lr 2.21e-05 | ms/batch 325.24 | loss  3.79 | ppl    44.381
----------------------------------------------------------------------------------------------------
| Eval 262 at step   104800 | time: 137.25s | valid loss  4.19 | valid ppl    66.122
----------------------------------------------------------------------------------------------------
| epoch 241 step   104850 |    210 batches | lr 2.2e-05 | ms/batch 432.63 | loss  3.78 | ppl    43.673
| epoch 241 step   104900 |    260 batches | lr 2.19e-05 | ms/batch 328.95 | loss  3.82 | ppl    45.661
| epoch 241 step   104950 |    310 batches | lr 2.18e-05 | ms/batch 329.10 | loss  3.81 | ppl    45.301
| epoch 241 step   105000 |    360 batches | lr 2.16e-05 | ms/batch 329.29 | loss  3.74 | ppl    41.996
| epoch 241 step   105050 |    410 batches | lr 2.15e-05 | ms/batch 329.26 | loss  3.80 | ppl    44.572
| epoch 242 step   105100 |     24 batches | lr 2.14e-05 | ms/batch 320.97 | loss  3.85 | ppl    46.936
| epoch 242 step   105150 |     74 batches | lr 2.13e-05 | ms/batch 326.19 | loss  3.76 | ppl    42.855
| epoch 242 step   105200 |    124 batches | lr 2.12e-05 | ms/batch 324.90 | loss  3.78 | ppl    43.642
----------------------------------------------------------------------------------------------------
| Eval 263 at step   105200 | time: 136.10s | valid loss  4.20 | valid ppl    66.650
----------------------------------------------------------------------------------------------------
| epoch 242 step   105250 |    174 batches | lr 2.11e-05 | ms/batch 434.19 | loss  3.81 | ppl    44.955
| epoch 242 step   105300 |    224 batches | lr 2.1e-05 | ms/batch 330.69 | loss  3.82 | ppl    45.604
| epoch 242 step   105350 |    274 batches | lr 2.09e-05 | ms/batch 331.73 | loss  3.84 | ppl    46.711
| epoch 242 step   105400 |    324 batches | lr 2.08e-05 | ms/batch 331.32 | loss  3.79 | ppl    44.341
| epoch 242 step   105450 |    374 batches | lr 2.07e-05 | ms/batch 329.79 | loss  3.77 | ppl    43.502
| epoch 242 step   105500 |    424 batches | lr 2.06e-05 | ms/batch 331.47 | loss  3.80 | ppl    44.544
| epoch 243 step   105550 |     38 batches | lr 2.05e-05 | ms/batch 319.79 | loss  3.81 | ppl    45.106
| epoch 243 step   105600 |     88 batches | lr 2.04e-05 | ms/batch 324.14 | loss  3.73 | ppl    41.733
----------------------------------------------------------------------------------------------------
| Eval 264 at step   105600 | time: 136.59s | valid loss  4.19 | valid ppl    66.281
----------------------------------------------------------------------------------------------------
| epoch 243 step   105650 |    138 batches | lr 2.03e-05 | ms/batch 431.36 | loss  3.78 | ppl    43.763
| epoch 243 step   105700 |    188 batches | lr 2.02e-05 | ms/batch 329.59 | loss  3.81 | ppl    45.279
| epoch 243 step   105750 |    238 batches | lr 2.01e-05 | ms/batch 331.27 | loss  3.82 | ppl    45.412
| epoch 243 step   105800 |    288 batches | lr 2e-05 | ms/batch 330.54 | loss  3.86 | ppl    47.515
| epoch 243 step   105850 |    338 batches | lr 1.99e-05 | ms/batch 330.73 | loss  3.69 | ppl    40.031
| epoch 243 step   105900 |    388 batches | lr 1.98e-05 | ms/batch 330.26 | loss  3.78 | ppl    43.869
| epoch 244 step   105950 |      2 batches | lr 1.97e-05 | ms/batch 323.90 | loss  3.83 | ppl    45.856
| epoch 244 step   106000 |     52 batches | lr 1.96e-05 | ms/batch 325.20 | loss  3.73 | ppl    41.819
----------------------------------------------------------------------------------------------------
| Eval 265 at step   106000 | time: 136.65s | valid loss  4.20 | valid ppl    66.481
----------------------------------------------------------------------------------------------------
| epoch 244 step   106050 |    102 batches | lr 1.95e-05 | ms/batch 432.42 | loss  3.75 | ppl    42.701
| epoch 244 step   106100 |    152 batches | lr 1.94e-05 | ms/batch 328.29 | loss  3.79 | ppl    44.468
| epoch 244 step   106150 |    202 batches | lr 1.93e-05 | ms/batch 328.68 | loss  3.79 | ppl    44.475
| epoch 244 step   106200 |    252 batches | lr 1.92e-05 | ms/batch 329.42 | loss  3.81 | ppl    45.202
| epoch 244 step   106250 |    302 batches | lr 1.91e-05 | ms/batch 328.85 | loss  3.84 | ppl    46.524
| epoch 244 step   106300 |    352 batches | lr 1.9e-05 | ms/batch 327.90 | loss  3.71 | ppl    40.661
| epoch 244 step   106350 |    402 batches | lr 1.89e-05 | ms/batch 328.77 | loss  3.81 | ppl    45.131
| epoch 245 step   106400 |     16 batches | lr 1.88e-05 | ms/batch 322.89 | loss  3.80 | ppl    44.881
----------------------------------------------------------------------------------------------------
| Eval 266 at step   106400 | time: 136.39s | valid loss  4.19 | valid ppl    66.088
----------------------------------------------------------------------------------------------------
| epoch 245 step   106450 |     66 batches | lr 1.87e-05 | ms/batch 467.40 | loss  3.73 | ppl    41.721
| epoch 245 step   106500 |    116 batches | lr 1.86e-05 | ms/batch 329.82 | loss  3.76 | ppl    43.081
| epoch 245 step   106550 |    166 batches | lr 1.85e-05 | ms/batch 331.19 | loss  3.77 | ppl    43.167
| epoch 245 step   106600 |    216 batches | lr 1.84e-05 | ms/batch 327.98 | loss  3.81 | ppl    44.936
| epoch 245 step   106650 |    266 batches | lr 1.83e-05 | ms/batch 329.12 | loss  3.84 | ppl    46.438
| epoch 245 step   106700 |    316 batches | lr 1.82e-05 | ms/batch 328.45 | loss  3.80 | ppl    44.590
| epoch 245 step   106750 |    366 batches | lr 1.81e-05 | ms/batch 328.60 | loss  3.74 | ppl    42.268
| epoch 245 step   106800 |    416 batches | lr 1.8e-05 | ms/batch 328.36 | loss  3.79 | ppl    44.312
----------------------------------------------------------------------------------------------------
| Eval 267 at step   106800 | time: 136.88s | valid loss  4.19 | valid ppl    66.266
----------------------------------------------------------------------------------------------------
| epoch 246 step   106850 |     30 batches | lr 1.79e-05 | ms/batch 424.10 | loss  3.81 | ppl    45.223
| epoch 246 step   106900 |     80 batches | lr 1.78e-05 | ms/batch 328.96 | loss  3.73 | ppl    41.478
| epoch 246 step   106950 |    130 batches | lr 1.77e-05 | ms/batch 328.76 | loss  3.78 | ppl    43.785
| epoch 246 step   107000 |    180 batches | lr 1.76e-05 | ms/batch 329.21 | loss  3.77 | ppl    43.460
| epoch 246 step   107050 |    230 batches | lr 1.75e-05 | ms/batch 328.66 | loss  3.79 | ppl    44.449
| epoch 246 step   107100 |    280 batches | lr 1.74e-05 | ms/batch 329.26 | loss  3.84 | ppl    46.542
| epoch 246 step   107150 |    330 batches | lr 1.73e-05 | ms/batch 329.49 | loss  3.75 | ppl    42.733
| epoch 246 step   107200 |    380 batches | lr 1.72e-05 | ms/batch 330.91 | loss  3.76 | ppl    42.982
----------------------------------------------------------------------------------------------------
| Eval 268 at step   107200 | time: 136.47s | valid loss  4.19 | valid ppl    66.010
----------------------------------------------------------------------------------------------------
| epoch 246 step   107250 |    430 batches | lr 1.71e-05 | ms/batch 462.55 | loss  3.80 | ppl    44.740
| epoch 247 step   107300 |     44 batches | lr 1.7e-05 | ms/batch 323.43 | loss  3.79 | ppl    44.094
| epoch 247 step   107350 |     94 batches | lr 1.69e-05 | ms/batch 328.82 | loss  3.75 | ppl    42.553
| epoch 247 step   107400 |    144 batches | lr 1.68e-05 | ms/batch 329.08 | loss  3.80 | ppl    44.909
| epoch 247 step   107450 |    194 batches | lr 1.67e-05 | ms/batch 329.84 | loss  3.80 | ppl    44.567
| epoch 247 step   107500 |    244 batches | lr 1.67e-05 | ms/batch 330.24 | loss  3.79 | ppl    44.179
| epoch 247 step   107550 |    294 batches | lr 1.66e-05 | ms/batch 330.42 | loss  3.84 | ppl    46.560
| epoch 247 step   107600 |    344 batches | lr 1.65e-05 | ms/batch 330.75 | loss  3.70 | ppl    40.419
----------------------------------------------------------------------------------------------------
| Eval 269 at step   107600 | time: 136.60s | valid loss  4.19 | valid ppl    66.198
----------------------------------------------------------------------------------------------------
| epoch 247 step   107650 |    394 batches | lr 1.64e-05 | ms/batch 429.85 | loss  3.78 | ppl    43.645
| epoch 248 step   107700 |      8 batches | lr 1.63e-05 | ms/batch 321.35 | loss  3.82 | ppl    45.489
| epoch 248 step   107750 |     58 batches | lr 1.62e-05 | ms/batch 332.38 | loss  3.75 | ppl    42.430
| epoch 248 step   107800 |    108 batches | lr 1.61e-05 | ms/batch 343.61 | loss  3.77 | ppl    43.489
| epoch 248 step   107850 |    158 batches | lr 1.6e-05 | ms/batch 331.36 | loss  3.79 | ppl    44.180
| epoch 248 step   107900 |    208 batches | lr 1.59e-05 | ms/batch 330.60 | loss  3.77 | ppl    43.552
| epoch 248 step   107950 |    258 batches | lr 1.58e-05 | ms/batch 329.87 | loss  3.83 | ppl    46.000
| epoch 248 step   108000 |    308 batches | lr 1.57e-05 | ms/batch 330.26 | loss  3.83 | ppl    46.176
----------------------------------------------------------------------------------------------------
| Eval 270 at step   108000 | time: 137.50s | valid loss  4.19 | valid ppl    66.051
----------------------------------------------------------------------------------------------------
| epoch 248 step   108050 |    358 batches | lr 1.56e-05 | ms/batch 429.64 | loss  3.75 | ppl    42.611
| epoch 248 step   108100 |    408 batches | lr 1.55e-05 | ms/batch 325.35 | loss  3.78 | ppl    43.849
| epoch 249 step   108150 |     22 batches | lr 1.55e-05 | ms/batch 320.10 | loss  3.83 | ppl    45.888
| epoch 249 step   108200 |     72 batches | lr 1.54e-05 | ms/batch 329.81 | loss  3.73 | ppl    41.718
| epoch 249 step   108250 |    122 batches | lr 1.53e-05 | ms/batch 331.09 | loss  3.79 | ppl    44.137
| epoch 249 step   108300 |    172 batches | lr 1.52e-05 | ms/batch 329.32 | loss  3.77 | ppl    43.575
| epoch 249 step   108350 |    222 batches | lr 1.51e-05 | ms/batch 328.76 | loss  3.81 | ppl    45.332
| epoch 249 step   108400 |    272 batches | lr 1.5e-05 | ms/batch 330.04 | loss  3.86 | ppl    47.638
----------------------------------------------------------------------------------------------------
| Eval 271 at step   108400 | time: 136.17s | valid loss  4.19 | valid ppl    66.301
----------------------------------------------------------------------------------------------------
| epoch 249 step   108450 |    322 batches | lr 1.49e-05 | ms/batch 427.71 | loss  3.78 | ppl    43.813
| epoch 249 step   108500 |    372 batches | lr 1.48e-05 | ms/batch 324.72 | loss  3.77 | ppl    43.387
| epoch 249 step   108550 |    422 batches | lr 1.47e-05 | ms/batch 324.75 | loss  3.78 | ppl    43.953
| epoch 250 step   108600 |     36 batches | lr 1.47e-05 | ms/batch 321.62 | loss  3.79 | ppl    44.404
| epoch 250 step   108650 |     86 batches | lr 1.46e-05 | ms/batch 329.53 | loss  3.72 | ppl    41.352
| epoch 250 step   108700 |    136 batches | lr 1.45e-05 | ms/batch 329.43 | loss  3.79 | ppl    44.272
| epoch 250 step   108750 |    186 batches | lr 1.44e-05 | ms/batch 330.20 | loss  3.78 | ppl    43.785
| epoch 250 step   108800 |    236 batches | lr 1.43e-05 | ms/batch 330.63 | loss  3.82 | ppl    45.610
----------------------------------------------------------------------------------------------------
| Eval 272 at step   108800 | time: 135.98s | valid loss  4.19 | valid ppl    66.336
----------------------------------------------------------------------------------------------------
| epoch 250 step   108850 |    286 batches | lr 1.42e-05 | ms/batch 429.19 | loss  3.86 | ppl    47.402
| epoch 250 step   108900 |    336 batches | lr 1.41e-05 | ms/batch 326.54 | loss  3.75 | ppl    42.490
| epoch 250 step   108950 |    386 batches | lr 1.4e-05 | ms/batch 326.00 | loss  3.79 | ppl    44.378
| epoch 250 step   109000 |    436 batches | lr 1.4e-05 | ms/batch 321.34 | loss  3.79 | ppl    44.184
| epoch 251 step   109050 |     50 batches | lr 1.39e-05 | ms/batch 338.67 | loss  3.79 | ppl    44.222
| epoch 251 step   109100 |    100 batches | lr 1.38e-05 | ms/batch 343.09 | loss  3.72 | ppl    41.390
| epoch 251 step   109150 |    150 batches | lr 1.37e-05 | ms/batch 344.15 | loss  3.79 | ppl    44.251
| epoch 251 step   109200 |    200 batches | lr 1.36e-05 | ms/batch 337.73 | loss  3.79 | ppl    44.089
----------------------------------------------------------------------------------------------------
| Eval 273 at step   109200 | time: 138.26s | valid loss  4.19 | valid ppl    66.101
----------------------------------------------------------------------------------------------------
| epoch 251 step   109250 |    250 batches | lr 1.35e-05 | ms/batch 427.33 | loss  3.80 | ppl    44.832
| epoch 251 step   109300 |    300 batches | lr 1.34e-05 | ms/batch 326.38 | loss  3.84 | ppl    46.675
| epoch 251 step   109350 |    350 batches | lr 1.34e-05 | ms/batch 326.36 | loss  3.71 | ppl    40.685
| epoch 251 step   109400 |    400 batches | lr 1.33e-05 | ms/batch 325.05 | loss  3.77 | ppl    43.463
| epoch 252 step   109450 |     14 batches | lr 1.32e-05 | ms/batch 319.14 | loss  3.84 | ppl    46.538
| epoch 252 step   109500 |     64 batches | lr 1.31e-05 | ms/batch 328.30 | loss  3.74 | ppl    42.101
| epoch 252 step   109550 |    114 batches | lr 1.3e-05 | ms/batch 329.35 | loss  3.76 | ppl    43.098
| epoch 252 step   109600 |    164 batches | lr 1.29e-05 | ms/batch 329.46 | loss  3.79 | ppl    44.298
----------------------------------------------------------------------------------------------------
| Eval 274 at step   109600 | time: 135.56s | valid loss  4.19 | valid ppl    66.063
----------------------------------------------------------------------------------------------------
| epoch 252 step   109650 |    214 batches | lr 1.29e-05 | ms/batch 426.36 | loss  3.81 | ppl    45.309
| epoch 252 step   109700 |    264 batches | lr 1.28e-05 | ms/batch 324.94 | loss  3.82 | ppl    45.391
| epoch 252 step   109750 |    314 batches | lr 1.27e-05 | ms/batch 325.14 | loss  3.81 | ppl    45.147
| epoch 252 step   109800 |    364 batches | lr 1.26e-05 | ms/batch 326.09 | loss  3.75 | ppl    42.483
| epoch 252 step   109850 |    414 batches | lr 1.25e-05 | ms/batch 325.03 | loss  3.77 | ppl    43.176
| epoch 253 step   109900 |     28 batches | lr 1.25e-05 | ms/batch 319.97 | loss  3.80 | ppl    44.579
| epoch 253 step   109950 |     78 batches | lr 1.24e-05 | ms/batch 330.14 | loss  3.72 | ppl    41.123
| epoch 253 step   110000 |    128 batches | lr 1.23e-05 | ms/batch 329.20 | loss  3.77 | ppl    43.292
----------------------------------------------------------------------------------------------------
| Eval 275 at step   110000 | time: 135.35s | valid loss  4.19 | valid ppl    66.149
----------------------------------------------------------------------------------------------------
| epoch 253 step   110050 |    178 batches | lr 1.22e-05 | ms/batch 427.41 | loss  3.76 | ppl    43.090
| epoch 253 step   110100 |    228 batches | lr 1.21e-05 | ms/batch 326.01 | loss  3.81 | ppl    45.157
| epoch 253 step   110150 |    278 batches | lr 1.2e-05 | ms/batch 327.09 | loss  3.83 | ppl    45.928
| epoch 253 step   110200 |    328 batches | lr 1.2e-05 | ms/batch 328.50 | loss  3.74 | ppl    42.004
| epoch 253 step   110250 |    378 batches | lr 1.19e-05 | ms/batch 327.91 | loss  3.77 | ppl    43.203
| epoch 253 step   110300 |    428 batches | lr 1.18e-05 | ms/batch 325.69 | loss  3.80 | ppl    44.595
| epoch 254 step   110350 |     42 batches | lr 1.17e-05 | ms/batch 324.61 | loss  3.78 | ppl    43.885
| epoch 254 step   110400 |     92 batches | lr 1.16e-05 | ms/batch 331.57 | loss  3.73 | ppl    41.669
----------------------------------------------------------------------------------------------------
| Eval 276 at step   110400 | time: 136.00s | valid loss  4.19 | valid ppl    66.196
----------------------------------------------------------------------------------------------------
| epoch 254 step   110450 |    142 batches | lr 1.16e-05 | ms/batch 430.99 | loss  3.78 | ppl    43.948
| epoch 254 step   110500 |    192 batches | lr 1.15e-05 | ms/batch 326.82 | loss  3.77 | ppl    43.404
| epoch 254 step   110550 |    242 batches | lr 1.14e-05 | ms/batch 327.42 | loss  3.81 | ppl    45.004
| epoch 254 step   110600 |    292 batches | lr 1.13e-05 | ms/batch 328.20 | loss  3.84 | ppl    46.400
| epoch 254 step   110650 |    342 batches | lr 1.13e-05 | ms/batch 327.89 | loss  3.70 | ppl    40.392
| epoch 254 step   110700 |    392 batches | lr 1.12e-05 | ms/batch 326.09 | loss  3.80 | ppl    44.537
| epoch 255 step   110750 |      6 batches | lr 1.11e-05 | ms/batch 322.15 | loss  3.82 | ppl    45.761
| epoch 255 step   110800 |     56 batches | lr 1.1e-05 | ms/batch 330.56 | loss  3.75 | ppl    42.563
----------------------------------------------------------------------------------------------------
| Eval 277 at step   110800 | time: 135.99s | valid loss  4.19 | valid ppl    66.302
----------------------------------------------------------------------------------------------------
| epoch 255 step   110850 |    106 batches | lr 1.1e-05 | ms/batch 430.10 | loss  3.75 | ppl    42.596
| epoch 255 step   110900 |    156 batches | lr 1.09e-05 | ms/batch 325.50 | loss  3.81 | ppl    45.055
| epoch 255 step   110950 |    206 batches | lr 1.08e-05 | ms/batch 326.30 | loss  3.79 | ppl    44.139
| epoch 255 step   111000 |    256 batches | lr 1.07e-05 | ms/batch 327.05 | loss  3.81 | ppl    45.083
| epoch 255 step   111050 |    306 batches | lr 1.06e-05 | ms/batch 325.77 | loss  3.82 | ppl    45.604
| epoch 255 step   111100 |    356 batches | lr 1.06e-05 | ms/batch 324.47 | loss  3.72 | ppl    41.394
| epoch 255 step   111150 |    406 batches | lr 1.05e-05 | ms/batch 327.47 | loss  3.76 | ppl    43.096
| epoch 256 step   111200 |     20 batches | lr 1.04e-05 | ms/batch 320.52 | loss  3.81 | ppl    44.939
----------------------------------------------------------------------------------------------------
| Eval 278 at step   111200 | time: 135.35s | valid loss  4.19 | valid ppl    65.995
----------------------------------------------------------------------------------------------------
| epoch 256 step   111250 |     70 batches | lr 1.03e-05 | ms/batch 464.63 | loss  3.74 | ppl    41.916
| epoch 256 step   111300 |    120 batches | lr 1.03e-05 | ms/batch 325.54 | loss  3.79 | ppl    44.184
| epoch 256 step   111350 |    170 batches | lr 1.02e-05 | ms/batch 324.69 | loss  3.79 | ppl    44.046
| epoch 256 step   111400 |    220 batches | lr 1.01e-05 | ms/batch 325.58 | loss  3.80 | ppl    44.797
| epoch 256 step   111450 |    270 batches | lr 1e-05 | ms/batch 324.91 | loss  3.81 | ppl    45.214
| epoch 256 step   111500 |    320 batches | lr 9.98e-06 | ms/batch 325.39 | loss  3.78 | ppl    43.897
| epoch 256 step   111550 |    370 batches | lr 9.9e-06 | ms/batch 325.12 | loss  3.74 | ppl    42.180
| epoch 256 step   111600 |    420 batches | lr 9.83e-06 | ms/batch 336.20 | loss  3.78 | ppl    43.601
----------------------------------------------------------------------------------------------------
| Eval 279 at step   111600 | time: 136.10s | valid loss  4.19 | valid ppl    66.088
----------------------------------------------------------------------------------------------------
| epoch 257 step   111650 |     34 batches | lr 9.76e-06 | ms/batch 445.04 | loss  3.80 | ppl    44.694
| epoch 257 step   111700 |     84 batches | lr 9.69e-06 | ms/batch 341.51 | loss  3.74 | ppl    42.093
| epoch 257 step   111750 |    134 batches | lr 9.61e-06 | ms/batch 336.27 | loss  3.78 | ppl    43.831
| epoch 257 step   111800 |    184 batches | lr 9.54e-06 | ms/batch 325.32 | loss  3.81 | ppl    44.950
| epoch 257 step   111850 |    234 batches | lr 9.47e-06 | ms/batch 326.05 | loss  3.79 | ppl    44.456
| epoch 257 step   111900 |    284 batches | lr 9.4e-06 | ms/batch 325.57 | loss  3.81 | ppl    45.334
| epoch 257 step   111950 |    334 batches | lr 9.33e-06 | ms/batch 326.29 | loss  3.73 | ppl    41.556
| epoch 257 step   112000 |    384 batches | lr 9.26e-06 | ms/batch 326.10 | loss  3.80 | ppl    44.677
----------------------------------------------------------------------------------------------------
| Eval 280 at step   112000 | time: 137.36s | valid loss  4.19 | valid ppl    66.053
----------------------------------------------------------------------------------------------------
| epoch 257 step   112050 |    434 batches | lr 9.19e-06 | ms/batch 435.17 | loss  3.82 | ppl    45.813
| epoch 258 step   112100 |     48 batches | lr 9.12e-06 | ms/batch 321.78 | loss  3.75 | ppl    42.634
| epoch 258 step   112150 |     98 batches | lr 9.05e-06 | ms/batch 326.59 | loss  3.71 | ppl    40.787
| epoch 258 step   112200 |    148 batches | lr 8.98e-06 | ms/batch 326.78 | loss  3.79 | ppl    44.369
| epoch 258 step   112250 |    198 batches | lr 8.91e-06 | ms/batch 325.98 | loss  3.79 | ppl    44.084
| epoch 258 step   112300 |    248 batches | lr 8.84e-06 | ms/batch 326.40 | loss  3.82 | ppl    45.513
| epoch 258 step   112350 |    298 batches | lr 8.77e-06 | ms/batch 326.12 | loss  3.85 | ppl    46.964
| epoch 258 step   112400 |    348 batches | lr 8.7e-06 | ms/batch 326.53 | loss  3.71 | ppl    40.679
----------------------------------------------------------------------------------------------------
| Eval 281 at step   112400 | time: 135.78s | valid loss  4.19 | valid ppl    66.048
----------------------------------------------------------------------------------------------------
| epoch 258 step   112450 |    398 batches | lr 8.63e-06 | ms/batch 432.64 | loss  3.78 | ppl    43.765
| epoch 259 step   112500 |     12 batches | lr 8.57e-06 | ms/batch 322.09 | loss  3.82 | ppl    45.772
| epoch 259 step   112550 |     62 batches | lr 8.5e-06 | ms/batch 324.73 | loss  3.71 | ppl    41.010
| epoch 259 step   112600 |    112 batches | lr 8.43e-06 | ms/batch 324.48 | loss  3.76 | ppl    43.037
| epoch 259 step   112650 |    162 batches | lr 8.36e-06 | ms/batch 325.03 | loss  3.78 | ppl    43.830
| epoch 259 step   112700 |    212 batches | lr 8.3e-06 | ms/batch 325.31 | loss  3.78 | ppl    43.883
| epoch 259 step   112750 |    262 batches | lr 8.23e-06 | ms/batch 325.36 | loss  3.80 | ppl    44.818
| epoch 259 step   112800 |    312 batches | lr 8.16e-06 | ms/batch 323.71 | loss  3.80 | ppl    44.771
----------------------------------------------------------------------------------------------------
| Eval 282 at step   112800 | time: 135.15s | valid loss  4.19 | valid ppl    66.099
----------------------------------------------------------------------------------------------------
| epoch 259 step   112850 |    362 batches | lr 8.1e-06 | ms/batch 432.29 | loss  3.73 | ppl    41.795
| epoch 259 step   112900 |    412 batches | lr 8.03e-06 | ms/batch 329.57 | loss  3.77 | ppl    43.562
| epoch 260 step   112950 |     26 batches | lr 7.96e-06 | ms/batch 320.43 | loss  3.81 | ppl    45.179
| epoch 260 step   113000 |     76 batches | lr 7.9e-06 | ms/batch 325.68 | loss  3.74 | ppl    42.304
| epoch 260 step   113050 |    126 batches | lr 7.83e-06 | ms/batch 325.92 | loss  3.79 | ppl    44.423
| epoch 260 step   113100 |    176 batches | lr 7.77e-06 | ms/batch 325.48 | loss  3.80 | ppl    44.548
| epoch 260 step   113150 |    226 batches | lr 7.7e-06 | ms/batch 326.58 | loss  3.80 | ppl    44.537
| epoch 260 step   113200 |    276 batches | lr 7.64e-06 | ms/batch 325.43 | loss  3.84 | ppl    46.522
----------------------------------------------------------------------------------------------------
| Eval 283 at step   113200 | time: 135.59s | valid loss  4.19 | valid ppl    66.124
----------------------------------------------------------------------------------------------------
| epoch 260 step   113250 |    326 batches | lr 7.58e-06 | ms/batch 434.44 | loss  3.75 | ppl    42.591
| epoch 260 step   113300 |    376 batches | lr 7.51e-06 | ms/batch 330.99 | loss  3.77 | ppl    43.208
| epoch 260 step   113350 |    426 batches | lr 7.45e-06 | ms/batch 329.27 | loss  3.81 | ppl    45.038
| epoch 261 step   113400 |     40 batches | lr 7.38e-06 | ms/batch 318.78 | loss  3.77 | ppl    43.367
| epoch 261 step   113450 |     90 batches | lr 7.32e-06 | ms/batch 326.31 | loss  3.73 | ppl    41.591
| epoch 261 step   113500 |    140 batches | lr 7.26e-06 | ms/batch 327.62 | loss  3.75 | ppl    42.478
| epoch 261 step   113550 |    190 batches | lr 7.2e-06 | ms/batch 324.86 | loss  3.79 | ppl    44.260
| epoch 261 step   113600 |    240 batches | lr 7.13e-06 | ms/batch 324.45 | loss  3.82 | ppl    45.656
----------------------------------------------------------------------------------------------------
| Eval 284 at step   113600 | time: 135.83s | valid loss  4.19 | valid ppl    66.124
----------------------------------------------------------------------------------------------------
| epoch 261 step   113650 |    290 batches | lr 7.07e-06 | ms/batch 433.11 | loss  3.85 | ppl    47.032
| epoch 261 step   113700 |    340 batches | lr 7.01e-06 | ms/batch 329.23 | loss  3.68 | ppl    39.555
| epoch 261 step   113750 |    390 batches | lr 6.95e-06 | ms/batch 329.30 | loss  3.78 | ppl    43.698
| epoch 262 step   113800 |      4 batches | lr 6.89e-06 | ms/batch 322.82 | loss  3.82 | ppl    45.496
| epoch 262 step   113850 |     54 batches | lr 6.83e-06 | ms/batch 324.45 | loss  3.77 | ppl    43.189
| epoch 262 step   113900 |    104 batches | lr 6.77e-06 | ms/batch 325.77 | loss  3.76 | ppl    42.843
| epoch 262 step   113950 |    154 batches | lr 6.71e-06 | ms/batch 325.69 | loss  3.78 | ppl    43.804
| epoch 262 step   114000 |    204 batches | lr 6.65e-06 | ms/batch 326.54 | loss  3.79 | ppl    44.466
----------------------------------------------------------------------------------------------------
| Eval 285 at step   114000 | time: 135.86s | valid loss  4.19 | valid ppl    66.001
----------------------------------------------------------------------------------------------------
| epoch 262 step   114050 |    254 batches | lr 6.59e-06 | ms/batch 433.47 | loss  3.82 | ppl    45.758
| epoch 262 step   114100 |    304 batches | lr 6.53e-06 | ms/batch 329.18 | loss  3.82 | ppl    45.754
| epoch 262 step   114150 |    354 batches | lr 6.47e-06 | ms/batch 330.88 | loss  3.72 | ppl    41.189
| epoch 262 step   114200 |    404 batches | lr 6.41e-06 | ms/batch 330.54 | loss  3.78 | ppl    43.879
| epoch 263 step   114250 |     18 batches | lr 6.35e-06 | ms/batch 320.50 | loss  3.77 | ppl    43.499
| epoch 263 step   114300 |     68 batches | lr 6.29e-06 | ms/batch 325.03 | loss  3.73 | ppl    41.500
| epoch 263 step   114350 |    118 batches | lr 6.23e-06 | ms/batch 325.14 | loss  3.75 | ppl    42.546
| epoch 263 step   114400 |    168 batches | lr 6.17e-06 | ms/batch 325.14 | loss  3.78 | ppl    43.879
----------------------------------------------------------------------------------------------------
| Eval 286 at step   114400 | time: 136.01s | valid loss  4.19 | valid ppl    66.020
----------------------------------------------------------------------------------------------------
| epoch 263 step   114450 |    218 batches | lr 6.12e-06 | ms/batch 433.65 | loss  3.81 | ppl    45.133
| epoch 263 step   114500 |    268 batches | lr 6.06e-06 | ms/batch 328.87 | loss  3.83 | ppl    46.145
| epoch 263 step   114550 |    318 batches | lr 6e-06 | ms/batch 330.70 | loss  3.77 | ppl    43.580
| epoch 263 step   114600 |    368 batches | lr 5.94e-06 | ms/batch 328.92 | loss  3.74 | ppl    42.072
| epoch 263 step   114650 |    418 batches | lr 5.89e-06 | ms/batch 328.82 | loss  3.78 | ppl    43.823
| epoch 264 step   114700 |     32 batches | lr 5.83e-06 | ms/batch 320.47 | loss  3.80 | ppl    44.699
| epoch 264 step   114750 |     82 batches | lr 5.77e-06 | ms/batch 324.42 | loss  3.75 | ppl    42.415
| epoch 264 step   114800 |    132 batches | lr 5.72e-06 | ms/batch 323.34 | loss  3.78 | ppl    43.988
----------------------------------------------------------------------------------------------------
| Eval 287 at step   114800 | time: 135.94s | valid loss  4.19 | valid ppl    66.025
----------------------------------------------------------------------------------------------------
| epoch 264 step   114850 |    182 batches | lr 5.66e-06 | ms/batch 433.11 | loss  3.78 | ppl    43.676
| epoch 264 step   114900 |    232 batches | lr 5.61e-06 | ms/batch 328.22 | loss  3.80 | ppl    44.899
| epoch 264 step   114950 |    282 batches | lr 5.55e-06 | ms/batch 329.58 | loss  3.80 | ppl    44.869
| epoch 264 step   115000 |    332 batches | lr 5.5e-06 | ms/batch 329.54 | loss  3.72 | ppl    41.298
| epoch 264 step   115050 |    382 batches | lr 5.44e-06 | ms/batch 329.25 | loss  3.75 | ppl    42.576
| epoch 264 step   115100 |    432 batches | lr 5.39e-06 | ms/batch 329.05 | loss  3.81 | ppl    45.272
| epoch 265 step   115150 |     46 batches | lr 5.34e-06 | ms/batch 319.10 | loss  3.75 | ppl    42.613
| epoch 265 step   115200 |     96 batches | lr 5.28e-06 | ms/batch 326.01 | loss  3.72 | ppl    41.137
----------------------------------------------------------------------------------------------------
| Eval 288 at step   115200 | time: 136.20s | valid loss  4.19 | valid ppl    66.074
----------------------------------------------------------------------------------------------------
| epoch 265 step   115250 |    146 batches | lr 5.23e-06 | ms/batch 436.25 | loss  3.76 | ppl    43.147
| epoch 265 step   115300 |    196 batches | lr 5.17e-06 | ms/batch 329.99 | loss  3.78 | ppl    43.957
| epoch 265 step   115350 |    246 batches | lr 5.12e-06 | ms/batch 329.26 | loss  3.81 | ppl    45.216
| epoch 265 step   115400 |    296 batches | lr 5.07e-06 | ms/batch 328.96 | loss  3.87 | ppl    47.929
| epoch 265 step   115450 |    346 batches | lr 5.02e-06 | ms/batch 328.24 | loss  3.69 | ppl    39.901
| epoch 265 step   115500 |    396 batches | lr 4.96e-06 | ms/batch 329.03 | loss  3.78 | ppl    43.766
| epoch 266 step   115550 |     10 batches | lr 4.91e-06 | ms/batch 322.07 | loss  3.81 | ppl    45.332
| epoch 266 step   115600 |     60 batches | lr 4.86e-06 | ms/batch 325.27 | loss  3.75 | ppl    42.311
----------------------------------------------------------------------------------------------------
| Eval 289 at step   115600 | time: 136.44s | valid loss  4.19 | valid ppl    66.041
----------------------------------------------------------------------------------------------------
| epoch 266 step   115650 |    110 batches | lr 4.81e-06 | ms/batch 433.41 | loss  3.76 | ppl    42.900
| epoch 266 step   115700 |    160 batches | lr 4.76e-06 | ms/batch 329.26 | loss  3.78 | ppl    43.903
| epoch 266 step   115750 |    210 batches | lr 4.71e-06 | ms/batch 329.22 | loss  3.79 | ppl    44.237
| epoch 266 step   115800 |    260 batches | lr 4.66e-06 | ms/batch 329.56 | loss  3.82 | ppl    45.453
| epoch 266 step   115850 |    310 batches | lr 4.61e-06 | ms/batch 331.43 | loss  3.79 | ppl    44.187
| epoch 266 step   115900 |    360 batches | lr 4.56e-06 | ms/batch 329.90 | loss  3.74 | ppl    42.098
| epoch 266 step   115950 |    410 batches | lr 4.51e-06 | ms/batch 329.20 | loss  3.78 | ppl    43.632
| epoch 267 step   116000 |     24 batches | lr 4.46e-06 | ms/batch 323.32 | loss  3.81 | ppl    45.108
----------------------------------------------------------------------------------------------------
| Eval 290 at step   116000 | time: 136.78s | valid loss  4.19 | valid ppl    65.952
----------------------------------------------------------------------------------------------------
| epoch 267 step   116050 |     74 batches | lr 4.41e-06 | ms/batch 466.15 | loss  3.73 | ppl    41.774
| epoch 267 step   116100 |    124 batches | lr 4.36e-06 | ms/batch 331.20 | loss  3.77 | ppl    43.228
| epoch 267 step   116150 |    174 batches | lr 4.31e-06 | ms/batch 330.06 | loss  3.77 | ppl    43.550
| epoch 267 step   116200 |    224 batches | lr 4.26e-06 | ms/batch 328.79 | loss  3.78 | ppl    43.693
| epoch 267 step   116250 |    274 batches | lr 4.21e-06 | ms/batch 330.46 | loss  3.82 | ppl    45.690
| epoch 267 step   116300 |    324 batches | lr 4.17e-06 | ms/batch 330.77 | loss  3.76 | ppl    43.019
| epoch 267 step   116350 |    374 batches | lr 4.12e-06 | ms/batch 329.85 | loss  3.77 | ppl    43.328
| epoch 267 step   116400 |    424 batches | lr 4.07e-06 | ms/batch 329.18 | loss  3.79 | ppl    44.248
----------------------------------------------------------------------------------------------------
| Eval 291 at step   116400 | time: 137.18s | valid loss  4.19 | valid ppl    65.912
----------------------------------------------------------------------------------------------------
| epoch 268 step   116450 |     38 batches | lr 4.02e-06 | ms/batch 459.64 | loss  3.80 | ppl    44.536
| epoch 268 step   116500 |     88 batches | lr 3.98e-06 | ms/batch 329.34 | loss  3.72 | ppl    41.260
| epoch 268 step   116550 |    138 batches | lr 3.93e-06 | ms/batch 329.21 | loss  3.80 | ppl    44.508
| epoch 268 step   116600 |    188 batches | lr 3.89e-06 | ms/batch 328.84 | loss  3.78 | ppl    43.914
| epoch 268 step   116650 |    238 batches | lr 3.84e-06 | ms/batch 329.63 | loss  3.79 | ppl    44.255
| epoch 268 step   116700 |    288 batches | lr 3.79e-06 | ms/batch 329.16 | loss  3.86 | ppl    47.286
| epoch 268 step   116750 |    338 batches | lr 3.75e-06 | ms/batch 329.86 | loss  3.71 | ppl    40.656
| epoch 268 step   116800 |    388 batches | lr 3.7e-06 | ms/batch 330.79 | loss  3.79 | ppl    44.156
----------------------------------------------------------------------------------------------------
| Eval 292 at step   116800 | time: 136.67s | valid loss  4.19 | valid ppl    66.042
----------------------------------------------------------------------------------------------------
| epoch 269 step   116850 |      2 batches | lr 3.66e-06 | ms/batch 424.45 | loss  3.80 | ppl    44.876
| epoch 269 step   116900 |     52 batches | lr 3.61e-06 | ms/batch 331.68 | loss  3.75 | ppl    42.518
| epoch 269 step   116950 |    102 batches | lr 3.57e-06 | ms/batch 331.30 | loss  3.73 | ppl    41.850
| epoch 269 step   117000 |    152 batches | lr 3.53e-06 | ms/batch 329.75 | loss  3.78 | ppl    43.743
| epoch 269 step   117050 |    202 batches | lr 3.48e-06 | ms/batch 331.19 | loss  3.78 | ppl    43.945
| epoch 269 step   117100 |    252 batches | lr 3.44e-06 | ms/batch 330.65 | loss  3.81 | ppl    45.274
| epoch 269 step   117150 |    302 batches | lr 3.39e-06 | ms/batch 330.19 | loss  3.83 | ppl    45.965
| epoch 269 step   117200 |    352 batches | lr 3.35e-06 | ms/batch 332.20 | loss  3.70 | ppl    40.493
----------------------------------------------------------------------------------------------------
| Eval 293 at step   117200 | time: 137.29s | valid loss  4.19 | valid ppl    65.989
----------------------------------------------------------------------------------------------------
| epoch 269 step   117250 |    402 batches | lr 3.31e-06 | ms/batch 446.79 | loss  3.80 | ppl    44.659
| epoch 270 step   117300 |     16 batches | lr 3.27e-06 | ms/batch 334.58 | loss  3.82 | ppl    45.570
| epoch 270 step   117350 |     66 batches | lr 3.22e-06 | ms/batch 337.06 | loss  3.73 | ppl    41.715
| epoch 270 step   117400 |    116 batches | lr 3.18e-06 | ms/batch 327.85 | loss  3.78 | ppl    43.722
| epoch 270 step   117450 |    166 batches | lr 3.14e-06 | ms/batch 329.10 | loss  3.80 | ppl    44.881
| epoch 270 step   117500 |    216 batches | lr 3.1e-06 | ms/batch 329.13 | loss  3.79 | ppl    44.383
| epoch 270 step   117550 |    266 batches | lr 3.06e-06 | ms/batch 329.80 | loss  3.81 | ppl    45.073
| epoch 270 step   117600 |    316 batches | lr 3.02e-06 | ms/batch 330.71 | loss  3.79 | ppl    44.224
----------------------------------------------------------------------------------------------------
| Eval 294 at step   117600 | time: 138.04s | valid loss  4.19 | valid ppl    65.968
----------------------------------------------------------------------------------------------------
| epoch 270 step   117650 |    366 batches | lr 2.98e-06 | ms/batch 428.44 | loss  3.73 | ppl    41.831
| epoch 270 step   117700 |    416 batches | lr 2.94e-06 | ms/batch 325.28 | loss  3.77 | ppl    43.182
| epoch 271 step   117750 |     30 batches | lr 2.9e-06 | ms/batch 322.82 | loss  3.84 | ppl    46.413
| epoch 271 step   117800 |     80 batches | lr 2.86e-06 | ms/batch 329.20 | loss  3.73 | ppl    41.754
| epoch 271 step   117850 |    130 batches | lr 2.82e-06 | ms/batch 329.42 | loss  3.78 | ppl    43.826
| epoch 271 step   117900 |    180 batches | lr 2.78e-06 | ms/batch 329.43 | loss  3.78 | ppl    43.666
| epoch 271 step   117950 |    230 batches | lr 2.74e-06 | ms/batch 330.38 | loss  3.79 | ppl    44.385
| epoch 271 step   118000 |    280 batches | lr 2.7e-06 | ms/batch 330.26 | loss  3.83 | ppl    46.147
----------------------------------------------------------------------------------------------------
| Eval 295 at step   118000 | time: 136.25s | valid loss  4.19 | valid ppl    65.947
----------------------------------------------------------------------------------------------------
| epoch 271 step   118050 |    330 batches | lr 2.66e-06 | ms/batch 429.76 | loss  3.71 | ppl    41.002
| epoch 271 step   118100 |    380 batches | lr 2.62e-06 | ms/batch 327.46 | loss  3.77 | ppl    43.378
| epoch 271 step   118150 |    430 batches | lr 2.59e-06 | ms/batch 326.55 | loss  3.82 | ppl    45.441
| epoch 272 step   118200 |     44 batches | lr 2.55e-06 | ms/batch 323.94 | loss  3.75 | ppl    42.583
| epoch 272 step   118250 |     94 batches | lr 2.51e-06 | ms/batch 331.40 | loss  3.74 | ppl    42.222
| epoch 272 step   118300 |    144 batches | lr 2.48e-06 | ms/batch 329.93 | loss  3.77 | ppl    43.460
| epoch 272 step   118350 |    194 batches | lr 2.44e-06 | ms/batch 331.51 | loss  3.77 | ppl    43.426
| epoch 272 step   118400 |    244 batches | lr 2.4e-06 | ms/batch 331.65 | loss  3.79 | ppl    44.430
----------------------------------------------------------------------------------------------------
| Eval 296 at step   118400 | time: 136.62s | valid loss  4.19 | valid ppl    65.968
----------------------------------------------------------------------------------------------------
| epoch 272 step   118450 |    294 batches | lr 2.37e-06 | ms/batch 430.74 | loss  3.85 | ppl    46.964
| epoch 272 step   118500 |    344 batches | lr 2.33e-06 | ms/batch 325.97 | loss  3.70 | ppl    40.575
| epoch 272 step   118550 |    394 batches | lr 2.29e-06 | ms/batch 324.25 | loss  3.78 | ppl    43.962
| epoch 273 step   118600 |      8 batches | lr 2.26e-06 | ms/batch 319.76 | loss  3.82 | ppl    45.818
| epoch 273 step   118650 |     58 batches | lr 2.22e-06 | ms/batch 332.08 | loss  3.76 | ppl    42.948
| epoch 273 step   118700 |    108 batches | lr 2.19e-06 | ms/batch 330.45 | loss  3.75 | ppl    42.574
| epoch 273 step   118750 |    158 batches | lr 2.15e-06 | ms/batch 332.34 | loss  3.80 | ppl    44.541
| epoch 273 step   118800 |    208 batches | lr 2.12e-06 | ms/batch 329.34 | loss  3.77 | ppl    43.598
----------------------------------------------------------------------------------------------------
| Eval 297 at step   118800 | time: 136.22s | valid loss  4.19 | valid ppl    65.948
----------------------------------------------------------------------------------------------------
| epoch 273 step   118850 |    258 batches | lr 2.09e-06 | ms/batch 427.02 | loss  3.82 | ppl    45.513
| epoch 273 step   118900 |    308 batches | lr 2.05e-06 | ms/batch 324.06 | loss  3.81 | ppl    45.089
| epoch 273 step   118950 |    358 batches | lr 2.02e-06 | ms/batch 325.15 | loss  3.73 | ppl    41.744
| epoch 273 step   119000 |    408 batches | lr 1.99e-06 | ms/batch 325.76 | loss  3.79 | ppl    44.466
| epoch 274 step   119050 |     22 batches | lr 1.95e-06 | ms/batch 320.81 | loss  3.81 | ppl    45.061
| epoch 274 step   119100 |     72 batches | lr 1.92e-06 | ms/batch 328.47 | loss  3.71 | ppl    41.009
| epoch 274 step   119150 |    122 batches | lr 1.89e-06 | ms/batch 330.13 | loss  3.78 | ppl    43.654
| epoch 274 step   119200 |    172 batches | lr 1.86e-06 | ms/batch 329.64 | loss  3.76 | ppl    42.945
----------------------------------------------------------------------------------------------------
| Eval 298 at step   119200 | time: 135.55s | valid loss  4.19 | valid ppl    65.984
----------------------------------------------------------------------------------------------------
| epoch 274 step   119250 |    222 batches | lr 1.82e-06 | ms/batch 426.93 | loss  3.80 | ppl    44.687
| epoch 274 step   119300 |    272 batches | lr 1.79e-06 | ms/batch 324.58 | loss  3.80 | ppl    44.652
| epoch 274 step   119350 |    322 batches | lr 1.76e-06 | ms/batch 324.42 | loss  3.75 | ppl    42.579
| epoch 274 step   119400 |    372 batches | lr 1.73e-06 | ms/batch 324.99 | loss  3.78 | ppl    43.823
| epoch 274 step   119450 |    422 batches | lr 1.7e-06 | ms/batch 325.28 | loss  3.78 | ppl    43.719
| epoch 275 step   119500 |     36 batches | lr 1.67e-06 | ms/batch 319.76 | loss  3.77 | ppl    43.516
| epoch 275 step   119550 |     86 batches | lr 1.64e-06 | ms/batch 329.28 | loss  3.72 | ppl    41.094
| epoch 275 step   119600 |    136 batches | lr 1.61e-06 | ms/batch 329.57 | loss  3.76 | ppl    43.006
----------------------------------------------------------------------------------------------------
| Eval 299 at step   119600 | time: 135.26s | valid loss  4.19 | valid ppl    65.988
----------------------------------------------------------------------------------------------------
| epoch 275 step   119650 |    186 batches | lr 1.58e-06 | ms/batch 428.97 | loss  3.76 | ppl    42.962
| epoch 275 step   119700 |    236 batches | lr 1.55e-06 | ms/batch 326.85 | loss  3.77 | ppl    43.555
| epoch 275 step   119750 |    286 batches | lr 1.52e-06 | ms/batch 325.78 | loss  3.83 | ppl    45.940
| epoch 275 step   119800 |    336 batches | lr 1.49e-06 | ms/batch 328.77 | loss  3.70 | ppl    40.291
| epoch 275 step   119850 |    386 batches | lr 1.46e-06 | ms/batch 326.87 | loss  3.77 | ppl    43.587
| epoch 275 step   119900 |    436 batches | lr 1.44e-06 | ms/batch 321.41 | loss  3.80 | ppl    44.768
| epoch 276 step   119950 |     50 batches | lr 1.41e-06 | ms/batch 328.12 | loss  3.75 | ppl    42.604
| epoch 276 step   120000 |    100 batches | lr 1.38e-06 | ms/batch 330.30 | loss  3.72 | ppl    41.318
----------------------------------------------------------------------------------------------------
| Eval 300 at step   120000 | time: 135.85s | valid loss  4.19 | valid ppl    66.029
----------------------------------------------------------------------------------------------------
| epoch 276 step   120050 |    150 batches | lr 1.35e-06 | ms/batch 444.37 | loss  3.76 | ppl    43.014
| epoch 276 step   120100 |    200 batches | lr 1.33e-06 | ms/batch 341.35 | loss  3.78 | ppl    43.763
| epoch 276 step   120150 |    250 batches | lr 1.3e-06 | ms/batch 339.84 | loss  3.80 | ppl    44.762
| epoch 276 step   120200 |    300 batches | lr 1.27e-06 | ms/batch 341.38 | loss  3.81 | ppl    45.271
| epoch 276 step   120250 |    350 batches | lr 1.25e-06 | ms/batch 341.58 | loss  3.70 | ppl    40.443
| epoch 276 step   120300 |    400 batches | lr 1.22e-06 | ms/batch 333.60 | loss  3.78 | ppl    43.710
| epoch 277 step   120350 |     14 batches | lr 1.19e-06 | ms/batch 320.57 | loss  3.80 | ppl    44.848
| epoch 277 step   120400 |     64 batches | lr 1.17e-06 | ms/batch 328.87 | loss  3.72 | ppl    41.426
----------------------------------------------------------------------------------------------------
| Eval 301 at step   120400 | time: 139.58s | valid loss  4.19 | valid ppl    66.013
----------------------------------------------------------------------------------------------------
| epoch 277 step   120450 |    114 batches | lr 1.14e-06 | ms/batch 427.55 | loss  3.76 | ppl    42.974
| epoch 277 step   120500 |    164 batches | lr 1.12e-06 | ms/batch 325.32 | loss  3.77 | ppl    43.412
| epoch 277 step   120550 |    214 batches | lr 1.09e-06 | ms/batch 323.98 | loss  3.78 | ppl    44.001
| epoch 277 step   120600 |    264 batches | lr 1.07e-06 | ms/batch 324.68 | loss  3.80 | ppl    44.817
| epoch 277 step   120650 |    314 batches | lr 1.04e-06 | ms/batch 324.91 | loss  3.77 | ppl    43.404
| epoch 277 step   120700 |    364 batches | lr 1.02e-06 | ms/batch 325.55 | loss  3.72 | ppl    41.266
| epoch 277 step   120750 |    414 batches | lr 9.97e-07 | ms/batch 325.95 | loss  3.76 | ppl    42.972
| epoch 278 step   120800 |     28 batches | lr 9.74e-07 | ms/batch 322.88 | loss  3.82 | ppl    45.467
----------------------------------------------------------------------------------------------------
| Eval 302 at step   120800 | time: 135.05s | valid loss  4.19 | valid ppl    65.955
----------------------------------------------------------------------------------------------------
| epoch 278 step   120850 |     78 batches | lr 9.51e-07 | ms/batch 428.66 | loss  3.73 | ppl    41.573
| epoch 278 step   120900 |    128 batches | lr 9.28e-07 | ms/batch 324.77 | loss  3.79 | ppl    44.277
| epoch 278 step   120950 |    178 batches | lr 9.06e-07 | ms/batch 324.17 | loss  3.77 | ppl    43.570
| epoch 278 step   121000 |    228 batches | lr 8.84e-07 | ms/batch 324.77 | loss  3.78 | ppl    43.962
| epoch 278 step   121050 |    278 batches | lr 8.62e-07 | ms/batch 323.05 | loss  3.80 | ppl    44.764
| epoch 278 step   121100 |    328 batches | lr 8.4e-07 | ms/batch 323.58 | loss  3.76 | ppl    42.831
| epoch 278 step   121150 |    378 batches | lr 8.19e-07 | ms/batch 323.74 | loss  3.78 | ppl    43.977
| epoch 278 step   121200 |    428 batches | lr 7.97e-07 | ms/batch 324.00 | loss  3.82 | ppl    45.537
----------------------------------------------------------------------------------------------------
| Eval 303 at step   121200 | time: 134.80s | valid loss  4.19 | valid ppl    65.995
----------------------------------------------------------------------------------------------------
| epoch 279 step   121250 |     42 batches | lr 7.77e-07 | ms/batch 422.17 | loss  3.74 | ppl    42.240
| epoch 279 step   121300 |     92 batches | lr 7.56e-07 | ms/batch 323.42 | loss  3.72 | ppl    41.116
| epoch 279 step   121350 |    142 batches | lr 7.36e-07 | ms/batch 324.29 | loss  3.78 | ppl    43.888
| epoch 279 step   121400 |    192 batches | lr 7.16e-07 | ms/batch 324.11 | loss  3.80 | ppl    44.731
| epoch 279 step   121450 |    242 batches | lr 6.96e-07 | ms/batch 324.29 | loss  3.82 | ppl    45.761
| epoch 279 step   121500 |    292 batches | lr 6.77e-07 | ms/batch 322.93 | loss  3.82 | ppl    45.400
| epoch 279 step   121550 |    342 batches | lr 6.57e-07 | ms/batch 323.93 | loss  3.70 | ppl    40.425
| epoch 279 step   121600 |    392 batches | lr 6.39e-07 | ms/batch 323.99 | loss  3.78 | ppl    43.637
----------------------------------------------------------------------------------------------------
| Eval 304 at step   121600 | time: 134.45s | valid loss  4.19 | valid ppl    65.996
----------------------------------------------------------------------------------------------------
| epoch 280 step   121650 |      6 batches | lr 6.2e-07 | ms/batch 423.43 | loss  3.81 | ppl    44.925
| epoch 280 step   121700 |     56 batches | lr 6.02e-07 | ms/batch 322.70 | loss  3.72 | ppl    41.308
| epoch 280 step   121750 |    106 batches | lr 5.83e-07 | ms/batch 323.09 | loss  3.73 | ppl    41.648
| epoch 280 step   121800 |    156 batches | lr 5.66e-07 | ms/batch 322.88 | loss  3.78 | ppl    43.763
| epoch 280 step   121850 |    206 batches | lr 5.48e-07 | ms/batch 323.91 | loss  3.79 | ppl    44.348
| epoch 280 step   121900 |    256 batches | lr 5.31e-07 | ms/batch 322.79 | loss  3.82 | ppl    45.569
| epoch 280 step   121950 |    306 batches | lr 5.14e-07 | ms/batch 323.60 | loss  3.82 | ppl    45.822
| epoch 280 step   122000 |    356 batches | lr 4.97e-07 | ms/batch 322.49 | loss  3.69 | ppl    39.911
----------------------------------------------------------------------------------------------------
| Eval 305 at step   122000 | time: 134.24s | valid loss  4.19 | valid ppl    65.968
----------------------------------------------------------------------------------------------------
| epoch 280 step   122050 |    406 batches | lr 4.81e-07 | ms/batch 427.45 | loss  3.77 | ppl    43.299
| epoch 281 step   122100 |     20 batches | lr 4.65e-07 | ms/batch 317.67 | loss  3.81 | ppl    45.089
| epoch 281 step   122150 |     70 batches | lr 4.49e-07 | ms/batch 322.19 | loss  3.74 | ppl    41.952
| epoch 281 step   122200 |    120 batches | lr 4.33e-07 | ms/batch 322.18 | loss  3.75 | ppl    42.428
| epoch 281 step   122250 |    170 batches | lr 4.18e-07 | ms/batch 322.09 | loss  3.78 | ppl    43.792
| epoch 281 step   122300 |    220 batches | lr 4.03e-07 | ms/batch 322.55 | loss  3.78 | ppl    43.826
| epoch 281 step   122350 |    270 batches | lr 3.88e-07 | ms/batch 322.30 | loss  3.81 | ppl    45.013
| epoch 281 step   122400 |    320 batches | lr 3.73e-07 | ms/batch 322.94 | loss  3.79 | ppl    44.055
----------------------------------------------------------------------------------------------------
| Eval 306 at step   122400 | time: 133.97s | valid loss  4.19 | valid ppl    65.968
----------------------------------------------------------------------------------------------------
| epoch 281 step   122450 |    370 batches | lr 3.59e-07 | ms/batch 427.64 | loss  3.76 | ppl    42.940
| epoch 281 step   122500 |    420 batches | lr 3.45e-07 | ms/batch 324.51 | loss  3.78 | ppl    43.977
| epoch 282 step   122550 |     34 batches | lr 3.32e-07 | ms/batch 315.86 | loss  3.81 | ppl    45.195
| epoch 282 step   122600 |     84 batches | lr 3.18e-07 | ms/batch 321.97 | loss  3.72 | ppl    41.432
| epoch 282 step   122650 |    134 batches | lr 3.05e-07 | ms/batch 321.69 | loss  3.78 | ppl    43.869
| epoch 282 step   122700 |    184 batches | lr 2.92e-07 | ms/batch 322.06 | loss  3.76 | ppl    43.142
| epoch 282 step   122750 |    234 batches | lr 2.8e-07 | ms/batch 322.17 | loss  3.80 | ppl    44.838
| epoch 282 step   122800 |    284 batches | lr 2.67e-07 | ms/batch 321.99 | loss  3.83 | ppl    45.991
----------------------------------------------------------------------------------------------------
| Eval 307 at step   122800 | time: 133.89s | valid loss  4.19 | valid ppl    65.979
----------------------------------------------------------------------------------------------------
| epoch 282 step   122850 |    334 batches | lr 2.55e-07 | ms/batch 426.89 | loss  3.71 | ppl    40.930
| epoch 282 step   122900 |    384 batches | lr 2.44e-07 | ms/batch 325.58 | loss  3.77 | ppl    43.490
| epoch 282 step   122950 |    434 batches | lr 2.32e-07 | ms/batch 326.17 | loss  3.82 | ppl    45.538
| epoch 283 step   123000 |     48 batches | lr 2.21e-07 | ms/batch 317.22 | loss  3.73 | ppl    41.785
| epoch 283 step   123050 |     98 batches | lr 2.1e-07 | ms/batch 322.99 | loss  3.72 | ppl    41.445
| epoch 283 step   123100 |    148 batches | lr 1.99e-07 | ms/batch 324.69 | loss  3.78 | ppl    43.850
| epoch 283 step   123150 |    198 batches | lr 1.89e-07 | ms/batch 324.15 | loss  3.81 | ppl    45.375
| epoch 283 step   123200 |    248 batches | lr 1.79e-07 | ms/batch 321.60 | loss  3.82 | ppl    45.448
----------------------------------------------------------------------------------------------------
| Eval 308 at step   123200 | time: 134.46s | valid loss  4.19 | valid ppl    65.976
----------------------------------------------------------------------------------------------------
| epoch 283 step   123250 |    298 batches | lr 1.69e-07 | ms/batch 428.18 | loss  3.84 | ppl    46.706
| epoch 283 step   123300 |    348 batches | lr 1.6e-07 | ms/batch 325.76 | loss  3.69 | ppl    40.187
| epoch 283 step   123350 |    398 batches | lr 1.5e-07 | ms/batch 324.78 | loss  3.80 | ppl    44.609
| epoch 284 step   123400 |     12 batches | lr 1.41e-07 | ms/batch 318.54 | loss  3.83 | ppl    45.860
| epoch 284 step   123450 |     62 batches | lr 1.33e-07 | ms/batch 321.87 | loss  3.72 | ppl    41.316
| epoch 284 step   123500 |    112 batches | lr 1.24e-07 | ms/batch 321.72 | loss  3.76 | ppl    42.778
| epoch 284 step   123550 |    162 batches | lr 1.16e-07 | ms/batch 323.28 | loss  3.78 | ppl    44.010
| epoch 284 step   123600 |    212 batches | lr 1.08e-07 | ms/batch 322.96 | loss  3.78 | ppl    43.618
----------------------------------------------------------------------------------------------------
| Eval 309 at step   123600 | time: 134.36s | valid loss  4.19 | valid ppl    65.977
----------------------------------------------------------------------------------------------------
| epoch 284 step   123650 |    262 batches | lr 1.01e-07 | ms/batch 430.24 | loss  3.80 | ppl    44.803
| epoch 284 step   123700 |    312 batches | lr 9.34e-08 | ms/batch 326.97 | loss  3.81 | ppl    45.226
| epoch 284 step   123750 |    362 batches | lr 8.64e-08 | ms/batch 325.74 | loss  3.74 | ppl    42.001
| epoch 284 step   123800 |    412 batches | lr 7.96e-08 | ms/batch 326.16 | loss  3.76 | ppl    42.801
| epoch 285 step   123850 |     26 batches | lr 7.31e-08 | ms/batch 318.73 | loss  3.79 | ppl    44.383
| epoch 285 step   123900 |     76 batches | lr 6.69e-08 | ms/batch 322.56 | loss  3.71 | ppl    40.862
| epoch 285 step   123950 |    126 batches | lr 6.09e-08 | ms/batch 332.70 | loss  3.78 | ppl    44.001
| epoch 285 step   124000 |    176 batches | lr 5.53e-08 | ms/batch 339.56 | loss  3.76 | ppl    43.019
----------------------------------------------------------------------------------------------------
| Eval 310 at step   124000 | time: 136.19s | valid loss  4.19 | valid ppl    65.970
----------------------------------------------------------------------------------------------------
| epoch 285 step   124050 |    226 batches | lr 4.99e-08 | ms/batch 430.93 | loss  3.80 | ppl    44.808
| epoch 285 step   124100 |    276 batches | lr 4.48e-08 | ms/batch 326.20 | loss  3.81 | ppl    45.255
| epoch 285 step   124150 |    326 batches | lr 3.99e-08 | ms/batch 325.44 | loss  3.75 | ppl    42.593
| epoch 285 step   124200 |    376 batches | lr 3.54e-08 | ms/batch 327.03 | loss  3.75 | ppl    42.405
| epoch 285 step   124250 |    426 batches | lr 3.11e-08 | ms/batch 326.37 | loss  3.78 | ppl    43.934
| epoch 286 step   124300 |     40 batches | lr 2.71e-08 | ms/batch 316.62 | loss  3.76 | ppl    43.108
| epoch 286 step   124350 |     90 batches | lr 2.34e-08 | ms/batch 322.39 | loss  3.72 | ppl    41.128
| epoch 286 step   124400 |    140 batches | lr 1.99e-08 | ms/batch 322.51 | loss  3.78 | ppl    43.768
----------------------------------------------------------------------------------------------------
| Eval 311 at step   124400 | time: 134.82s | valid loss  4.19 | valid ppl    65.975
----------------------------------------------------------------------------------------------------
| epoch 286 step   124450 |    190 batches | lr 1.67e-08 | ms/batch 428.30 | loss  3.77 | ppl    43.312
| epoch 286 step   124500 |    240 batches | lr 1.38e-08 | ms/batch 326.95 | loss  3.81 | ppl    45.069
| epoch 286 step   124550 |    290 batches | lr 1.12e-08 | ms/batch 328.26 | loss  3.84 | ppl    46.339
| epoch 286 step   124600 |    340 batches | lr 8.84e-09 | ms/batch 326.06 | loss  3.70 | ppl    40.403
| epoch 286 step   124650 |    390 batches | lr 6.77e-09 | ms/batch 327.08 | loss  3.77 | ppl    43.213
| epoch 287 step   124700 |      4 batches | lr 4.97e-09 | ms/batch 321.16 | loss  3.83 | ppl    46.064
| epoch 287 step   124750 |     54 batches | lr 3.45e-09 | ms/batch 322.32 | loss  3.75 | ppl    42.723
| epoch 287 step   124800 |    104 batches | lr 2.21e-09 | ms/batch 324.87 | loss  3.73 | ppl    41.796
----------------------------------------------------------------------------------------------------
| Eval 312 at step   124800 | time: 135.27s | valid loss  4.19 | valid ppl    65.975
----------------------------------------------------------------------------------------------------
| epoch 287 step   124850 |    154 batches | lr 1.24e-09 | ms/batch 431.23 | loss  3.80 | ppl    44.757
| epoch 287 step   124900 |    204 batches | lr 5.53e-10 | ms/batch 327.49 | loss  3.79 | ppl    44.473
| epoch 287 step   124950 |    254 batches | lr 1.38e-10 | ms/batch 329.24 | loss  3.81 | ppl    45.071
| epoch 287 step   125000 |    304 batches | lr 0 | ms/batch 329.60 | loss  3.82 | ppl    45.734
----------------------------------------------------------------------------------------------------
End of training
====================================================================================================
| End of training | test loss  4.15 | test ppl    63.427
====================================================================================================
