====================================================================================================
    - data : ../data/wikitext-2/
    - dataset : wt103
    - n_layer : 16
    - n_head : 10
    - d_head : 40
    - d_embed : 400
    - d_model : 400
    - d_inner : 900
    - dropout : 0.2
    - dropoute : 0.3
    - dropouto : 0.5
    - dropouti : 0.7
    - dropatt : 0.2
    - init : normal
    - emb_init : normal
    - init_range : 0.1
    - emb_init_range : 0.01
    - init_std : 0.02
    - proj_init_std : 0.01
    - optim : adam
    - lr : 0.00035
    - mom : 0.0
    - scheduler : cosine
    - warmup_step : 3000
    - decay_rate : 0.5
    - lr_min : 0.0
    - clip : 0.25
    - clip_nonemb : False
    - max_step : 125000
    - batch_size : 32
    - batch_chunk : 1
    - tgt_len : 150
    - eval_tgt_len : 150
    - ext_len : 0
    - mem_len : 150
    - not_tied : False
    - seed : 11
    - cuda : True
    - adaptive : True
    - div_val : 1
    - pre_lnorm : False
    - varlen : False
    - multi_gpu : False
    - log_interval : 50
    - eval_interval : 400
    - work_dir : LM-TFM-wt103-11
    - restart : False
    - restart_dir : 
    - debug : False
    - same_length : False
    - attn_type : 0
    - clamp_len : -1
    - eta_min : 0.0
    - gpu0_bsz : 1
    - max_eval_steps : -1
    - sample_softmax : -1
    - patience : 0
    - finetune_v2 : False
    - finetune_v3 : False
    - fp16 : True
    - static_loss_scale : 1
    - dynamic_loss_scale : True
    - wdecay : 1.2e-06
    - tied : True
    - n_token : 33278
    - n_all_param : 37712881
    - n_nonemb_param : 24366400
====================================================================================================
#params = 37712881
#non emb params = 24366400
| epoch   1 step       50 |     50 batches | lr 5.83e-06 | ms/batch 324.80 | loss 10.20 | ppl 26966.314
| epoch   1 step      100 |    100 batches | lr 1.17e-05 | ms/batch 316.94 | loss  9.68 | ppl 15947.084
| epoch   1 step      150 |    150 batches | lr 1.75e-05 | ms/batch 324.71 | loss  9.25 | ppl 10359.145
| epoch   1 step      200 |    200 batches | lr 2.33e-05 | ms/batch 317.41 | loss  8.93 | ppl  7588.392
| epoch   1 step      250 |    250 batches | lr 2.92e-05 | ms/batch 318.88 | loss  8.55 | ppl  5165.947
| epoch   1 step      300 |    300 batches | lr 3.5e-05 | ms/batch 318.09 | loss  8.12 | ppl  3372.331
| epoch   1 step      350 |    350 batches | lr 4.08e-05 | ms/batch 318.23 | loss  7.71 | ppl  2227.756
| epoch   1 step      400 |    400 batches | lr 4.67e-05 | ms/batch 318.13 | loss  7.40 | ppl  1640.592
----------------------------------------------------------------------------------------------------
| Eval   1 at step      400 | time: 133.00s | valid loss  6.91 | valid ppl  1004.192
----------------------------------------------------------------------------------------------------
| epoch   2 step      450 |     14 batches | lr 5.25e-05 | ms/batch 461.69 | loss  7.25 | ppl  1406.675
| epoch   2 step      500 |     64 batches | lr 5.83e-05 | ms/batch 329.98 | loss  7.13 | ppl  1247.317
| epoch   2 step      550 |    114 batches | lr 6.42e-05 | ms/batch 317.34 | loss  7.08 | ppl  1190.012
| epoch   2 step      600 |    164 batches | lr 7e-05 | ms/batch 318.27 | loss  7.06 | ppl  1162.355
| epoch   2 step      650 |    214 batches | lr 7.58e-05 | ms/batch 317.82 | loss  7.04 | ppl  1146.482
| epoch   2 step      700 |    264 batches | lr 8.17e-05 | ms/batch 318.04 | loss  6.97 | ppl  1059.908
| epoch   2 step      750 |    314 batches | lr 8.75e-05 | ms/batch 316.96 | loss  6.97 | ppl  1062.893
| epoch   2 step      800 |    364 batches | lr 9.33e-05 | ms/batch 317.91 | loss  6.91 | ppl  1001.543
----------------------------------------------------------------------------------------------------
| Eval   2 at step      800 | time: 134.16s | valid loss  6.41 | valid ppl   608.181
----------------------------------------------------------------------------------------------------
| epoch   2 step      850 |    414 batches | lr 9.92e-05 | ms/batch 448.75 | loss  6.90 | ppl   987.711
| epoch   3 step      900 |     28 batches | lr 0.000105 | ms/batch 311.73 | loss  6.84 | ppl   938.586
| epoch   3 step      950 |     78 batches | lr 0.000111 | ms/batch 314.85 | loss  6.82 | ppl   915.699
| epoch   3 step     1000 |    128 batches | lr 0.000117 | ms/batch 314.62 | loss  6.76 | ppl   860.152
| epoch   3 step     1050 |    178 batches | lr 0.000122 | ms/batch 316.02 | loss  6.75 | ppl   857.669
| epoch   3 step     1100 |    228 batches | lr 0.000128 | ms/batch 317.23 | loss  6.76 | ppl   860.018
| epoch   3 step     1150 |    278 batches | lr 0.000134 | ms/batch 315.82 | loss  6.69 | ppl   804.762
| epoch   3 step     1200 |    328 batches | lr 0.00014 | ms/batch 317.39 | loss  6.67 | ppl   788.457
----------------------------------------------------------------------------------------------------
| Eval   3 at step     1200 | time: 131.16s | valid loss  6.12 | valid ppl   456.037
----------------------------------------------------------------------------------------------------
| epoch   3 step     1250 |    378 batches | lr 0.000146 | ms/batch 447.46 | loss  6.63 | ppl   758.963
| epoch   3 step     1300 |    428 batches | lr 0.000152 | ms/batch 316.35 | loss  6.61 | ppl   743.412
| epoch   4 step     1350 |     42 batches | lr 0.000157 | ms/batch 311.44 | loss  6.58 | ppl   717.730
| epoch   4 step     1400 |     92 batches | lr 0.000163 | ms/batch 316.80 | loss  6.56 | ppl   706.989
| epoch   4 step     1450 |    142 batches | lr 0.000169 | ms/batch 322.14 | loss  6.52 | ppl   681.713
| epoch   4 step     1500 |    192 batches | lr 0.000175 | ms/batch 315.59 | loss  6.50 | ppl   662.859
| epoch   4 step     1550 |    242 batches | lr 0.000181 | ms/batch 316.65 | loss  6.50 | ppl   665.038
| epoch   4 step     1600 |    292 batches | lr 0.000187 | ms/batch 314.31 | loss  6.47 | ppl   643.973
----------------------------------------------------------------------------------------------------
| Eval   4 at step     1600 | time: 131.43s | valid loss  5.88 | valid ppl   356.069
----------------------------------------------------------------------------------------------------
| epoch   4 step     1650 |    342 batches | lr 0.000193 | ms/batch 448.51 | loss  6.42 | ppl   614.339
| epoch   4 step     1700 |    392 batches | lr 0.000198 | ms/batch 315.18 | loss  6.41 | ppl   610.607
| epoch   5 step     1750 |      6 batches | lr 0.000204 | ms/batch 309.52 | loss  6.39 | ppl   596.462
| epoch   5 step     1800 |     56 batches | lr 0.00021 | ms/batch 319.34 | loss  6.34 | ppl   568.837
| epoch   5 step     1850 |    106 batches | lr 0.000216 | ms/batch 319.23 | loss  6.34 | ppl   565.469
| epoch   5 step     1900 |    156 batches | lr 0.000222 | ms/batch 316.65 | loss  6.30 | ppl   542.534
| epoch   5 step     1950 |    206 batches | lr 0.000228 | ms/batch 317.27 | loss  6.33 | ppl   563.529
| epoch   5 step     2000 |    256 batches | lr 0.000233 | ms/batch 317.16 | loss  6.27 | ppl   526.540
----------------------------------------------------------------------------------------------------
| Eval   5 at step     2000 | time: 131.54s | valid loss  5.69 | valid ppl   296.967
----------------------------------------------------------------------------------------------------
| epoch   5 step     2050 |    306 batches | lr 0.000239 | ms/batch 468.93 | loss  6.26 | ppl   525.513
| epoch   5 step     2100 |    356 batches | lr 0.000245 | ms/batch 324.01 | loss  6.19 | ppl   487.579
| epoch   5 step     2150 |    406 batches | lr 0.000251 | ms/batch 317.59 | loss  6.21 | ppl   496.847
| epoch   6 step     2200 |     20 batches | lr 0.000257 | ms/batch 311.35 | loss  6.20 | ppl   492.634
| epoch   6 step     2250 |     70 batches | lr 0.000262 | ms/batch 315.17 | loss  6.17 | ppl   478.448
| epoch   6 step     2300 |    120 batches | lr 0.000268 | ms/batch 314.62 | loss  6.17 | ppl   475.801
| epoch   6 step     2350 |    170 batches | lr 0.000274 | ms/batch 314.63 | loss  6.11 | ppl   448.233
| epoch   6 step     2400 |    220 batches | lr 0.00028 | ms/batch 314.41 | loss  6.14 | ppl   462.100
----------------------------------------------------------------------------------------------------
| Eval   6 at step     2400 | time: 131.79s | valid loss  5.55 | valid ppl   256.459
----------------------------------------------------------------------------------------------------
| epoch   6 step     2450 |    270 batches | lr 0.000286 | ms/batch 450.81 | loss  6.08 | ppl   436.517
| epoch   6 step     2500 |    320 batches | lr 0.000292 | ms/batch 318.14 | loss  6.07 | ppl   432.748
| epoch   6 step     2550 |    370 batches | lr 0.000297 | ms/batch 317.39 | loss  6.03 | ppl   415.488
| epoch   6 step     2600 |    420 batches | lr 0.000303 | ms/batch 316.75 | loss  6.05 | ppl   424.246
| epoch   7 step     2650 |     34 batches | lr 0.000309 | ms/batch 310.74 | loss  6.01 | ppl   407.451
| epoch   7 step     2700 |     84 batches | lr 0.000315 | ms/batch 314.59 | loss  5.99 | ppl   397.764
| epoch   7 step     2750 |    134 batches | lr 0.000321 | ms/batch 316.14 | loss  5.97 | ppl   389.857
| epoch   7 step     2800 |    184 batches | lr 0.000327 | ms/batch 313.71 | loss  5.98 | ppl   394.545
----------------------------------------------------------------------------------------------------
| Eval   7 at step     2800 | time: 131.20s | valid loss  5.40 | valid ppl   221.254
----------------------------------------------------------------------------------------------------
| epoch   7 step     2850 |    234 batches | lr 0.000333 | ms/batch 458.28 | loss  5.98 | ppl   397.081
| epoch   7 step     2900 |    284 batches | lr 0.000338 | ms/batch 316.25 | loss  5.96 | ppl   386.461
| epoch   7 step     2950 |    334 batches | lr 0.000344 | ms/batch 316.20 | loss  5.93 | ppl   375.421
| epoch   7 step     3000 |    384 batches | lr 0.00035 | ms/batch 317.65 | loss  5.95 | ppl   383.124
| epoch   7 step     3050 |    434 batches | lr 0.000349 | ms/batch 317.10 | loss  5.94 | ppl   381.452
| epoch   8 step     3100 |     48 batches | lr 0.000349 | ms/batch 309.15 | loss  5.90 | ppl   363.529
| epoch   8 step     3150 |     98 batches | lr 0.000349 | ms/batch 314.63 | loss  5.86 | ppl   351.849
| epoch   8 step     3200 |    148 batches | lr 0.000349 | ms/batch 314.85 | loss  5.84 | ppl   342.359
----------------------------------------------------------------------------------------------------
| Eval   8 at step     3200 | time: 131.35s | valid loss  5.33 | valid ppl   206.477
----------------------------------------------------------------------------------------------------
| epoch   8 step     3250 |    198 batches | lr 0.000349 | ms/batch 447.43 | loss  5.87 | ppl   355.524
| epoch   8 step     3300 |    248 batches | lr 0.000349 | ms/batch 314.49 | loss  5.84 | ppl   344.559
| epoch   8 step     3350 |    298 batches | lr 0.000349 | ms/batch 313.14 | loss  5.84 | ppl   342.921
| epoch   8 step     3400 |    348 batches | lr 0.000349 | ms/batch 316.31 | loss  5.77 | ppl   319.987
| epoch   8 step     3450 |    398 batches | lr 0.000349 | ms/batch 315.29 | loss  5.83 | ppl   339.376
| epoch   9 step     3500 |     12 batches | lr 0.000349 | ms/batch 309.70 | loss  5.82 | ppl   337.130
| epoch   9 step     3550 |     62 batches | lr 0.000349 | ms/batch 314.70 | loss  5.75 | ppl   312.721
| epoch   9 step     3600 |    112 batches | lr 0.000349 | ms/batch 314.42 | loss  5.74 | ppl   311.551
----------------------------------------------------------------------------------------------------
| Eval   9 at step     3600 | time: 130.60s | valid loss  5.25 | valid ppl   189.744
----------------------------------------------------------------------------------------------------
| epoch   9 step     3650 |    162 batches | lr 0.000349 | ms/batch 461.26 | loss  5.74 | ppl   311.429
| epoch   9 step     3700 |    212 batches | lr 0.000349 | ms/batch 315.59 | loss  5.77 | ppl   320.538
| epoch   9 step     3750 |    262 batches | lr 0.000349 | ms/batch 315.77 | loss  5.72 | ppl   306.338
| epoch   9 step     3800 |    312 batches | lr 0.000349 | ms/batch 316.47 | loss  5.72 | ppl   304.833
| epoch   9 step     3850 |    362 batches | lr 0.000349 | ms/batch 316.51 | loss  5.68 | ppl   293.224
| epoch   9 step     3900 |    412 batches | lr 0.000349 | ms/batch 331.39 | loss  5.71 | ppl   300.365
| epoch  10 step     3950 |     26 batches | lr 0.000349 | ms/batch 314.06 | loss  5.74 | ppl   310.239
| epoch  10 step     4000 |     76 batches | lr 0.000349 | ms/batch 316.19 | loss  5.64 | ppl   281.463
----------------------------------------------------------------------------------------------------
| Eval  10 at step     4000 | time: 132.47s | valid loss  5.16 | valid ppl   173.504
----------------------------------------------------------------------------------------------------
| epoch  10 step     4050 |    126 batches | lr 0.000349 | ms/batch 455.39 | loss  5.66 | ppl   288.250
| epoch  10 step     4100 |    176 batches | lr 0.000349 | ms/batch 313.96 | loss  5.65 | ppl   284.003
| epoch  10 step     4150 |    226 batches | lr 0.000349 | ms/batch 314.76 | loss  5.67 | ppl   289.107
| epoch  10 step     4200 |    276 batches | lr 0.000349 | ms/batch 314.26 | loss  5.66 | ppl   288.227
| epoch  10 step     4250 |    326 batches | lr 0.000349 | ms/batch 313.60 | loss  5.63 | ppl   278.619
| epoch  10 step     4300 |    376 batches | lr 0.000349 | ms/batch 313.93 | loss  5.61 | ppl   272.888
| epoch  10 step     4350 |    426 batches | lr 0.000349 | ms/batch 314.73 | loss  5.64 | ppl   280.256
| epoch  11 step     4400 |     40 batches | lr 0.000349 | ms/batch 307.19 | loss  5.63 | ppl   277.988
----------------------------------------------------------------------------------------------------
| Eval  11 at step     4400 | time: 130.29s | valid loss  5.12 | valid ppl   166.895
----------------------------------------------------------------------------------------------------
| epoch  11 step     4450 |     90 batches | lr 0.000349 | ms/batch 447.76 | loss  5.60 | ppl   270.553
| epoch  11 step     4500 |    140 batches | lr 0.000349 | ms/batch 315.43 | loss  5.56 | ppl   260.067
| epoch  11 step     4550 |    190 batches | lr 0.000349 | ms/batch 313.87 | loss  5.64 | ppl   281.925
| epoch  11 step     4600 |    240 batches | lr 0.000349 | ms/batch 319.53 | loss  5.59 | ppl   268.448
| epoch  11 step     4650 |    290 batches | lr 0.000349 | ms/batch 314.20 | loss  5.63 | ppl   279.906
| epoch  11 step     4700 |    340 batches | lr 0.000349 | ms/batch 322.16 | loss  5.50 | ppl   245.880
| epoch  11 step     4750 |    390 batches | lr 0.000349 | ms/batch 315.33 | loss  5.57 | ppl   261.452
| epoch  12 step     4800 |      4 batches | lr 0.000349 | ms/batch 310.46 | loss  5.57 | ppl   263.461
----------------------------------------------------------------------------------------------------
| Eval  12 at step     4800 | time: 131.33s | valid loss  5.06 | valid ppl   157.597
----------------------------------------------------------------------------------------------------
| epoch  12 step     4850 |     54 batches | lr 0.000349 | ms/batch 451.56 | loss  5.54 | ppl   255.096
| epoch  12 step     4900 |    104 batches | lr 0.000349 | ms/batch 314.74 | loss  5.54 | ppl   254.758
| epoch  12 step     4950 |    154 batches | lr 0.000349 | ms/batch 314.15 | loss  5.51 | ppl   246.977
| epoch  12 step     5000 |    204 batches | lr 0.000349 | ms/batch 314.41 | loss  5.56 | ppl   260.026
| epoch  12 step     5050 |    254 batches | lr 0.000349 | ms/batch 314.72 | loss  5.52 | ppl   249.889
| epoch  12 step     5100 |    304 batches | lr 0.000349 | ms/batch 316.78 | loss  5.56 | ppl   259.843
| epoch  12 step     5150 |    354 batches | lr 0.000349 | ms/batch 316.82 | loss  5.44 | ppl   231.453
| epoch  12 step     5200 |    404 batches | lr 0.000349 | ms/batch 314.86 | loss  5.53 | ppl   251.986
----------------------------------------------------------------------------------------------------
| Eval  13 at step     5200 | time: 131.19s | valid loss  5.01 | valid ppl   149.824
----------------------------------------------------------------------------------------------------
| epoch  13 step     5250 |     18 batches | lr 0.000348 | ms/batch 473.74 | loss  5.53 | ppl   251.259
| epoch  13 step     5300 |     68 batches | lr 0.000348 | ms/batch 331.91 | loss  5.47 | ppl   237.906
| epoch  13 step     5350 |    118 batches | lr 0.000348 | ms/batch 319.12 | loss  5.50 | ppl   244.941
| epoch  13 step     5400 |    168 batches | lr 0.000348 | ms/batch 316.24 | loss  5.47 | ppl   237.572
| epoch  13 step     5450 |    218 batches | lr 0.000348 | ms/batch 315.21 | loss  5.53 | ppl   251.377
| epoch  13 step     5500 |    268 batches | lr 0.000348 | ms/batch 321.57 | loss  5.48 | ppl   239.734
| epoch  13 step     5550 |    318 batches | lr 0.000348 | ms/batch 315.05 | loss  5.47 | ppl   237.887
| epoch  13 step     5600 |    368 batches | lr 0.000348 | ms/batch 313.45 | loss  5.44 | ppl   229.418
----------------------------------------------------------------------------------------------------
| Eval  14 at step     5600 | time: 132.26s | valid loss  4.98 | valid ppl   145.163
----------------------------------------------------------------------------------------------------
| epoch  13 step     5650 |    418 batches | lr 0.000348 | ms/batch 473.64 | loss  5.44 | ppl   230.334
| epoch  14 step     5700 |     32 batches | lr 0.000348 | ms/batch 307.65 | loss  5.45 | ppl   232.104
| epoch  14 step     5750 |     82 batches | lr 0.000348 | ms/batch 314.65 | loss  5.42 | ppl   225.932
| epoch  14 step     5800 |    132 batches | lr 0.000348 | ms/batch 313.47 | loss  5.42 | ppl   225.685
| epoch  14 step     5850 |    182 batches | lr 0.000348 | ms/batch 313.55 | loss  5.45 | ppl   232.740
| epoch  14 step     5900 |    232 batches | lr 0.000348 | ms/batch 315.94 | loss  5.42 | ppl   226.622
| epoch  14 step     5950 |    282 batches | lr 0.000348 | ms/batch 313.70 | loss  5.44 | ppl   229.490
| epoch  14 step     6000 |    332 batches | lr 0.000348 | ms/batch 314.22 | loss  5.37 | ppl   215.738
----------------------------------------------------------------------------------------------------
| Eval  15 at step     6000 | time: 130.46s | valid loss  4.94 | valid ppl   140.195
----------------------------------------------------------------------------------------------------
| epoch  14 step     6050 |    382 batches | lr 0.000348 | ms/batch 444.66 | loss  5.41 | ppl   223.754
| epoch  14 step     6100 |    432 batches | lr 0.000348 | ms/batch 314.67 | loss  5.43 | ppl   227.651
| epoch  15 step     6150 |     46 batches | lr 0.000348 | ms/batch 308.73 | loss  5.40 | ppl   221.355
| epoch  15 step     6200 |     96 batches | lr 0.000348 | ms/batch 313.97 | loss  5.36 | ppl   213.307
| epoch  15 step     6250 |    146 batches | lr 0.000348 | ms/batch 316.61 | loss  5.35 | ppl   211.449
| epoch  15 step     6300 |    196 batches | lr 0.000348 | ms/batch 320.82 | loss  5.42 | ppl   225.791
| epoch  15 step     6350 |    246 batches | lr 0.000348 | ms/batch 329.17 | loss  5.40 | ppl   220.319
| epoch  15 step     6400 |    296 batches | lr 0.000348 | ms/batch 315.69 | loss  5.45 | ppl   232.268
----------------------------------------------------------------------------------------------------
| Eval  16 at step     6400 | time: 131.64s | valid loss  4.92 | valid ppl   137.284
----------------------------------------------------------------------------------------------------
| epoch  15 step     6450 |    346 batches | lr 0.000348 | ms/batch 468.28 | loss  5.31 | ppl   201.640
| epoch  15 step     6500 |    396 batches | lr 0.000348 | ms/batch 315.38 | loss  5.40 | ppl   220.509
| epoch  16 step     6550 |     10 batches | lr 0.000348 | ms/batch 308.89 | loss  5.40 | ppl   221.579
| epoch  16 step     6600 |     60 batches | lr 0.000348 | ms/batch 312.99 | loss  5.32 | ppl   204.992
| epoch  16 step     6650 |    110 batches | lr 0.000348 | ms/batch 313.54 | loss  5.30 | ppl   200.980
| epoch  16 step     6700 |    160 batches | lr 0.000348 | ms/batch 314.25 | loss  5.32 | ppl   203.428
| epoch  16 step     6750 |    210 batches | lr 0.000347 | ms/batch 314.13 | loss  5.37 | ppl   213.841
| epoch  16 step     6800 |    260 batches | lr 0.000347 | ms/batch 314.88 | loss  5.33 | ppl   206.438
----------------------------------------------------------------------------------------------------
| Eval  17 at step     6800 | time: 131.17s | valid loss  4.88 | valid ppl   131.629
----------------------------------------------------------------------------------------------------
| epoch  16 step     6850 |    310 batches | lr 0.000347 | ms/batch 469.81 | loss  5.36 | ppl   213.391
| epoch  16 step     6900 |    360 batches | lr 0.000347 | ms/batch 314.43 | loss  5.29 | ppl   199.120
| epoch  16 step     6950 |    410 batches | lr 0.000347 | ms/batch 314.22 | loss  5.31 | ppl   201.829
| epoch  17 step     7000 |     24 batches | lr 0.000347 | ms/batch 310.56 | loss  5.37 | ppl   213.958
| epoch  17 step     7050 |     74 batches | lr 0.000347 | ms/batch 315.29 | loss  5.30 | ppl   200.118
| epoch  17 step     7100 |    124 batches | lr 0.000347 | ms/batch 315.96 | loss  5.28 | ppl   196.631
| epoch  17 step     7150 |    174 batches | lr 0.000347 | ms/batch 316.11 | loss  5.32 | ppl   204.751
| epoch  17 step     7200 |    224 batches | lr 0.000347 | ms/batch 315.48 | loss  5.34 | ppl   207.489
----------------------------------------------------------------------------------------------------
| Eval  18 at step     7200 | time: 131.81s | valid loss  4.86 | valid ppl   129.277
----------------------------------------------------------------------------------------------------
| epoch  17 step     7250 |    274 batches | lr 0.000347 | ms/batch 568.59 | loss  5.34 | ppl   209.231
| epoch  17 step     7300 |    324 batches | lr 0.000347 | ms/batch 314.47 | loss  5.27 | ppl   195.223
| epoch  17 step     7350 |    374 batches | lr 0.000347 | ms/batch 314.15 | loss  5.29 | ppl   197.895
| epoch  17 step     7400 |    424 batches | lr 0.000347 | ms/batch 315.64 | loss  5.29 | ppl   198.204
| epoch  18 step     7450 |     38 batches | lr 0.000347 | ms/batch 325.45 | loss  5.29 | ppl   199.089
| epoch  18 step     7500 |     88 batches | lr 0.000347 | ms/batch 316.88 | loss  5.28 | ppl   195.604
| epoch  18 step     7550 |    138 batches | lr 0.000347 | ms/batch 315.84 | loss  5.25 | ppl   190.939
| epoch  18 step     7600 |    188 batches | lr 0.000347 | ms/batch 318.12 | loss  5.27 | ppl   194.477
----------------------------------------------------------------------------------------------------
| Eval  19 at step     7600 | time: 132.06s | valid loss  4.85 | valid ppl   127.364
----------------------------------------------------------------------------------------------------
| epoch  18 step     7650 |    238 batches | lr 0.000347 | ms/batch 479.14 | loss  5.28 | ppl   196.892
| epoch  18 step     7700 |    288 batches | lr 0.000347 | ms/batch 317.17 | loss  5.31 | ppl   203.031
| epoch  18 step     7750 |    338 batches | lr 0.000347 | ms/batch 316.99 | loss  5.21 | ppl   182.623
| epoch  18 step     7800 |    388 batches | lr 0.000347 | ms/batch 317.36 | loss  5.26 | ppl   192.466
| epoch  19 step     7850 |      2 batches | lr 0.000347 | ms/batch 312.48 | loss  5.26 | ppl   191.536
| epoch  19 step     7900 |     52 batches | lr 0.000347 | ms/batch 317.68 | loss  5.20 | ppl   181.074
| epoch  19 step     7950 |    102 batches | lr 0.000347 | ms/batch 326.73 | loss  5.25 | ppl   190.507
| epoch  19 step     8000 |    152 batches | lr 0.000346 | ms/batch 317.83 | loss  5.21 | ppl   183.137
----------------------------------------------------------------------------------------------------
| Eval  20 at step     8000 | time: 132.73s | valid loss  4.81 | valid ppl   122.924
----------------------------------------------------------------------------------------------------
| epoch  19 step     8050 |    202 batches | lr 0.000346 | ms/batch 481.90 | loss  5.26 | ppl   192.481
| epoch  19 step     8100 |    252 batches | lr 0.000346 | ms/batch 315.97 | loss  5.28 | ppl   195.895
| epoch  19 step     8150 |    302 batches | lr 0.000346 | ms/batch 318.98 | loss  5.31 | ppl   203.079
| epoch  19 step     8200 |    352 batches | lr 0.000346 | ms/batch 317.07 | loss  5.17 | ppl   175.723
| epoch  19 step     8250 |    402 batches | lr 0.000346 | ms/batch 316.99 | loss  5.23 | ppl   186.414
| epoch  20 step     8300 |     16 batches | lr 0.000346 | ms/batch 311.24 | loss  5.24 | ppl   189.409
| epoch  20 step     8350 |     66 batches | lr 0.000346 | ms/batch 325.99 | loss  5.18 | ppl   177.101
| epoch  20 step     8400 |    116 batches | lr 0.000346 | ms/batch 323.85 | loss  5.21 | ppl   184.012
----------------------------------------------------------------------------------------------------
| Eval  21 at step     8400 | time: 132.45s | valid loss  4.76 | valid ppl   116.978
----------------------------------------------------------------------------------------------------
| epoch  20 step     8450 |    166 batches | lr 0.000346 | ms/batch 476.61 | loss  5.20 | ppl   181.726
| epoch  20 step     8500 |    216 batches | lr 0.000346 | ms/batch 317.36 | loss  5.24 | ppl   188.523
| epoch  20 step     8550 |    266 batches | lr 0.000346 | ms/batch 317.17 | loss  5.24 | ppl   188.773
| epoch  20 step     8600 |    316 batches | lr 0.000346 | ms/batch 317.51 | loss  5.24 | ppl   187.964
| epoch  20 step     8650 |    366 batches | lr 0.000346 | ms/batch 317.82 | loss  5.18 | ppl   177.419
| epoch  20 step     8700 |    416 batches | lr 0.000346 | ms/batch 315.56 | loss  5.19 | ppl   179.890
| epoch  21 step     8750 |     30 batches | lr 0.000346 | ms/batch 311.91 | loss  5.25 | ppl   189.720
| epoch  21 step     8800 |     80 batches | lr 0.000346 | ms/batch 322.25 | loss  5.18 | ppl   177.655
----------------------------------------------------------------------------------------------------
| Eval  22 at step     8800 | time: 132.17s | valid loss  4.79 | valid ppl   120.100
----------------------------------------------------------------------------------------------------
| epoch  21 step     8850 |    130 batches | lr 0.000346 | ms/batch 423.16 | loss  5.18 | ppl   177.239
| epoch  21 step     8900 |    180 batches | lr 0.000346 | ms/batch 315.49 | loss  5.20 | ppl   181.726
| epoch  21 step     8950 |    230 batches | lr 0.000346 | ms/batch 314.10 | loss  5.21 | ppl   182.266
| epoch  21 step     9000 |    280 batches | lr 0.000346 | ms/batch 317.84 | loss  5.21 | ppl   182.466
| epoch  21 step     9050 |    330 batches | lr 0.000345 | ms/batch 316.61 | loss  5.17 | ppl   175.174
| epoch  21 step     9100 |    380 batches | lr 0.000345 | ms/batch 316.10 | loss  5.16 | ppl   174.737
| epoch  21 step     9150 |    430 batches | lr 0.000345 | ms/batch 316.47 | loss  5.18 | ppl   177.322
| epoch  22 step     9200 |     44 batches | lr 0.000345 | ms/batch 311.56 | loss  5.17 | ppl   175.901
----------------------------------------------------------------------------------------------------
| Eval  23 at step     9200 | time: 131.50s | valid loss  4.76 | valid ppl   116.570
----------------------------------------------------------------------------------------------------
| epoch  22 step     9250 |     94 batches | lr 0.000345 | ms/batch 469.33 | loss  5.13 | ppl   169.400
| epoch  22 step     9300 |    144 batches | lr 0.000345 | ms/batch 331.01 | loss  5.15 | ppl   171.679
| epoch  22 step     9350 |    194 batches | lr 0.000345 | ms/batch 326.48 | loss  5.20 | ppl   180.396
| epoch  22 step     9400 |    244 batches | lr 0.000345 | ms/batch 316.43 | loss  5.18 | ppl   176.824
| epoch  22 step     9450 |    294 batches | lr 0.000345 | ms/batch 316.08 | loss  5.21 | ppl   183.266
| epoch  22 step     9500 |    344 batches | lr 0.000345 | ms/batch 315.24 | loss  5.08 | ppl   160.072
| epoch  22 step     9550 |    394 batches | lr 0.000345 | ms/batch 316.41 | loss  5.18 | ppl   178.546
| epoch  23 step     9600 |      8 batches | lr 0.000345 | ms/batch 312.00 | loss  5.17 | ppl   175.448
----------------------------------------------------------------------------------------------------
| Eval  24 at step     9600 | time: 132.98s | valid loss  4.74 | valid ppl   114.028
----------------------------------------------------------------------------------------------------
| epoch  23 step     9650 |     58 batches | lr 0.000345 | ms/batch 477.95 | loss  5.16 | ppl   174.969
| epoch  23 step     9700 |    108 batches | lr 0.000345 | ms/batch 315.37 | loss  5.11 | ppl   164.960
| epoch  23 step     9750 |    158 batches | lr 0.000345 | ms/batch 326.83 | loss  5.13 | ppl   168.621
| epoch  23 step     9800 |    208 batches | lr 0.000345 | ms/batch 330.60 | loss  5.16 | ppl   174.696
| epoch  23 step     9850 |    258 batches | lr 0.000345 | ms/batch 320.39 | loss  5.16 | ppl   174.969
| epoch  23 step     9900 |    308 batches | lr 0.000345 | ms/batch 315.89 | loss  5.19 | ppl   179.118
| epoch  23 step     9950 |    358 batches | lr 0.000345 | ms/batch 316.64 | loss  5.08 | ppl   160.072
| epoch  23 step    10000 |    408 batches | lr 0.000345 | ms/batch 317.21 | loss  5.12 | ppl   167.414
----------------------------------------------------------------------------------------------------
| Eval  25 at step    10000 | time: 133.35s | valid loss  4.72 | valid ppl   112.451
----------------------------------------------------------------------------------------------------
| epoch  24 step    10050 |     22 batches | lr 0.000344 | ms/batch 472.17 | loss  5.18 | ppl   177.128
| epoch  24 step    10100 |     72 batches | lr 0.000344 | ms/batch 314.32 | loss  5.09 | ppl   161.782
| epoch  24 step    10150 |    122 batches | lr 0.000344 | ms/batch 313.48 | loss  5.12 | ppl   167.139
| epoch  24 step    10200 |    172 batches | lr 0.000344 | ms/batch 314.39 | loss  5.13 | ppl   169.825
| epoch  24 step    10250 |    222 batches | lr 0.000344 | ms/batch 314.65 | loss  5.14 | ppl   171.545
| epoch  24 step    10300 |    272 batches | lr 0.000344 | ms/batch 314.59 | loss  5.14 | ppl   170.489
| epoch  24 step    10350 |    322 batches | lr 0.000344 | ms/batch 314.16 | loss  5.10 | ppl   164.638
| epoch  24 step    10400 |    372 batches | lr 0.000344 | ms/batch 315.17 | loss  5.09 | ppl   162.276
----------------------------------------------------------------------------------------------------
| Eval  26 at step    10400 | time: 130.53s | valid loss  4.70 | valid ppl   110.127
----------------------------------------------------------------------------------------------------
| epoch  24 step    10450 |    422 batches | lr 0.000344 | ms/batch 530.72 | loss  5.12 | ppl   168.003
| epoch  25 step    10500 |     36 batches | lr 0.000344 | ms/batch 312.93 | loss  5.13 | ppl   168.595
| epoch  25 step    10550 |     86 batches | lr 0.000344 | ms/batch 330.04 | loss  5.08 | ppl   161.189
| epoch  25 step    10600 |    136 batches | lr 0.000344 | ms/batch 317.49 | loss  5.11 | ppl   165.709
| epoch  25 step    10650 |    186 batches | lr 0.000344 | ms/batch 318.00 | loss  5.12 | ppl   167.257
| epoch  25 step    10700 |    236 batches | lr 0.000344 | ms/batch 315.44 | loss  5.12 | ppl   167.061
| epoch  25 step    10750 |    286 batches | lr 0.000344 | ms/batch 317.06 | loss  5.16 | ppl   174.491
| epoch  25 step    10800 |    336 batches | lr 0.000344 | ms/batch 314.51 | loss  5.03 | ppl   152.635
----------------------------------------------------------------------------------------------------
| Eval  27 at step    10800 | time: 132.10s | valid loss  4.69 | valid ppl   109.359
----------------------------------------------------------------------------------------------------
| epoch  25 step    10850 |    386 batches | lr 0.000344 | ms/batch 469.72 | loss  5.09 | ppl   162.314
| epoch  25 step    10900 |    436 batches | lr 0.000343 | ms/batch 312.65 | loss  5.09 | ppl   161.719
| epoch  26 step    10950 |     50 batches | lr 0.000343 | ms/batch 312.84 | loss  5.08 | ppl   160.060
| epoch  26 step    11000 |    100 batches | lr 0.000343 | ms/batch 313.93 | loss  5.07 | ppl   159.436
| epoch  26 step    11050 |    150 batches | lr 0.000343 | ms/batch 316.23 | loss  5.04 | ppl   154.506
| epoch  26 step    11100 |    200 batches | lr 0.000343 | ms/batch 314.86 | loss  5.09 | ppl   162.682
| epoch  26 step    11150 |    250 batches | lr 0.000343 | ms/batch 315.32 | loss  5.10 | ppl   163.217
| epoch  26 step    11200 |    300 batches | lr 0.000343 | ms/batch 316.26 | loss  5.12 | ppl   167.283
----------------------------------------------------------------------------------------------------
| Eval  28 at step    11200 | time: 130.95s | valid loss  4.68 | valid ppl   107.974
----------------------------------------------------------------------------------------------------
| epoch  26 step    11250 |    350 batches | lr 0.000343 | ms/batch 528.79 | loss  5.00 | ppl   149.134
| epoch  26 step    11300 |    400 batches | lr 0.000343 | ms/batch 316.71 | loss  5.06 | ppl   158.183
| epoch  27 step    11350 |     14 batches | lr 0.000343 | ms/batch 310.65 | loss  5.11 | ppl   166.475
| epoch  27 step    11400 |     64 batches | lr 0.000343 | ms/batch 318.50 | loss  5.06 | ppl   158.071
| epoch  27 step    11450 |    114 batches | lr 0.000343 | ms/batch 313.89 | loss  5.03 | ppl   153.280
| epoch  27 step    11500 |    164 batches | lr 0.000343 | ms/batch 313.62 | loss  5.04 | ppl   155.050
| epoch  27 step    11550 |    214 batches | lr 0.000343 | ms/batch 315.09 | loss  5.07 | ppl   159.373
| epoch  27 step    11600 |    264 batches | lr 0.000343 | ms/batch 315.01 | loss  5.07 | ppl   158.975
----------------------------------------------------------------------------------------------------
| Eval  29 at step    11600 | time: 130.97s | valid loss  4.65 | valid ppl   104.847
----------------------------------------------------------------------------------------------------
| epoch  27 step    11650 |    314 batches | lr 0.000343 | ms/batch 478.96 | loss  5.08 | ppl   161.340
| epoch  27 step    11700 |    364 batches | lr 0.000342 | ms/batch 317.30 | loss  5.02 | ppl   151.636
| epoch  27 step    11750 |    414 batches | lr 0.000342 | ms/batch 316.46 | loss  5.04 | ppl   154.458
| epoch  28 step    11800 |     28 batches | lr 0.000342 | ms/batch 313.54 | loss  5.08 | ppl   160.887
| epoch  28 step    11850 |     78 batches | lr 0.000342 | ms/batch 329.39 | loss  5.01 | ppl   149.928
| epoch  28 step    11900 |    128 batches | lr 0.000342 | ms/batch 331.86 | loss  5.06 | ppl   156.927
| epoch  28 step    11950 |    178 batches | lr 0.000342 | ms/batch 320.72 | loss  5.05 | ppl   156.401
| epoch  28 step    12000 |    228 batches | lr 0.000342 | ms/batch 317.00 | loss  5.07 | ppl   159.897
----------------------------------------------------------------------------------------------------
| Eval  30 at step    12000 | time: 133.30s | valid loss  4.66 | valid ppl   105.976
----------------------------------------------------------------------------------------------------
| epoch  28 step    12050 |    278 batches | lr 0.000342 | ms/batch 415.35 | loss  5.08 | ppl   160.887
| epoch  28 step    12100 |    328 batches | lr 0.000342 | ms/batch 317.67 | loss  5.01 | ppl   149.472
| epoch  28 step    12150 |    378 batches | lr 0.000342 | ms/batch 317.76 | loss  5.04 | ppl   154.325
| epoch  28 step    12200 |    428 batches | lr 0.000342 | ms/batch 316.77 | loss  5.07 | ppl   159.797
| epoch  29 step    12250 |     42 batches | lr 0.000342 | ms/batch 311.27 | loss  5.03 | ppl   153.424
| epoch  29 step    12300 |     92 batches | lr 0.000342 | ms/batch 315.12 | loss  5.00 | ppl   149.110
| epoch  29 step    12350 |    142 batches | lr 0.000342 | ms/batch 315.56 | loss  5.00 | ppl   148.692
| epoch  29 step    12400 |    192 batches | lr 0.000342 | ms/batch 316.68 | loss  5.04 | ppl   153.820
----------------------------------------------------------------------------------------------------
| Eval  31 at step    12400 | time: 131.36s | valid loss  4.66 | valid ppl   105.942
----------------------------------------------------------------------------------------------------
| epoch  29 step    12450 |    242 batches | lr 0.000342 | ms/batch 426.16 | loss  5.06 | ppl   157.172
| epoch  29 step    12500 |    292 batches | lr 0.000341 | ms/batch 322.03 | loss  5.08 | ppl   161.239
| epoch  29 step    12550 |    342 batches | lr 0.000341 | ms/batch 319.73 | loss  4.93 | ppl   138.715
| epoch  29 step    12600 |    392 batches | lr 0.000341 | ms/batch 314.33 | loss  5.01 | ppl   150.350
| epoch  30 step    12650 |      6 batches | lr 0.000341 | ms/batch 310.74 | loss  5.04 | ppl   154.808
| epoch  30 step    12700 |     56 batches | lr 0.000341 | ms/batch 317.15 | loss  5.01 | ppl   150.010
| epoch  30 step    12750 |    106 batches | lr 0.000341 | ms/batch 316.02 | loss  4.99 | ppl   146.707
| epoch  30 step    12800 |    156 batches | lr 0.000341 | ms/batch 315.10 | loss  5.01 | ppl   150.633
----------------------------------------------------------------------------------------------------
| Eval  32 at step    12800 | time: 132.14s | valid loss  4.62 | valid ppl   101.257
----------------------------------------------------------------------------------------------------
| epoch  30 step    12850 |    206 batches | lr 0.000341 | ms/batch 485.81 | loss  5.04 | ppl   154.386
| epoch  30 step    12900 |    256 batches | lr 0.000341 | ms/batch 316.55 | loss  5.05 | ppl   156.719
| epoch  30 step    12950 |    306 batches | lr 0.000341 | ms/batch 324.44 | loss  5.06 | ppl   157.087
| epoch  30 step    13000 |    356 batches | lr 0.000341 | ms/batch 314.87 | loss  4.94 | ppl   140.460
| epoch  30 step    13050 |    406 batches | lr 0.000341 | ms/batch 314.12 | loss  4.99 | ppl   146.661
| epoch  31 step    13100 |     20 batches | lr 0.000341 | ms/batch 312.86 | loss  5.07 | ppl   159.050
| epoch  31 step    13150 |     70 batches | lr 0.000341 | ms/batch 321.20 | loss  4.96 | ppl   142.940
| epoch  31 step    13200 |    120 batches | lr 0.00034 | ms/batch 315.07 | loss  5.00 | ppl   148.820
----------------------------------------------------------------------------------------------------
| Eval  33 at step    13200 | time: 132.50s | valid loss  4.63 | valid ppl   102.260
----------------------------------------------------------------------------------------------------
| epoch  31 step    13250 |    170 batches | lr 0.00034 | ms/batch 416.32 | loss  4.97 | ppl   144.308
| epoch  31 step    13300 |    220 batches | lr 0.00034 | ms/batch 316.21 | loss  5.04 | ppl   155.135
| epoch  31 step    13350 |    270 batches | lr 0.00034 | ms/batch 314.37 | loss  5.01 | ppl   149.402
| epoch  31 step    13400 |    320 batches | lr 0.00034 | ms/batch 314.46 | loss  5.02 | ppl   150.951
| epoch  31 step    13450 |    370 batches | lr 0.00034 | ms/batch 316.35 | loss  4.95 | ppl   141.352
| epoch  31 step    13500 |    420 batches | lr 0.00034 | ms/batch 313.73 | loss  5.00 | ppl   148.889
| epoch  32 step    13550 |     34 batches | lr 0.00034 | ms/batch 312.73 | loss  5.02 | ppl   150.691
| epoch  32 step    13600 |     84 batches | lr 0.00034 | ms/batch 315.82 | loss  4.97 | ppl   144.083
----------------------------------------------------------------------------------------------------
| Eval  34 at step    13600 | time: 131.01s | valid loss  4.62 | valid ppl   102.000
----------------------------------------------------------------------------------------------------
| epoch  32 step    13650 |    134 batches | lr 0.00034 | ms/batch 412.15 | loss  4.99 | ppl   147.650
| epoch  32 step    13700 |    184 batches | lr 0.00034 | ms/batch 317.50 | loss  4.99 | ppl   146.718
| epoch  32 step    13750 |    234 batches | lr 0.00034 | ms/batch 315.86 | loss  5.00 | ppl   148.077
| epoch  32 step    13800 |    284 batches | lr 0.00034 | ms/batch 315.71 | loss  5.04 | ppl   154.120
| epoch  32 step    13850 |    334 batches | lr 0.00034 | ms/batch 315.35 | loss  4.94 | ppl   139.334
| epoch  32 step    13900 |    384 batches | lr 0.000339 | ms/batch 316.96 | loss  4.98 | ppl   144.941
| epoch  32 step    13950 |    434 batches | lr 0.000339 | ms/batch 314.68 | loss  5.01 | ppl   149.297
| epoch  33 step    14000 |     48 batches | lr 0.000339 | ms/batch 309.83 | loss  4.97 | ppl   144.399
----------------------------------------------------------------------------------------------------
| Eval  35 at step    14000 | time: 130.94s | valid loss  4.61 | valid ppl   100.685
----------------------------------------------------------------------------------------------------
| epoch  33 step    14050 |     98 batches | lr 0.000339 | ms/batch 446.73 | loss  4.95 | ppl   140.625
| epoch  33 step    14100 |    148 batches | lr 0.000339 | ms/batch 316.39 | loss  4.97 | ppl   143.634
| epoch  33 step    14150 |    198 batches | lr 0.000339 | ms/batch 316.55 | loss  4.98 | ppl   145.225
| epoch  33 step    14200 |    248 batches | lr 0.000339 | ms/batch 315.64 | loss  4.99 | ppl   147.327
| epoch  33 step    14250 |    298 batches | lr 0.000339 | ms/batch 321.19 | loss  5.02 | ppl   151.258
| epoch  33 step    14300 |    348 batches | lr 0.000339 | ms/batch 317.24 | loss  4.89 | ppl   132.570
| epoch  33 step    14350 |    398 batches | lr 0.000339 | ms/batch 315.67 | loss  4.98 | ppl   145.395
| epoch  34 step    14400 |     12 batches | lr 0.000339 | ms/batch 311.40 | loss  5.01 | ppl   149.402
----------------------------------------------------------------------------------------------------
| Eval  36 at step    14400 | time: 131.46s | valid loss  4.60 | valid ppl    99.121
----------------------------------------------------------------------------------------------------
| epoch  34 step    14450 |     62 batches | lr 0.000339 | ms/batch 449.77 | loss  4.94 | ppl   140.438
| epoch  34 step    14500 |    112 batches | lr 0.000339 | ms/batch 328.17 | loss  4.96 | ppl   142.316
| epoch  34 step    14550 |    162 batches | lr 0.000338 | ms/batch 315.38 | loss  4.94 | ppl   139.956
| epoch  34 step    14600 |    212 batches | lr 0.000338 | ms/batch 315.04 | loss  4.99 | ppl   146.994
| epoch  34 step    14650 |    262 batches | lr 0.000338 | ms/batch 316.66 | loss  4.97 | ppl   144.387
| epoch  34 step    14700 |    312 batches | lr 0.000338 | ms/batch 317.22 | loss  4.97 | ppl   143.353
| epoch  34 step    14750 |    362 batches | lr 0.000338 | ms/batch 325.93 | loss  4.90 | ppl   134.952
| epoch  34 step    14800 |    412 batches | lr 0.000338 | ms/batch 315.07 | loss  4.97 | ppl   143.723
----------------------------------------------------------------------------------------------------
| Eval  37 at step    14800 | time: 132.47s | valid loss  4.59 | valid ppl    98.890
----------------------------------------------------------------------------------------------------
| epoch  35 step    14850 |     26 batches | lr 0.000338 | ms/batch 440.36 | loss  4.98 | ppl   145.827
| epoch  35 step    14900 |     76 batches | lr 0.000338 | ms/batch 315.58 | loss  4.95 | ppl   141.849
| epoch  35 step    14950 |    126 batches | lr 0.000338 | ms/batch 316.16 | loss  4.93 | ppl   138.823
| epoch  35 step    15000 |    176 batches | lr 0.000338 | ms/batch 315.30 | loss  4.95 | ppl   141.374
| epoch  35 step    15050 |    226 batches | lr 0.000338 | ms/batch 314.73 | loss  4.94 | ppl   140.109
| epoch  35 step    15100 |    276 batches | lr 0.000338 | ms/batch 315.03 | loss  4.99 | ppl   146.730
| epoch  35 step    15150 |    326 batches | lr 0.000337 | ms/batch 316.41 | loss  4.91 | ppl   135.565
| epoch  35 step    15200 |    376 batches | lr 0.000337 | ms/batch 315.89 | loss  4.93 | ppl   139.019
----------------------------------------------------------------------------------------------------
| Eval  38 at step    15200 | time: 130.81s | valid loss  4.57 | valid ppl    96.672
----------------------------------------------------------------------------------------------------
| epoch  35 step    15250 |    426 batches | lr 0.000337 | ms/batch 446.74 | loss  4.96 | ppl   142.494
| epoch  36 step    15300 |     40 batches | lr 0.000337 | ms/batch 310.22 | loss  4.97 | ppl   143.937
| epoch  36 step    15350 |     90 batches | lr 0.000337 | ms/batch 314.24 | loss  4.93 | ppl   137.969
| epoch  36 step    15400 |    140 batches | lr 0.000337 | ms/batch 314.51 | loss  4.92 | ppl   137.249
| epoch  36 step    15450 |    190 batches | lr 0.000337 | ms/batch 315.45 | loss  4.98 | ppl   145.497
| epoch  36 step    15500 |    240 batches | lr 0.000337 | ms/batch 315.43 | loss  4.96 | ppl   142.794
| epoch  36 step    15550 |    290 batches | lr 0.000337 | ms/batch 314.50 | loss  4.99 | ppl   146.879
| epoch  36 step    15600 |    340 batches | lr 0.000337 | ms/batch 316.77 | loss  4.87 | ppl   129.904
----------------------------------------------------------------------------------------------------
| Eval  39 at step    15600 | time: 130.78s | valid loss  4.58 | valid ppl    97.587
----------------------------------------------------------------------------------------------------
| epoch  36 step    15650 |    390 batches | lr 0.000337 | ms/batch 414.45 | loss  4.92 | ppl   137.142
| epoch  37 step    15700 |      4 batches | lr 0.000337 | ms/batch 309.42 | loss  4.99 | ppl   146.535
| epoch  37 step    15750 |     54 batches | lr 0.000336 | ms/batch 326.66 | loss  4.90 | ppl   134.416
| epoch  37 step    15800 |    104 batches | lr 0.000336 | ms/batch 332.31 | loss  4.91 | ppl   135.396
| epoch  37 step    15850 |    154 batches | lr 0.000336 | ms/batch 320.80 | loss  4.91 | ppl   136.266
| epoch  37 step    15900 |    204 batches | lr 0.000336 | ms/batch 316.33 | loss  4.96 | ppl   141.949
| epoch  37 step    15950 |    254 batches | lr 0.000336 | ms/batch 314.23 | loss  4.93 | ppl   137.872
| epoch  37 step    16000 |    304 batches | lr 0.000336 | ms/batch 316.23 | loss  4.99 | ppl   147.408
----------------------------------------------------------------------------------------------------
| Eval  40 at step    16000 | time: 132.56s | valid loss  4.56 | valid ppl    95.607
----------------------------------------------------------------------------------------------------
| epoch  37 step    16050 |    354 batches | lr 0.000336 | ms/batch 448.44 | loss  4.85 | ppl   128.000
| epoch  37 step    16100 |    404 batches | lr 0.000336 | ms/batch 328.56 | loss  4.91 | ppl   135.322
| epoch  38 step    16150 |     18 batches | lr 0.000336 | ms/batch 325.75 | loss  4.96 | ppl   142.917
| epoch  38 step    16200 |     68 batches | lr 0.000336 | ms/batch 319.24 | loss  4.87 | ppl   130.933
| epoch  38 step    16250 |    118 batches | lr 0.000336 | ms/batch 315.88 | loss  4.95 | ppl   141.098
| epoch  38 step    16300 |    168 batches | lr 0.000336 | ms/batch 315.09 | loss  4.92 | ppl   136.341
| epoch  38 step    16350 |    218 batches | lr 0.000335 | ms/batch 315.38 | loss  4.96 | ppl   142.494
| epoch  38 step    16400 |    268 batches | lr 0.000335 | ms/batch 315.24 | loss  4.91 | ppl   135.037
----------------------------------------------------------------------------------------------------
| Eval  41 at step    16400 | time: 132.48s | valid loss  4.56 | valid ppl    95.354
----------------------------------------------------------------------------------------------------
| epoch  38 step    16450 |    318 batches | lr 0.000335 | ms/batch 447.23 | loss  4.89 | ppl   132.974
| epoch  38 step    16500 |    368 batches | lr 0.000335 | ms/batch 314.72 | loss  4.87 | ppl   129.722
| epoch  38 step    16550 |    418 batches | lr 0.000335 | ms/batch 315.69 | loss  4.92 | ppl   136.383
| epoch  39 step    16600 |     32 batches | lr 0.000335 | ms/batch 308.27 | loss  4.95 | ppl   140.877
| epoch  39 step    16650 |     82 batches | lr 0.000335 | ms/batch 314.74 | loss  4.86 | ppl   128.803
| epoch  39 step    16700 |    132 batches | lr 0.000335 | ms/batch 315.25 | loss  4.89 | ppl   133.099
| epoch  39 step    16750 |    182 batches | lr 0.000335 | ms/batch 315.89 | loss  4.91 | ppl   135.629
| epoch  39 step    16800 |    232 batches | lr 0.000335 | ms/batch 314.22 | loss  4.92 | ppl   137.013
----------------------------------------------------------------------------------------------------
| Eval  42 at step    16800 | time: 130.64s | valid loss  4.56 | valid ppl    95.449
----------------------------------------------------------------------------------------------------
| epoch  39 step    16850 |    282 batches | lr 0.000335 | ms/batch 413.87 | loss  4.95 | ppl   140.680
| epoch  39 step    16900 |    332 batches | lr 0.000334 | ms/batch 315.13 | loss  4.87 | ppl   129.894
| epoch  39 step    16950 |    382 batches | lr 0.000334 | ms/batch 315.61 | loss  4.91 | ppl   135.100
| epoch  39 step    17000 |    432 batches | lr 0.000334 | ms/batch 315.89 | loss  4.92 | ppl   136.938
| epoch  40 step    17050 |     46 batches | lr 0.000334 | ms/batch 308.60 | loss  4.91 | ppl   135.502
| epoch  40 step    17100 |     96 batches | lr 0.000334 | ms/batch 315.54 | loss  4.88 | ppl   131.456
| epoch  40 step    17150 |    146 batches | lr 0.000334 | ms/batch 315.55 | loss  4.89 | ppl   133.297
| epoch  40 step    17200 |    196 batches | lr 0.000334 | ms/batch 314.92 | loss  4.94 | ppl   139.705
----------------------------------------------------------------------------------------------------
| Eval  43 at step    17200 | time: 130.76s | valid loss  4.55 | valid ppl    94.430
----------------------------------------------------------------------------------------------------
| epoch  40 step    17250 |    246 batches | lr 0.000334 | ms/batch 447.53 | loss  4.90 | ppl   134.721
| epoch  40 step    17300 |    296 batches | lr 0.000334 | ms/batch 314.86 | loss  4.94 | ppl   140.153
| epoch  40 step    17350 |    346 batches | lr 0.000334 | ms/batch 315.67 | loss  4.81 | ppl   123.058
| epoch  40 step    17400 |    396 batches | lr 0.000334 | ms/batch 314.80 | loss  4.89 | ppl   133.193
| epoch  41 step    17450 |     10 batches | lr 0.000333 | ms/batch 309.37 | loss  4.93 | ppl   138.358
| epoch  41 step    17500 |     60 batches | lr 0.000333 | ms/batch 316.71 | loss  4.87 | ppl   129.762
| epoch  41 step    17550 |    110 batches | lr 0.000333 | ms/batch 315.43 | loss  4.87 | ppl   129.823
| epoch  41 step    17600 |    160 batches | lr 0.000333 | ms/batch 315.63 | loss  4.88 | ppl   131.682
----------------------------------------------------------------------------------------------------
| Eval  44 at step    17600 | time: 130.80s | valid loss  4.53 | valid ppl    92.531
----------------------------------------------------------------------------------------------------
| epoch  41 step    17650 |    210 batches | lr 0.000333 | ms/batch 452.84 | loss  4.92 | ppl   137.560
| epoch  41 step    17700 |    260 batches | lr 0.000333 | ms/batch 314.00 | loss  4.88 | ppl   131.620
| epoch  41 step    17750 |    310 batches | lr 0.000333 | ms/batch 313.61 | loss  4.91 | ppl   136.021
| epoch  41 step    17800 |    360 batches | lr 0.000333 | ms/batch 313.11 | loss  4.83 | ppl   125.377
| epoch  41 step    17850 |    410 batches | lr 0.000333 | ms/batch 314.77 | loss  4.86 | ppl   128.772
| epoch  42 step    17900 |     24 batches | lr 0.000333 | ms/batch 307.45 | loss  4.90 | ppl   134.605
| epoch  42 step    17950 |     74 batches | lr 0.000332 | ms/batch 313.15 | loss  4.84 | ppl   126.519
| epoch  42 step    18000 |    124 batches | lr 0.000332 | ms/batch 315.12 | loss  4.86 | ppl   128.391
----------------------------------------------------------------------------------------------------
| Eval  45 at step    18000 | time: 130.44s | valid loss  4.53 | valid ppl    93.190
----------------------------------------------------------------------------------------------------
| epoch  42 step    18050 |    174 batches | lr 0.000332 | ms/batch 412.67 | loss  4.85 | ppl   128.351
| epoch  42 step    18100 |    224 batches | lr 0.000332 | ms/batch 313.93 | loss  4.90 | ppl   134.752
| epoch  42 step    18150 |    274 batches | lr 0.000332 | ms/batch 313.53 | loss  4.89 | ppl   132.342
| epoch  42 step    18200 |    324 batches | lr 0.000332 | ms/batch 314.45 | loss  4.86 | ppl   128.511
| epoch  42 step    18250 |    374 batches | lr 0.000332 | ms/batch 313.35 | loss  4.86 | ppl   128.591
| epoch  42 step    18300 |    424 batches | lr 0.000332 | ms/batch 313.97 | loss  4.86 | ppl   128.612
| epoch  43 step    18350 |     38 batches | lr 0.000332 | ms/batch 307.98 | loss  4.89 | ppl   133.172
| epoch  43 step    18400 |     88 batches | lr 0.000332 | ms/batch 315.16 | loss  4.83 | ppl   125.143
----------------------------------------------------------------------------------------------------
| Eval  46 at step    18400 | time: 130.22s | valid loss  4.54 | valid ppl    93.838
----------------------------------------------------------------------------------------------------
| epoch  43 step    18450 |    138 batches | lr 0.000332 | ms/batch 411.96 | loss  4.86 | ppl   129.357
| epoch  43 step    18500 |    188 batches | lr 0.000331 | ms/batch 313.84 | loss  4.88 | ppl   132.280
| epoch  43 step    18550 |    238 batches | lr 0.000331 | ms/batch 315.26 | loss  4.89 | ppl   132.435
| epoch  43 step    18600 |    288 batches | lr 0.000331 | ms/batch 314.99 | loss  4.93 | ppl   138.434
| epoch  43 step    18650 |    338 batches | lr 0.000331 | ms/batch 314.53 | loss  4.78 | ppl   118.983
| epoch  43 step    18700 |    388 batches | lr 0.000331 | ms/batch 312.81 | loss  4.87 | ppl   129.711
| epoch  44 step    18750 |      2 batches | lr 0.000331 | ms/batch 308.54 | loss  4.90 | ppl   133.777
| epoch  44 step    18800 |     52 batches | lr 0.000331 | ms/batch 315.49 | loss  4.82 | ppl   124.334
----------------------------------------------------------------------------------------------------
| Eval  47 at step    18800 | time: 130.43s | valid loss  4.52 | valid ppl    92.102
----------------------------------------------------------------------------------------------------
| epoch  44 step    18850 |    102 batches | lr 0.000331 | ms/batch 446.30 | loss  4.83 | ppl   125.084
| epoch  44 step    18900 |    152 batches | lr 0.000331 | ms/batch 314.03 | loss  4.86 | ppl   129.307
| epoch  44 step    18950 |    202 batches | lr 0.000331 | ms/batch 313.79 | loss  4.90 | ppl   134.900
| epoch  44 step    19000 |    252 batches | lr 0.00033 | ms/batch 313.43 | loss  4.89 | ppl   132.394
| epoch  44 step    19050 |    302 batches | lr 0.00033 | ms/batch 315.21 | loss  4.91 | ppl   135.990
| epoch  44 step    19100 |    352 batches | lr 0.00033 | ms/batch 313.03 | loss  4.79 | ppl   120.414
| epoch  44 step    19150 |    402 batches | lr 0.00033 | ms/batch 314.86 | loss  4.86 | ppl   129.024
| epoch  45 step    19200 |     16 batches | lr 0.00033 | ms/batch 307.70 | loss  4.88 | ppl   132.125
----------------------------------------------------------------------------------------------------
| Eval  48 at step    19200 | time: 130.24s | valid loss  4.53 | valid ppl    92.774
----------------------------------------------------------------------------------------------------
| epoch  45 step    19250 |     66 batches | lr 0.00033 | ms/batch 411.48 | loss  4.79 | ppl   119.776
| epoch  45 step    19300 |    116 batches | lr 0.00033 | ms/batch 316.74 | loss  4.86 | ppl   128.531
| epoch  45 step    19350 |    166 batches | lr 0.00033 | ms/batch 314.37 | loss  4.84 | ppl   126.578
| epoch  45 step    19400 |    216 batches | lr 0.00033 | ms/batch 314.27 | loss  4.90 | ppl   134.942
| epoch  45 step    19450 |    266 batches | lr 0.00033 | ms/batch 313.47 | loss  4.87 | ppl   130.535
| epoch  45 step    19500 |    316 batches | lr 0.000329 | ms/batch 313.79 | loss  4.86 | ppl   128.421
| epoch  45 step    19550 |    366 batches | lr 0.000329 | ms/batch 315.18 | loss  4.79 | ppl   120.236
| epoch  45 step    19600 |    416 batches | lr 0.000329 | ms/batch 316.33 | loss  4.87 | ppl   129.732
----------------------------------------------------------------------------------------------------
| Eval  49 at step    19600 | time: 130.84s | valid loss  4.51 | valid ppl    91.179
----------------------------------------------------------------------------------------------------
| epoch  46 step    19650 |     30 batches | lr 0.000329 | ms/batch 452.18 | loss  4.89 | ppl   132.363
| epoch  46 step    19700 |     80 batches | lr 0.000329 | ms/batch 314.53 | loss  4.82 | ppl   124.538
| epoch  46 step    19750 |    130 batches | lr 0.000329 | ms/batch 313.84 | loss  4.84 | ppl   126.875
| epoch  46 step    19800 |    180 batches | lr 0.000329 | ms/batch 314.35 | loss  4.84 | ppl   126.410
| epoch  46 step    19850 |    230 batches | lr 0.000329 | ms/batch 315.75 | loss  4.83 | ppl   125.417
| epoch  46 step    19900 |    280 batches | lr 0.000329 | ms/batch 314.93 | loss  4.90 | ppl   134.321
| epoch  46 step    19950 |    330 batches | lr 0.000328 | ms/batch 313.23 | loss  4.79 | ppl   119.870
| epoch  46 step    20000 |    380 batches | lr 0.000328 | ms/batch 314.71 | loss  4.85 | ppl   127.372
----------------------------------------------------------------------------------------------------
| Eval  50 at step    20000 | time: 130.36s | valid loss  4.51 | valid ppl    90.876
----------------------------------------------------------------------------------------------------
| epoch  46 step    20050 |    430 batches | lr 0.000328 | ms/batch 445.97 | loss  4.83 | ppl   125.770
| epoch  47 step    20100 |     44 batches | lr 0.000328 | ms/batch 309.62 | loss  4.84 | ppl   126.390
| epoch  47 step    20150 |     94 batches | lr 0.000328 | ms/batch 315.45 | loss  4.80 | ppl   121.520
| epoch  47 step    20200 |    144 batches | lr 0.000328 | ms/batch 314.60 | loss  4.83 | ppl   125.299
| epoch  47 step    20250 |    194 batches | lr 0.000328 | ms/batch 315.90 | loss  4.86 | ppl   128.682
| epoch  47 step    20300 |    244 batches | lr 0.000328 | ms/batch 313.32 | loss  4.86 | ppl   128.451
| epoch  47 step    20350 |    294 batches | lr 0.000328 | ms/batch 312.81 | loss  4.89 | ppl   132.694
| epoch  47 step    20400 |    344 batches | lr 0.000327 | ms/batch 315.10 | loss  4.76 | ppl   116.828
----------------------------------------------------------------------------------------------------
| Eval  51 at step    20400 | time: 130.54s | valid loss  4.51 | valid ppl    91.038
----------------------------------------------------------------------------------------------------
| epoch  47 step    20450 |    394 batches | lr 0.000327 | ms/batch 415.24 | loss  4.84 | ppl   126.618
| epoch  48 step    20500 |      8 batches | lr 0.000327 | ms/batch 310.39 | loss  4.86 | ppl   128.984
| epoch  48 step    20550 |     58 batches | lr 0.000327 | ms/batch 331.26 | loss  4.77 | ppl   118.307
| epoch  48 step    20600 |    108 batches | lr 0.000327 | ms/batch 331.15 | loss  4.79 | ppl   119.814
| epoch  48 step    20650 |    158 batches | lr 0.000327 | ms/batch 332.31 | loss  4.81 | ppl   122.607
| epoch  48 step    20700 |    208 batches | lr 0.000327 | ms/batch 331.67 | loss  4.83 | ppl   124.928
| epoch  48 step    20750 |    258 batches | lr 0.000327 | ms/batch 330.80 | loss  4.83 | ppl   124.645
| epoch  48 step    20800 |    308 batches | lr 0.000327 | ms/batch 333.02 | loss  4.84 | ppl   126.430
----------------------------------------------------------------------------------------------------
| Eval  52 at step    20800 | time: 136.06s | valid loss  4.50 | valid ppl    90.272
----------------------------------------------------------------------------------------------------
| epoch  48 step    20850 |    358 batches | lr 0.000327 | ms/batch 494.41 | loss  4.75 | ppl   115.629
| epoch  48 step    20900 |    408 batches | lr 0.000326 | ms/batch 332.53 | loss  4.82 | ppl   123.569
| epoch  49 step    20950 |     22 batches | lr 0.000326 | ms/batch 326.12 | loss  4.86 | ppl   128.622
| epoch  49 step    21000 |     72 batches | lr 0.000326 | ms/batch 331.88 | loss  4.77 | ppl   117.744
| epoch  49 step    21050 |    122 batches | lr 0.000326 | ms/batch 313.90 | loss  4.82 | ppl   123.385
| epoch  49 step    21100 |    172 batches | lr 0.000326 | ms/batch 313.33 | loss  4.79 | ppl   120.499
| epoch  49 step    21150 |    222 batches | lr 0.000326 | ms/batch 313.44 | loss  4.83 | ppl   125.260
| epoch  49 step    21200 |    272 batches | lr 0.000326 | ms/batch 314.16 | loss  4.85 | ppl   127.720
----------------------------------------------------------------------------------------------------
| Eval  53 at step    21200 | time: 133.85s | valid loss  4.50 | valid ppl    89.912
----------------------------------------------------------------------------------------------------
| epoch  49 step    21250 |    322 batches | lr 0.000326 | ms/batch 461.04 | loss  4.78 | ppl   119.337
| epoch  49 step    21300 |    372 batches | lr 0.000326 | ms/batch 315.32 | loss  4.78 | ppl   119.365
| epoch  49 step    21350 |    422 batches | lr 0.000325 | ms/batch 315.49 | loss  4.80 | ppl   121.425
| epoch  50 step    21400 |     36 batches | lr 0.000325 | ms/batch 308.28 | loss  4.82 | ppl   123.936
| epoch  50 step    21450 |     86 batches | lr 0.000325 | ms/batch 313.95 | loss  4.77 | ppl   117.496
| epoch  50 step    21500 |    136 batches | lr 0.000325 | ms/batch 314.83 | loss  4.80 | ppl   121.027
| epoch  50 step    21550 |    186 batches | lr 0.000325 | ms/batch 313.92 | loss  4.83 | ppl   125.231
| epoch  50 step    21600 |    236 batches | lr 0.000325 | ms/batch 314.40 | loss  4.82 | ppl   123.685
----------------------------------------------------------------------------------------------------
| Eval  54 at step    21600 | time: 130.48s | valid loss  4.50 | valid ppl    89.715
----------------------------------------------------------------------------------------------------
| epoch  50 step    21650 |    286 batches | lr 0.000325 | ms/batch 446.25 | loss  4.84 | ppl   126.361
| epoch  50 step    21700 |    336 batches | lr 0.000325 | ms/batch 313.92 | loss  4.74 | ppl   114.015
| epoch  50 step    21750 |    386 batches | lr 0.000324 | ms/batch 314.33 | loss  4.81 | ppl   122.808
| epoch  50 step    21800 |    436 batches | lr 0.000324 | ms/batch 309.38 | loss  4.83 | ppl   124.664
| epoch  51 step    21850 |     50 batches | lr 0.000324 | ms/batch 314.85 | loss  4.78 | ppl   118.566
| epoch  51 step    21900 |    100 batches | lr 0.000324 | ms/batch 314.36 | loss  4.78 | ppl   119.244
| epoch  51 step    21950 |    150 batches | lr 0.000324 | ms/batch 314.21 | loss  4.77 | ppl   117.597
| epoch  51 step    22000 |    200 batches | lr 0.000324 | ms/batch 315.31 | loss  4.85 | ppl   127.671
----------------------------------------------------------------------------------------------------
| Eval  55 at step    22000 | time: 130.48s | valid loss  4.48 | valid ppl    88.071
----------------------------------------------------------------------------------------------------
| epoch  51 step    22050 |    250 batches | lr 0.000324 | ms/batch 475.07 | loss  4.80 | ppl   121.056
| epoch  51 step    22100 |    300 batches | lr 0.000324 | ms/batch 315.86 | loss  4.82 | ppl   124.421
| epoch  51 step    22150 |    350 batches | lr 0.000324 | ms/batch 315.38 | loss  4.72 | ppl   112.686
| epoch  51 step    22200 |    400 batches | lr 0.000323 | ms/batch 315.88 | loss  4.80 | ppl   121.558
| epoch  52 step    22250 |     14 batches | lr 0.000323 | ms/batch 310.21 | loss  4.84 | ppl   126.548
| epoch  52 step    22300 |     64 batches | lr 0.000323 | ms/batch 315.16 | loss  4.73 | ppl   113.775
| epoch  52 step    22350 |    114 batches | lr 0.000323 | ms/batch 315.42 | loss  4.78 | ppl   119.477
| epoch  52 step    22400 |    164 batches | lr 0.000323 | ms/batch 316.63 | loss  4.81 | ppl   122.789
----------------------------------------------------------------------------------------------------
| Eval  56 at step    22400 | time: 131.05s | valid loss  4.47 | valid ppl    87.713
----------------------------------------------------------------------------------------------------
| epoch  52 step    22450 |    214 batches | lr 0.000323 | ms/batch 447.15 | loss  4.84 | ppl   126.311
| epoch  52 step    22500 |    264 batches | lr 0.000323 | ms/batch 314.67 | loss  4.81 | ppl   123.318
| epoch  52 step    22550 |    314 batches | lr 0.000323 | ms/batch 314.63 | loss  4.83 | ppl   125.839
| epoch  52 step    22600 |    364 batches | lr 0.000323 | ms/batch 315.74 | loss  4.74 | ppl   114.416
| epoch  52 step    22650 |    414 batches | lr 0.000322 | ms/batch 315.81 | loss  4.80 | ppl   121.425
| epoch  53 step    22700 |     28 batches | lr 0.000322 | ms/batch 310.43 | loss  4.81 | ppl   123.048
| epoch  53 step    22750 |     78 batches | lr 0.000322 | ms/batch 316.63 | loss  4.74 | ppl   114.363
| epoch  53 step    22800 |    128 batches | lr 0.000322 | ms/batch 315.25 | loss  4.76 | ppl   117.066
----------------------------------------------------------------------------------------------------
| Eval  57 at step    22800 | time: 130.88s | valid loss  4.48 | valid ppl    87.916
----------------------------------------------------------------------------------------------------
| epoch  53 step    22850 |    178 batches | lr 0.000322 | ms/batch 414.79 | loss  4.77 | ppl   118.381
| epoch  53 step    22900 |    228 batches | lr 0.000322 | ms/batch 314.73 | loss  4.80 | ppl   120.942
| epoch  53 step    22950 |    278 batches | lr 0.000322 | ms/batch 315.62 | loss  4.83 | ppl   124.606
| epoch  53 step    23000 |    328 batches | lr 0.000322 | ms/batch 315.51 | loss  4.76 | ppl   116.737
| epoch  53 step    23050 |    378 batches | lr 0.000321 | ms/batch 314.40 | loss  4.78 | ppl   119.021
| epoch  53 step    23100 |    428 batches | lr 0.000321 | ms/batch 314.88 | loss  4.80 | ppl   121.710
| epoch  54 step    23150 |     42 batches | lr 0.000321 | ms/batch 310.36 | loss  4.76 | ppl   116.719
| epoch  54 step    23200 |     92 batches | lr 0.000321 | ms/batch 314.21 | loss  4.75 | ppl   115.368
----------------------------------------------------------------------------------------------------
| Eval  58 at step    23200 | time: 130.71s | valid loss  4.48 | valid ppl    88.292
----------------------------------------------------------------------------------------------------
| epoch  54 step    23250 |    142 batches | lr 0.000321 | ms/batch 413.90 | loss  4.75 | ppl   116.073
| epoch  54 step    23300 |    192 batches | lr 0.000321 | ms/batch 316.84 | loss  4.80 | ppl   121.027
| epoch  54 step    23350 |    242 batches | lr 0.000321 | ms/batch 315.50 | loss  4.79 | ppl   120.264
| epoch  54 step    23400 |    292 batches | lr 0.000321 | ms/batch 330.40 | loss  4.82 | ppl   123.704
| epoch  54 step    23450 |    342 batches | lr 0.00032 | ms/batch 330.55 | loss  4.70 | ppl   110.136
| epoch  54 step    23500 |    392 batches | lr 0.00032 | ms/batch 332.57 | loss  4.78 | ppl   118.770
| epoch  55 step    23550 |      6 batches | lr 0.00032 | ms/batch 317.51 | loss  4.79 | ppl   120.189
| epoch  55 step    23600 |     56 batches | lr 0.00032 | ms/batch 314.92 | loss  4.73 | ppl   113.650
----------------------------------------------------------------------------------------------------
| Eval  59 at step    23600 | time: 133.64s | valid loss  4.47 | valid ppl    87.037
----------------------------------------------------------------------------------------------------
| epoch  55 step    23650 |    106 batches | lr 0.00032 | ms/batch 447.34 | loss  4.76 | ppl   116.391
| epoch  55 step    23700 |    156 batches | lr 0.00032 | ms/batch 314.24 | loss  4.75 | ppl   115.260
| epoch  55 step    23750 |    206 batches | lr 0.00032 | ms/batch 313.88 | loss  4.79 | ppl   120.029
| epoch  55 step    23800 |    256 batches | lr 0.00032 | ms/batch 314.47 | loss  4.79 | ppl   120.555
| epoch  55 step    23850 |    306 batches | lr 0.000319 | ms/batch 313.95 | loss  4.81 | ppl   122.435
| epoch  55 step    23900 |    356 batches | lr 0.000319 | ms/batch 314.62 | loss  4.70 | ppl   109.784
| epoch  55 step    23950 |    406 batches | lr 0.000319 | ms/batch 314.78 | loss  4.76 | ppl   116.291
| epoch  56 step    24000 |     20 batches | lr 0.000319 | ms/batch 308.23 | loss  4.81 | ppl   123.068
----------------------------------------------------------------------------------------------------
| Eval  60 at step    24000 | time: 130.36s | valid loss  4.48 | valid ppl    88.396
----------------------------------------------------------------------------------------------------
| epoch  56 step    24050 |     70 batches | lr 0.000319 | ms/batch 412.62 | loss  4.74 | ppl   114.193
| epoch  56 step    24100 |    120 batches | lr 0.000319 | ms/batch 314.78 | loss  4.74 | ppl   114.042
| epoch  56 step    24150 |    170 batches | lr 0.000319 | ms/batch 313.37 | loss  4.75 | ppl   115.215
| epoch  56 step    24200 |    220 batches | lr 0.000319 | ms/batch 313.73 | loss  4.79 | ppl   119.804
| epoch  56 step    24250 |    270 batches | lr 0.000318 | ms/batch 313.72 | loss  4.78 | ppl   119.021
| epoch  56 step    24300 |    320 batches | lr 0.000318 | ms/batch 314.23 | loss  4.78 | ppl   118.640
| epoch  56 step    24350 |    370 batches | lr 0.000318 | ms/batch 314.03 | loss  4.73 | ppl   112.854
| epoch  56 step    24400 |    420 batches | lr 0.000318 | ms/batch 316.23 | loss  4.75 | ppl   116.136
----------------------------------------------------------------------------------------------------
| Eval  61 at step    24400 | time: 130.61s | valid loss  4.46 | valid ppl    86.477
----------------------------------------------------------------------------------------------------
| epoch  57 step    24450 |     34 batches | lr 0.000318 | ms/batch 442.00 | loss  4.79 | ppl   120.207
| epoch  57 step    24500 |     84 batches | lr 0.000318 | ms/batch 315.34 | loss  4.72 | ppl   111.871
| epoch  57 step    24550 |    134 batches | lr 0.000318 | ms/batch 313.57 | loss  4.78 | ppl   119.365
| epoch  57 step    24600 |    184 batches | lr 0.000318 | ms/batch 313.65 | loss  4.75 | ppl   115.747
| epoch  57 step    24650 |    234 batches | lr 0.000317 | ms/batch 313.85 | loss  4.78 | ppl   119.617
| epoch  57 step    24700 |    284 batches | lr 0.000317 | ms/batch 314.62 | loss  4.79 | ppl   120.461
| epoch  57 step    24750 |    334 batches | lr 0.000317 | ms/batch 314.34 | loss  4.71 | ppl   111.313
| epoch  57 step    24800 |    384 batches | lr 0.000317 | ms/batch 316.18 | loss  4.75 | ppl   115.747
----------------------------------------------------------------------------------------------------
| Eval  62 at step    24800 | time: 130.54s | valid loss  4.47 | valid ppl    87.130
----------------------------------------------------------------------------------------------------
| epoch  57 step    24850 |    434 batches | lr 0.000317 | ms/batch 414.86 | loss  4.79 | ppl   120.763
| epoch  58 step    24900 |     48 batches | lr 0.000317 | ms/batch 310.07 | loss  4.72 | ppl   112.546
| epoch  58 step    24950 |     98 batches | lr 0.000317 | ms/batch 314.26 | loss  4.70 | ppl   109.681
| epoch  58 step    25000 |    148 batches | lr 0.000317 | ms/batch 314.88 | loss  4.76 | ppl   116.391
| epoch  58 step    25050 |    198 batches | lr 0.000316 | ms/batch 314.50 | loss  4.77 | ppl   118.168
| epoch  58 step    25100 |    248 batches | lr 0.000316 | ms/batch 313.62 | loss  4.76 | ppl   116.627
| epoch  58 step    25150 |    298 batches | lr 0.000316 | ms/batch 314.29 | loss  4.78 | ppl   119.412
| epoch  58 step    25200 |    348 batches | lr 0.000316 | ms/batch 315.77 | loss  4.66 | ppl   105.298
----------------------------------------------------------------------------------------------------
| Eval  63 at step    25200 | time: 130.59s | valid loss  4.44 | valid ppl    84.831
----------------------------------------------------------------------------------------------------
| epoch  58 step    25250 |    398 batches | lr 0.000316 | ms/batch 445.82 | loss  4.76 | ppl   116.573
| epoch  59 step    25300 |     12 batches | lr 0.000316 | ms/batch 309.79 | loss  4.78 | ppl   119.095
| epoch  59 step    25350 |     62 batches | lr 0.000316 | ms/batch 316.11 | loss  4.70 | ppl   110.395
| epoch  59 step    25400 |    112 batches | lr 0.000316 | ms/batch 317.44 | loss  4.74 | ppl   114.336
| epoch  59 step    25450 |    162 batches | lr 0.000315 | ms/batch 316.35 | loss  4.74 | ppl   113.872
| epoch  59 step    25500 |    212 batches | lr 0.000315 | ms/batch 315.66 | loss  4.76 | ppl   117.212
| epoch  59 step    25550 |    262 batches | lr 0.000315 | ms/batch 316.67 | loss  4.77 | ppl   118.251
| epoch  59 step    25600 |    312 batches | lr 0.000315 | ms/batch 317.21 | loss  4.77 | ppl   118.085
----------------------------------------------------------------------------------------------------
| Eval  64 at step    25600 | time: 131.19s | valid loss  4.46 | valid ppl    86.709
----------------------------------------------------------------------------------------------------
| epoch  59 step    25650 |    362 batches | lr 0.000315 | ms/batch 415.82 | loss  4.67 | ppl   106.606
| epoch  59 step    25700 |    412 batches | lr 0.000315 | ms/batch 313.93 | loss  4.76 | ppl   116.245
| epoch  60 step    25750 |     26 batches | lr 0.000315 | ms/batch 307.62 | loss  4.79 | ppl   120.819
| epoch  60 step    25800 |     76 batches | lr 0.000314 | ms/batch 315.13 | loss  4.68 | ppl   107.266
| epoch  60 step    25850 |    126 batches | lr 0.000314 | ms/batch 314.11 | loss  4.75 | ppl   115.143
| epoch  60 step    25900 |    176 batches | lr 0.000314 | ms/batch 315.51 | loss  4.74 | ppl   114.604
| epoch  60 step    25950 |    226 batches | lr 0.000314 | ms/batch 314.79 | loss  4.75 | ppl   115.323
| epoch  60 step    26000 |    276 batches | lr 0.000314 | ms/batch 315.94 | loss  4.78 | ppl   119.095
----------------------------------------------------------------------------------------------------
| Eval  65 at step    26000 | time: 130.66s | valid loss  4.45 | valid ppl    85.889
----------------------------------------------------------------------------------------------------
| epoch  60 step    26050 |    326 batches | lr 0.000314 | ms/batch 415.56 | loss  4.70 | ppl   109.587
| epoch  60 step    26100 |    376 batches | lr 0.000314 | ms/batch 317.02 | loss  4.73 | ppl   113.570
| epoch  60 step    26150 |    426 batches | lr 0.000314 | ms/batch 314.62 | loss  4.73 | ppl   112.854
| epoch  61 step    26200 |     40 batches | lr 0.000313 | ms/batch 308.16 | loss  4.73 | ppl   113.198
| epoch  61 step    26250 |     90 batches | lr 0.000313 | ms/batch 313.86 | loss  4.69 | ppl   109.262
| epoch  61 step    26300 |    140 batches | lr 0.000313 | ms/batch 318.07 | loss  4.75 | ppl   115.089
| epoch  61 step    26350 |    190 batches | lr 0.000313 | ms/batch 316.60 | loss  4.73 | ppl   113.632
| epoch  61 step    26400 |    240 batches | lr 0.000313 | ms/batch 317.19 | loss  4.74 | ppl   114.238
----------------------------------------------------------------------------------------------------
| Eval  66 at step    26400 | time: 131.07s | valid loss  4.44 | valid ppl    84.604
----------------------------------------------------------------------------------------------------
| epoch  61 step    26450 |    290 batches | lr 0.000313 | ms/batch 448.10 | loss  4.77 | ppl   118.187
| epoch  61 step    26500 |    340 batches | lr 0.000313 | ms/batch 316.74 | loss  4.67 | ppl   106.990
| epoch  61 step    26550 |    390 batches | lr 0.000312 | ms/batch 317.41 | loss  4.74 | ppl   114.354
| epoch  62 step    26600 |      4 batches | lr 0.000312 | ms/batch 308.55 | loss  4.77 | ppl   118.344
| epoch  62 step    26650 |     54 batches | lr 0.000312 | ms/batch 315.07 | loss  4.72 | ppl   112.002
| epoch  62 step    26700 |    104 batches | lr 0.000312 | ms/batch 315.12 | loss  4.72 | ppl   112.475
| epoch  62 step    26750 |    154 batches | lr 0.000312 | ms/batch 315.17 | loss  4.72 | ppl   112.221
| epoch  62 step    26800 |    204 batches | lr 0.000312 | ms/batch 315.10 | loss  4.75 | ppl   115.395
----------------------------------------------------------------------------------------------------
| Eval  67 at step    26800 | time: 130.92s | valid loss  4.44 | valid ppl    84.679
----------------------------------------------------------------------------------------------------
| epoch  62 step    26850 |    254 batches | lr 0.000312 | ms/batch 412.10 | loss  4.76 | ppl   116.273
| epoch  62 step    26900 |    304 batches | lr 0.000312 | ms/batch 314.59 | loss  4.79 | ppl   120.848
| epoch  62 step    26950 |    354 batches | lr 0.000311 | ms/batch 314.65 | loss  4.65 | ppl   104.593
| epoch  62 step    27000 |    404 batches | lr 0.000311 | ms/batch 314.08 | loss  4.73 | ppl   113.420
| epoch  63 step    27050 |     18 batches | lr 0.000311 | ms/batch 309.48 | loss  4.75 | ppl   115.134
| epoch  63 step    27100 |     68 batches | lr 0.000311 | ms/batch 316.14 | loss  4.71 | ppl   111.304
| epoch  63 step    27150 |    118 batches | lr 0.000311 | ms/batch 315.41 | loss  4.73 | ppl   113.650
| epoch  63 step    27200 |    168 batches | lr 0.000311 | ms/batch 314.68 | loss  4.73 | ppl   112.757
----------------------------------------------------------------------------------------------------
| Eval  68 at step    27200 | time: 130.51s | valid loss  4.44 | valid ppl    84.374
----------------------------------------------------------------------------------------------------
| epoch  63 step    27250 |    218 batches | lr 0.000311 | ms/batch 448.77 | loss  4.75 | ppl   115.395
| epoch  63 step    27300 |    268 batches | lr 0.00031 | ms/batch 314.58 | loss  4.75 | ppl   115.350
| epoch  63 step    27350 |    318 batches | lr 0.00031 | ms/batch 314.83 | loss  4.70 | ppl   110.464
| epoch  63 step    27400 |    368 batches | lr 0.00031 | ms/batch 313.53 | loss  4.66 | ppl   105.793
| epoch  63 step    27450 |    418 batches | lr 0.00031 | ms/batch 314.61 | loss  4.72 | ppl   111.705
| epoch  64 step    27500 |     32 batches | lr 0.00031 | ms/batch 310.01 | loss  4.75 | ppl   116.100
| epoch  64 step    27550 |     82 batches | lr 0.00031 | ms/batch 315.93 | loss  4.68 | ppl   107.837
| epoch  64 step    27600 |    132 batches | lr 0.00031 | ms/batch 317.15 | loss  4.72 | ppl   111.713
----------------------------------------------------------------------------------------------------
| Eval  69 at step    27600 | time: 130.74s | valid loss  4.44 | valid ppl    84.771
----------------------------------------------------------------------------------------------------
| epoch  64 step    27650 |    182 batches | lr 0.000309 | ms/batch 415.25 | loss  4.71 | ppl   111.313
| epoch  64 step    27700 |    232 batches | lr 0.000309 | ms/batch 315.18 | loss  4.75 | ppl   115.242
| epoch  64 step    27750 |    282 batches | lr 0.000309 | ms/batch 316.22 | loss  4.76 | ppl   116.209
| epoch  64 step    27800 |    332 batches | lr 0.000309 | ms/batch 316.28 | loss  4.69 | ppl   108.887
| epoch  64 step    27850 |    382 batches | lr 0.000309 | ms/batch 315.48 | loss  4.72 | ppl   111.679
| epoch  64 step    27900 |    432 batches | lr 0.000309 | ms/batch 315.24 | loss  4.73 | ppl   113.473
| epoch  65 step    27950 |     46 batches | lr 0.000309 | ms/batch 308.39 | loss  4.69 | ppl   108.319
| epoch  65 step    28000 |     96 batches | lr 0.000308 | ms/batch 315.80 | loss  4.69 | ppl   108.361
----------------------------------------------------------------------------------------------------
| Eval  70 at step    28000 | time: 130.90s | valid loss  4.44 | valid ppl    84.477
----------------------------------------------------------------------------------------------------
| epoch  65 step    28050 |    146 batches | lr 0.000308 | ms/batch 415.04 | loss  4.69 | ppl   109.083
| epoch  65 step    28100 |    196 batches | lr 0.000308 | ms/batch 313.94 | loss  4.73 | ppl   113.695
| epoch  65 step    28150 |    246 batches | lr 0.000308 | ms/batch 314.63 | loss  4.72 | ppl   111.879
| epoch  65 step    28200 |    296 batches | lr 0.000308 | ms/batch 315.75 | loss  4.76 | ppl   116.282
| epoch  65 step    28250 |    346 batches | lr 0.000308 | ms/batch 314.68 | loss  4.63 | ppl   102.426
| epoch  65 step    28300 |    396 batches | lr 0.000308 | ms/batch 314.44 | loss  4.70 | ppl   109.896
| epoch  66 step    28350 |     10 batches | lr 0.000307 | ms/batch 308.97 | loss  4.75 | ppl   115.278
| epoch  66 step    28400 |     60 batches | lr 0.000307 | ms/batch 315.06 | loss  4.66 | ppl   105.117
----------------------------------------------------------------------------------------------------
| Eval  71 at step    28400 | time: 130.61s | valid loss  4.41 | valid ppl    82.358
----------------------------------------------------------------------------------------------------
| epoch  66 step    28450 |    110 batches | lr 0.000307 | ms/batch 448.28 | loss  4.67 | ppl   106.282
| epoch  66 step    28500 |    160 batches | lr 0.000307 | ms/batch 313.69 | loss  4.73 | ppl   112.871
| epoch  66 step    28550 |    210 batches | lr 0.000307 | ms/batch 317.48 | loss  4.73 | ppl   112.827
| epoch  66 step    28600 |    260 batches | lr 0.000307 | ms/batch 317.27 | loss  4.72 | ppl   112.423
| epoch  66 step    28650 |    310 batches | lr 0.000307 | ms/batch 316.33 | loss  4.72 | ppl   112.046
| epoch  66 step    28700 |    360 batches | lr 0.000306 | ms/batch 317.14 | loss  4.64 | ppl   103.350
| epoch  66 step    28750 |    410 batches | lr 0.000306 | ms/batch 315.64 | loss  4.70 | ppl   110.222
| epoch  67 step    28800 |     24 batches | lr 0.000306 | ms/batch 308.22 | loss  4.74 | ppl   114.774
----------------------------------------------------------------------------------------------------
| Eval  72 at step    28800 | time: 131.01s | valid loss  4.42 | valid ppl    83.418
----------------------------------------------------------------------------------------------------
| epoch  67 step    28850 |     74 batches | lr 0.000306 | ms/batch 413.73 | loss  4.67 | ppl   106.166
| epoch  67 step    28900 |    124 batches | lr 0.000306 | ms/batch 315.36 | loss  4.68 | ppl   107.375
| epoch  67 step    28950 |    174 batches | lr 0.000306 | ms/batch 317.31 | loss  4.72 | ppl   112.475
| epoch  67 step    29000 |    224 batches | lr 0.000306 | ms/batch 314.63 | loss  4.72 | ppl   112.563
| epoch  67 step    29050 |    274 batches | lr 0.000305 | ms/batch 314.60 | loss  4.73 | ppl   112.942
| epoch  67 step    29100 |    324 batches | lr 0.000305 | ms/batch 315.92 | loss  4.69 | ppl   109.134
| epoch  67 step    29150 |    374 batches | lr 0.000305 | ms/batch 314.50 | loss  4.70 | ppl   110.154
| epoch  67 step    29200 |    424 batches | lr 0.000305 | ms/batch 314.30 | loss  4.69 | ppl   108.768
----------------------------------------------------------------------------------------------------
| Eval  73 at step    29200 | time: 131.02s | valid loss  4.42 | valid ppl    83.401
----------------------------------------------------------------------------------------------------
| epoch  68 step    29250 |     38 batches | lr 0.000305 | ms/batch 407.64 | loss  4.71 | ppl   111.382
| epoch  68 step    29300 |     88 batches | lr 0.000305 | ms/batch 315.02 | loss  4.66 | ppl   105.126
| epoch  68 step    29350 |    138 batches | lr 0.000305 | ms/batch 314.57 | loss  4.69 | ppl   109.228
| epoch  68 step    29400 |    188 batches | lr 0.000304 | ms/batch 317.18 | loss  4.70 | ppl   109.904
| epoch  68 step    29450 |    238 batches | lr 0.000304 | ms/batch 316.79 | loss  4.71 | ppl   110.645
| epoch  68 step    29500 |    288 batches | lr 0.000304 | ms/batch 316.29 | loss  4.75 | ppl   115.521
| epoch  68 step    29550 |    338 batches | lr 0.000304 | ms/batch 314.70 | loss  4.63 | ppl   102.972
| epoch  68 step    29600 |    388 batches | lr 0.000304 | ms/batch 315.32 | loss  4.68 | ppl   107.955
----------------------------------------------------------------------------------------------------
| Eval  74 at step    29600 | time: 130.88s | valid loss  4.42 | valid ppl    83.419
----------------------------------------------------------------------------------------------------
| epoch  69 step    29650 |      2 batches | lr 0.000304 | ms/batch 410.96 | loss  4.74 | ppl   114.712
| epoch  69 step    29700 |     52 batches | lr 0.000303 | ms/batch 316.93 | loss  4.63 | ppl   102.618
| epoch  69 step    29750 |    102 batches | lr 0.000303 | ms/batch 317.03 | loss  4.67 | ppl   106.764
| epoch  69 step    29800 |    152 batches | lr 0.000303 | ms/batch 316.79 | loss  4.67 | ppl   106.798
| epoch  69 step    29850 |    202 batches | lr 0.000303 | ms/batch 314.14 | loss  4.70 | ppl   110.360
| epoch  69 step    29900 |    252 batches | lr 0.000303 | ms/batch 314.14 | loss  4.68 | ppl   108.099
| epoch  69 step    29950 |    302 batches | lr 0.000303 | ms/batch 315.46 | loss  4.71 | ppl   111.583
| epoch  69 step    30000 |    352 batches | lr 0.000303 | ms/batch 314.38 | loss  4.62 | ppl   101.367
----------------------------------------------------------------------------------------------------
| Eval  75 at step    30000 | time: 131.00s | valid loss  4.41 | valid ppl    82.476
----------------------------------------------------------------------------------------------------
| epoch  69 step    30050 |    402 batches | lr 0.000302 | ms/batch 415.53 | loss  4.68 | ppl   107.989
| epoch  70 step    30100 |     16 batches | lr 0.000302 | ms/batch 310.45 | loss  4.71 | ppl   111.009
| epoch  70 step    30150 |     66 batches | lr 0.000302 | ms/batch 329.37 | loss  4.64 | ppl   103.958
| epoch  70 step    30200 |    116 batches | lr 0.000302 | ms/batch 321.43 | loss  4.68 | ppl   107.837
| epoch  70 step    30250 |    166 batches | lr 0.000302 | ms/batch 315.18 | loss  4.69 | ppl   109.202
| epoch  70 step    30300 |    216 batches | lr 0.000302 | ms/batch 314.28 | loss  4.70 | ppl   110.050
| epoch  70 step    30350 |    266 batches | lr 0.000302 | ms/batch 315.39 | loss  4.70 | ppl   110.438
| epoch  70 step    30400 |    316 batches | lr 0.000301 | ms/batch 315.26 | loss  4.69 | ppl   108.802
----------------------------------------------------------------------------------------------------
| Eval  76 at step    30400 | time: 131.82s | valid loss  4.43 | valid ppl    84.099
----------------------------------------------------------------------------------------------------
| epoch  70 step    30450 |    366 batches | lr 0.000301 | ms/batch 413.47 | loss  4.64 | ppl   103.350
| epoch  70 step    30500 |    416 batches | lr 0.000301 | ms/batch 315.79 | loss  4.69 | ppl   108.497
| epoch  71 step    30550 |     30 batches | lr 0.000301 | ms/batch 310.97 | loss  4.72 | ppl   111.713
| epoch  71 step    30600 |     80 batches | lr 0.000301 | ms/batch 315.06 | loss  4.63 | ppl   102.098
| epoch  71 step    30650 |    130 batches | lr 0.000301 | ms/batch 316.32 | loss  4.67 | ppl   106.481
| epoch  71 step    30700 |    180 batches | lr 0.0003 | ms/batch 316.64 | loss  4.68 | ppl   108.099
| epoch  71 step    30750 |    230 batches | lr 0.0003 | ms/batch 316.90 | loss  4.70 | ppl   109.844
| epoch  71 step    30800 |    280 batches | lr 0.0003 | ms/batch 316.58 | loss  4.70 | ppl   110.205
----------------------------------------------------------------------------------------------------
| Eval  77 at step    30800 | time: 131.11s | valid loss  4.41 | valid ppl    82.292
----------------------------------------------------------------------------------------------------
| epoch  71 step    30850 |    330 batches | lr 0.0003 | ms/batch 460.70 | loss  4.62 | ppl   101.597
| epoch  71 step    30900 |    380 batches | lr 0.0003 | ms/batch 317.76 | loss  4.67 | ppl   106.406
| epoch  71 step    30950 |    430 batches | lr 0.0003 | ms/batch 317.66 | loss  4.69 | ppl   108.836
| epoch  72 step    31000 |     44 batches | lr 0.0003 | ms/batch 307.19 | loss  4.69 | ppl   108.573
| epoch  72 step    31050 |     94 batches | lr 0.000299 | ms/batch 314.51 | loss  4.64 | ppl   103.431
| epoch  72 step    31100 |    144 batches | lr 0.000299 | ms/batch 314.41 | loss  4.67 | ppl   106.224
| epoch  72 step    31150 |    194 batches | lr 0.000299 | ms/batch 314.30 | loss  4.70 | ppl   109.878
| epoch  72 step    31200 |    244 batches | lr 0.000299 | ms/batch 315.26 | loss  4.66 | ppl   105.208
----------------------------------------------------------------------------------------------------
| Eval  78 at step    31200 | time: 130.81s | valid loss  4.41 | valid ppl    82.659
----------------------------------------------------------------------------------------------------
| epoch  72 step    31250 |    294 batches | lr 0.000299 | ms/batch 413.70 | loss  4.74 | ppl   114.999
| epoch  72 step    31300 |    344 batches | lr 0.000299 | ms/batch 315.96 | loss  4.58 | ppl    97.461
| epoch  72 step    31350 |    394 batches | lr 0.000298 | ms/batch 315.35 | loss  4.67 | ppl   106.556
| epoch  73 step    31400 |      8 batches | lr 0.000298 | ms/batch 308.76 | loss  4.70 | ppl   109.827
| epoch  73 step    31450 |     58 batches | lr 0.000298 | ms/batch 315.34 | loss  4.61 | ppl   100.626
| epoch  73 step    31500 |    108 batches | lr 0.000298 | ms/batch 315.20 | loss  4.65 | ppl   104.389
| epoch  73 step    31550 |    158 batches | lr 0.000298 | ms/batch 313.66 | loss  4.69 | ppl   108.412
| epoch  73 step    31600 |    208 batches | lr 0.000298 | ms/batch 315.63 | loss  4.67 | ppl   106.748
----------------------------------------------------------------------------------------------------
| Eval  79 at step    31600 | time: 130.67s | valid loss  4.41 | valid ppl    82.144
----------------------------------------------------------------------------------------------------
| epoch  73 step    31650 |    258 batches | lr 0.000297 | ms/batch 478.34 | loss  4.70 | ppl   110.119
| epoch  73 step    31700 |    308 batches | lr 0.000297 | ms/batch 316.34 | loss  4.65 | ppl   105.068
| epoch  73 step    31750 |    358 batches | lr 0.000297 | ms/batch 318.62 | loss  4.63 | ppl   102.522
| epoch  73 step    31800 |    408 batches | lr 0.000297 | ms/batch 318.67 | loss  4.67 | ppl   106.323
| epoch  74 step    31850 |     22 batches | lr 0.000297 | ms/batch 313.93 | loss  4.70 | ppl   110.179
| epoch  74 step    31900 |     72 batches | lr 0.000297 | ms/batch 319.41 | loss  4.60 | ppl    99.663
| epoch  74 step    31950 |    122 batches | lr 0.000297 | ms/batch 316.66 | loss  4.67 | ppl   107.032
| epoch  74 step    32000 |    172 batches | lr 0.000296 | ms/batch 317.27 | loss  4.66 | ppl   105.768
----------------------------------------------------------------------------------------------------
| Eval  80 at step    32000 | time: 131.79s | valid loss  4.39 | valid ppl    80.495
----------------------------------------------------------------------------------------------------
| epoch  74 step    32050 |    222 batches | lr 0.000296 | ms/batch 448.89 | loss  4.67 | ppl   107.099
| epoch  74 step    32100 |    272 batches | lr 0.000296 | ms/batch 316.74 | loss  4.68 | ppl   108.158
| epoch  74 step    32150 |    322 batches | lr 0.000296 | ms/batch 315.76 | loss  4.66 | ppl   106.108
| epoch  74 step    32200 |    372 batches | lr 0.000296 | ms/batch 315.73 | loss  4.65 | ppl   104.422
| epoch  74 step    32250 |    422 batches | lr 0.000296 | ms/batch 318.00 | loss  4.65 | ppl   104.365
| epoch  75 step    32300 |     36 batches | lr 0.000295 | ms/batch 309.49 | loss  4.72 | ppl   111.670
| epoch  75 step    32350 |     86 batches | lr 0.000295 | ms/batch 314.57 | loss  4.62 | ppl   101.296
| epoch  75 step    32400 |    136 batches | lr 0.000295 | ms/batch 314.03 | loss  4.67 | ppl   106.564
----------------------------------------------------------------------------------------------------
| Eval  81 at step    32400 | time: 130.97s | valid loss  4.41 | valid ppl    82.063
----------------------------------------------------------------------------------------------------
| epoch  75 step    32450 |    186 batches | lr 0.000295 | ms/batch 414.81 | loss  4.67 | ppl   106.564
| epoch  75 step    32500 |    236 batches | lr 0.000295 | ms/batch 316.28 | loss  4.68 | ppl   108.175
| epoch  75 step    32550 |    286 batches | lr 0.000295 | ms/batch 316.67 | loss  4.69 | ppl   109.373
| epoch  75 step    32600 |    336 batches | lr 0.000294 | ms/batch 316.88 | loss  4.60 | ppl    99.835
| epoch  75 step    32650 |    386 batches | lr 0.000294 | ms/batch 316.19 | loss  4.68 | ppl   107.526
| epoch  75 step    32700 |    436 batches | lr 0.000294 | ms/batch 309.79 | loss  4.67 | ppl   106.415
| epoch  76 step    32750 |     50 batches | lr 0.000294 | ms/batch 312.39 | loss  4.62 | ppl   101.700
| epoch  76 step    32800 |    100 batches | lr 0.000294 | ms/batch 314.84 | loss  4.61 | ppl   100.249
----------------------------------------------------------------------------------------------------
| Eval  82 at step    32800 | time: 130.90s | valid loss  4.40 | valid ppl    81.273
----------------------------------------------------------------------------------------------------
| epoch  76 step    32850 |    150 batches | lr 0.000294 | ms/batch 413.41 | loss  4.64 | ppl   103.682
| epoch  76 step    32900 |    200 batches | lr 0.000294 | ms/batch 315.93 | loss  4.68 | ppl   107.720
| epoch  76 step    32950 |    250 batches | lr 0.000293 | ms/batch 316.75 | loss  4.67 | ppl   106.290
| epoch  76 step    33000 |    300 batches | lr 0.000293 | ms/batch 315.78 | loss  4.71 | ppl   110.714
| epoch  76 step    33050 |    350 batches | lr 0.000293 | ms/batch 313.88 | loss  4.58 | ppl    97.332
| epoch  76 step    33100 |    400 batches | lr 0.000293 | ms/batch 314.88 | loss  4.68 | ppl   108.107
| epoch  77 step    33150 |     14 batches | lr 0.000293 | ms/batch 307.97 | loss  4.70 | ppl   109.758
| epoch  77 step    33200 |     64 batches | lr 0.000293 | ms/batch 314.84 | loss  4.60 | ppl    99.050
----------------------------------------------------------------------------------------------------
| Eval  83 at step    33200 | time: 130.67s | valid loss  4.37 | valid ppl    79.303
----------------------------------------------------------------------------------------------------
| epoch  77 step    33250 |    114 batches | lr 0.000292 | ms/batch 444.79 | loss  4.65 | ppl   104.552
| epoch  77 step    33300 |    164 batches | lr 0.000292 | ms/batch 313.71 | loss  4.63 | ppl   102.626
| epoch  77 step    33350 |    214 batches | lr 0.000292 | ms/batch 315.07 | loss  4.68 | ppl   107.753
| epoch  77 step    33400 |    264 batches | lr 0.000292 | ms/batch 314.14 | loss  4.65 | ppl   104.316
| epoch  77 step    33450 |    314 batches | lr 0.000292 | ms/batch 314.30 | loss  4.65 | ppl   104.569
| epoch  77 step    33500 |    364 batches | lr 0.000292 | ms/batch 316.33 | loss  4.60 | ppl    99.780
| epoch  77 step    33550 |    414 batches | lr 0.000291 | ms/batch 315.61 | loss  4.65 | ppl   104.994
| epoch  78 step    33600 |     28 batches | lr 0.000291 | ms/batch 309.76 | loss  4.68 | ppl   107.258
----------------------------------------------------------------------------------------------------
| Eval  84 at step    33600 | time: 130.63s | valid loss  4.39 | valid ppl    80.831
----------------------------------------------------------------------------------------------------
| epoch  78 step    33650 |     78 batches | lr 0.000291 | ms/batch 415.32 | loss  4.59 | ppl    98.857
| epoch  78 step    33700 |    128 batches | lr 0.000291 | ms/batch 316.62 | loss  4.64 | ppl   103.877
| epoch  78 step    33750 |    178 batches | lr 0.000291 | ms/batch 317.15 | loss  4.66 | ppl   105.372
| epoch  78 step    33800 |    228 batches | lr 0.000291 | ms/batch 316.42 | loss  4.67 | ppl   106.456
| epoch  78 step    33850 |    278 batches | lr 0.00029 | ms/batch 316.64 | loss  4.68 | ppl   107.888
| epoch  78 step    33900 |    328 batches | lr 0.00029 | ms/batch 316.93 | loss  4.61 | ppl   100.163
| epoch  78 step    33950 |    378 batches | lr 0.00029 | ms/batch 316.70 | loss  4.64 | ppl   103.463
| epoch  78 step    34000 |    428 batches | lr 0.00029 | ms/batch 316.30 | loss  4.68 | ppl   107.686
----------------------------------------------------------------------------------------------------
| Eval  85 at step    34000 | time: 131.61s | valid loss  4.39 | valid ppl    80.804
----------------------------------------------------------------------------------------------------
| epoch  79 step    34050 |     42 batches | lr 0.00029 | ms/batch 411.21 | loss  4.63 | ppl   102.666
| epoch  79 step    34100 |     92 batches | lr 0.00029 | ms/batch 331.57 | loss  4.58 | ppl    97.507
| epoch  79 step    34150 |    142 batches | lr 0.000289 | ms/batch 331.29 | loss  4.66 | ppl   105.265
| epoch  79 step    34200 |    192 batches | lr 0.000289 | ms/batch 324.55 | loss  4.65 | ppl   104.226
| epoch  79 step    34250 |    242 batches | lr 0.000289 | ms/batch 316.82 | loss  4.66 | ppl   105.232
| epoch  79 step    34300 |    292 batches | lr 0.000289 | ms/batch 316.06 | loss  4.71 | ppl   110.550
| epoch  79 step    34350 |    342 batches | lr 0.000289 | ms/batch 314.02 | loss  4.56 | ppl    95.524
| epoch  79 step    34400 |    392 batches | lr 0.000289 | ms/batch 313.35 | loss  4.66 | ppl   106.025
----------------------------------------------------------------------------------------------------
| Eval  86 at step    34400 | time: 132.89s | valid loss  4.38 | valid ppl    80.180
----------------------------------------------------------------------------------------------------
| epoch  80 step    34450 |      6 batches | lr 0.000288 | ms/batch 408.59 | loss  4.69 | ppl   108.879
| epoch  80 step    34500 |     56 batches | lr 0.000288 | ms/batch 331.16 | loss  4.62 | ppl   101.494
| epoch  80 step    34550 |    106 batches | lr 0.000288 | ms/batch 330.90 | loss  4.63 | ppl   102.043
| epoch  80 step    34600 |    156 batches | lr 0.000288 | ms/batch 331.61 | loss  4.65 | ppl   104.880
| epoch  80 step    34650 |    206 batches | lr 0.000288 | ms/batch 324.65 | loss  4.64 | ppl   103.383
| epoch  80 step    34700 |    256 batches | lr 0.000288 | ms/batch 316.01 | loss  4.63 | ppl   102.466
| epoch  80 step    34750 |    306 batches | lr 0.000287 | ms/batch 315.26 | loss  4.67 | ppl   106.631
| epoch  80 step    34800 |    356 batches | lr 0.000287 | ms/batch 314.60 | loss  4.59 | ppl    98.287
----------------------------------------------------------------------------------------------------
| Eval  87 at step    34800 | time: 133.64s | valid loss  4.38 | valid ppl    79.894
----------------------------------------------------------------------------------------------------
| epoch  80 step    34850 |    406 batches | lr 0.000287 | ms/batch 413.96 | loss  4.64 | ppl   103.342
| epoch  81 step    34900 |     20 batches | lr 0.000287 | ms/batch 308.97 | loss  4.65 | ppl   104.773
| epoch  81 step    34950 |     70 batches | lr 0.000287 | ms/batch 315.27 | loss  4.58 | ppl    97.820
| epoch  81 step    35000 |    120 batches | lr 0.000287 | ms/batch 317.46 | loss  4.64 | ppl   103.068
| epoch  81 step    35050 |    170 batches | lr 0.000286 | ms/batch 317.58 | loss  4.62 | ppl   101.012
| epoch  81 step    35100 |    220 batches | lr 0.000286 | ms/batch 315.93 | loss  4.67 | ppl   107.023
| epoch  81 step    35150 |    270 batches | lr 0.000286 | ms/batch 316.19 | loss  4.64 | ppl   103.165
| epoch  81 step    35200 |    320 batches | lr 0.000286 | ms/batch 315.78 | loss  4.66 | ppl   105.183
----------------------------------------------------------------------------------------------------
| Eval  88 at step    35200 | time: 131.06s | valid loss  4.38 | valid ppl    79.981
----------------------------------------------------------------------------------------------------
| epoch  81 step    35250 |    370 batches | lr 0.000286 | ms/batch 415.73 | loss  4.60 | ppl    99.811
| epoch  81 step    35300 |    420 batches | lr 0.000286 | ms/batch 317.04 | loss  4.61 | ppl   100.571
| epoch  82 step    35350 |     34 batches | lr 0.000285 | ms/batch 309.84 | loss  4.68 | ppl   107.720
| epoch  82 step    35400 |     84 batches | lr 0.000285 | ms/batch 317.63 | loss  4.60 | ppl    99.352
| epoch  82 step    35450 |    134 batches | lr 0.000285 | ms/batch 317.12 | loss  4.63 | ppl   102.947
| epoch  82 step    35500 |    184 batches | lr 0.000285 | ms/batch 313.63 | loss  4.61 | ppl   100.736
| epoch  82 step    35550 |    234 batches | lr 0.000285 | ms/batch 314.39 | loss  4.63 | ppl   102.771
| epoch  82 step    35600 |    284 batches | lr 0.000285 | ms/batch 314.70 | loss  4.65 | ppl   104.299
----------------------------------------------------------------------------------------------------
| Eval  89 at step    35600 | time: 130.99s | valid loss  4.38 | valid ppl    79.764
----------------------------------------------------------------------------------------------------
| epoch  82 step    35650 |    334 batches | lr 0.000284 | ms/batch 412.88 | loss  4.56 | ppl    95.890
| epoch  82 step    35700 |    384 batches | lr 0.000284 | ms/batch 314.37 | loss  4.63 | ppl   102.290
| epoch  82 step    35750 |    434 batches | lr 0.000284 | ms/batch 314.11 | loss  4.64 | ppl   103.779
| epoch  83 step    35800 |     48 batches | lr 0.000284 | ms/batch 307.93 | loss  4.61 | ppl   100.249
| epoch  83 step    35850 |     98 batches | lr 0.000284 | ms/batch 314.88 | loss  4.58 | ppl    97.324
| epoch  83 step    35900 |    148 batches | lr 0.000283 | ms/batch 314.20 | loss  4.62 | ppl   101.090
| epoch  83 step    35950 |    198 batches | lr 0.000283 | ms/batch 314.66 | loss  4.66 | ppl   105.200
| epoch  83 step    36000 |    248 batches | lr 0.000283 | ms/batch 319.04 | loss  4.65 | ppl   104.896
----------------------------------------------------------------------------------------------------
| Eval  90 at step    36000 | time: 130.60s | valid loss  4.38 | valid ppl    80.145
----------------------------------------------------------------------------------------------------
| epoch  83 step    36050 |    298 batches | lr 0.000283 | ms/batch 412.54 | loss  4.67 | ppl   106.406
| epoch  83 step    36100 |    348 batches | lr 0.000283 | ms/batch 314.51 | loss  4.52 | ppl    91.886
| epoch  83 step    36150 |    398 batches | lr 0.000283 | ms/batch 314.49 | loss  4.63 | ppl   102.731
| epoch  84 step    36200 |     12 batches | lr 0.000282 | ms/batch 309.06 | loss  4.69 | ppl   109.160
| epoch  84 step    36250 |     62 batches | lr 0.000282 | ms/batch 313.93 | loss  4.57 | ppl    96.378
| epoch  84 step    36300 |    112 batches | lr 0.000282 | ms/batch 318.22 | loss  4.59 | ppl    98.210
| epoch  84 step    36350 |    162 batches | lr 0.000282 | ms/batch 315.77 | loss  4.63 | ppl   102.106
| epoch  84 step    36400 |    212 batches | lr 0.000282 | ms/batch 315.99 | loss  4.64 | ppl   103.714
----------------------------------------------------------------------------------------------------
| Eval  91 at step    36400 | time: 130.78s | valid loss  4.38 | valid ppl    79.652
----------------------------------------------------------------------------------------------------
| epoch  84 step    36450 |    262 batches | lr 0.000282 | ms/batch 415.65 | loss  4.63 | ppl   102.442
| epoch  84 step    36500 |    312 batches | lr 0.000281 | ms/batch 316.16 | loss  4.66 | ppl   105.216
| epoch  84 step    36550 |    362 batches | lr 0.000281 | ms/batch 315.57 | loss  4.56 | ppl    95.763
| epoch  84 step    36600 |    412 batches | lr 0.000281 | ms/batch 316.78 | loss  4.60 | ppl    99.267
| epoch  85 step    36650 |     26 batches | lr 0.000281 | ms/batch 313.36 | loss  4.65 | ppl   104.560
| epoch  85 step    36700 |     76 batches | lr 0.000281 | ms/batch 313.89 | loss  4.58 | ppl    97.074
| epoch  85 step    36750 |    126 batches | lr 0.000281 | ms/batch 313.86 | loss  4.60 | ppl    99.741
| epoch  85 step    36800 |    176 batches | lr 0.00028 | ms/batch 313.63 | loss  4.64 | ppl   103.318
----------------------------------------------------------------------------------------------------
| Eval  92 at step    36800 | time: 130.90s | valid loss  4.36 | valid ppl    78.322
----------------------------------------------------------------------------------------------------
| epoch  85 step    36850 |    226 batches | lr 0.00028 | ms/batch 445.54 | loss  4.64 | ppl   104.015
| epoch  85 step    36900 |    276 batches | lr 0.00028 | ms/batch 315.03 | loss  4.65 | ppl   105.003
| epoch  85 step    36950 |    326 batches | lr 0.00028 | ms/batch 314.46 | loss  4.55 | ppl    95.032
| epoch  85 step    37000 |    376 batches | lr 0.00028 | ms/batch 315.97 | loss  4.63 | ppl   102.346
| epoch  85 step    37050 |    426 batches | lr 0.000279 | ms/batch 316.28 | loss  4.63 | ppl   102.626
| epoch  86 step    37100 |     40 batches | lr 0.000279 | ms/batch 309.60 | loss  4.61 | ppl   100.194
| epoch  86 step    37150 |     90 batches | lr 0.000279 | ms/batch 314.55 | loss  4.57 | ppl    96.506
| epoch  86 step    37200 |    140 batches | lr 0.000279 | ms/batch 313.06 | loss  4.61 | ppl   100.069
----------------------------------------------------------------------------------------------------
| Eval  93 at step    37200 | time: 130.61s | valid loss  4.37 | valid ppl    79.324
----------------------------------------------------------------------------------------------------
| epoch  86 step    37250 |    190 batches | lr 0.000279 | ms/batch 411.90 | loss  4.64 | ppl   103.036
| epoch  86 step    37300 |    240 batches | lr 0.000279 | ms/batch 314.05 | loss  4.64 | ppl   103.229
| epoch  86 step    37350 |    290 batches | lr 0.000278 | ms/batch 314.66 | loss  4.67 | ppl   106.448
| epoch  86 step    37400 |    340 batches | lr 0.000278 | ms/batch 313.50 | loss  4.50 | ppl    90.468
| epoch  86 step    37450 |    390 batches | lr 0.000278 | ms/batch 313.76 | loss  4.61 | ppl   100.233
| epoch  87 step    37500 |      4 batches | lr 0.000278 | ms/batch 307.53 | loss  4.64 | ppl   103.593
| epoch  87 step    37550 |     54 batches | lr 0.000278 | ms/batch 313.42 | loss  4.59 | ppl    98.903
| epoch  87 step    37600 |    104 batches | lr 0.000278 | ms/batch 316.21 | loss  4.60 | ppl    99.438
----------------------------------------------------------------------------------------------------
| Eval  94 at step    37600 | time: 130.35s | valid loss  4.36 | valid ppl    78.141
----------------------------------------------------------------------------------------------------
| epoch  87 step    37650 |    154 batches | lr 0.000277 | ms/batch 543.21 | loss  4.61 | ppl   100.030
| epoch  87 step    37700 |    204 batches | lr 0.000277 | ms/batch 314.55 | loss  4.60 | ppl    99.741
| epoch  87 step    37750 |    254 batches | lr 0.000277 | ms/batch 315.41 | loss  4.61 | ppl   100.791
| epoch  87 step    37800 |    304 batches | lr 0.000277 | ms/batch 314.00 | loss  4.64 | ppl   103.294
| epoch  87 step    37850 |    354 batches | lr 0.000277 | ms/batch 316.32 | loss  4.54 | ppl    93.282
| epoch  87 step    37900 |    404 batches | lr 0.000276 | ms/batch 314.75 | loss  4.59 | ppl    98.479
| epoch  88 step    37950 |     18 batches | lr 0.000276 | ms/batch 307.57 | loss  4.67 | ppl   106.473
| epoch  88 step    38000 |     68 batches | lr 0.000276 | ms/batch 314.40 | loss  4.56 | ppl    95.651
----------------------------------------------------------------------------------------------------
| Eval  95 at step    38000 | time: 130.58s | valid loss  4.36 | valid ppl    78.505
----------------------------------------------------------------------------------------------------
| epoch  88 step    38050 |    118 batches | lr 0.000276 | ms/batch 412.40 | loss  4.58 | ppl    97.820
| epoch  88 step    38100 |    168 batches | lr 0.000276 | ms/batch 314.96 | loss  4.59 | ppl    98.011
| epoch  88 step    38150 |    218 batches | lr 0.000276 | ms/batch 314.52 | loss  4.64 | ppl   103.149
| epoch  88 step    38200 |    268 batches | lr 0.000275 | ms/batch 314.11 | loss  4.62 | ppl   101.724
| epoch  88 step    38250 |    318 batches | lr 0.000275 | ms/batch 313.68 | loss  4.59 | ppl    98.249
| epoch  88 step    38300 |    368 batches | lr 0.000275 | ms/batch 314.82 | loss  4.54 | ppl    93.779
| epoch  88 step    38350 |    418 batches | lr 0.000275 | ms/batch 312.86 | loss  4.58 | ppl    97.980
| epoch  89 step    38400 |     32 batches | lr 0.000275 | ms/batch 308.25 | loss  4.64 | ppl   103.068
----------------------------------------------------------------------------------------------------
| Eval  96 at step    38400 | time: 130.26s | valid loss  4.37 | valid ppl    78.725
----------------------------------------------------------------------------------------------------
| epoch  89 step    38450 |     82 batches | lr 0.000274 | ms/batch 411.54 | loss  4.57 | ppl    96.190
| epoch  89 step    38500 |    132 batches | lr 0.000274 | ms/batch 313.61 | loss  4.58 | ppl    97.766
| epoch  89 step    38550 |    182 batches | lr 0.000274 | ms/batch 316.55 | loss  4.59 | ppl    98.548
| epoch  89 step    38600 |    232 batches | lr 0.000274 | ms/batch 316.85 | loss  4.62 | ppl   101.844
| epoch  89 step    38650 |    282 batches | lr 0.000274 | ms/batch 315.44 | loss  4.63 | ppl   102.811
| epoch  89 step    38700 |    332 batches | lr 0.000274 | ms/batch 316.73 | loss  4.54 | ppl    93.947
| epoch  89 step    38750 |    382 batches | lr 0.000273 | ms/batch 328.70 | loss  4.57 | ppl    96.521
| epoch  89 step    38800 |    432 batches | lr 0.000273 | ms/batch 317.35 | loss  4.61 | ppl   100.586
----------------------------------------------------------------------------------------------------
| Eval  97 at step    38800 | time: 131.85s | valid loss  4.36 | valid ppl    78.052
----------------------------------------------------------------------------------------------------
| epoch  90 step    38850 |     46 batches | lr 0.000273 | ms/batch 440.50 | loss  4.57 | ppl    96.454
| epoch  90 step    38900 |     96 batches | lr 0.000273 | ms/batch 320.86 | loss  4.56 | ppl    95.778
| epoch  90 step    38950 |    146 batches | lr 0.000273 | ms/batch 329.40 | loss  4.59 | ppl    98.441
| epoch  90 step    39000 |    196 batches | lr 0.000272 | ms/batch 323.48 | loss  4.60 | ppl    99.360
| epoch  90 step    39050 |    246 batches | lr 0.000272 | ms/batch 316.16 | loss  4.61 | ppl   100.578
| epoch  90 step    39100 |    296 batches | lr 0.000272 | ms/batch 315.97 | loss  4.64 | ppl   103.358
| epoch  90 step    39150 |    346 batches | lr 0.000272 | ms/batch 315.26 | loss  4.51 | ppl    90.922
| epoch  90 step    39200 |    396 batches | lr 0.000272 | ms/batch 315.15 | loss  4.62 | ppl   101.828
----------------------------------------------------------------------------------------------------
| Eval  98 at step    39200 | time: 132.19s | valid loss  4.36 | valid ppl    77.947
----------------------------------------------------------------------------------------------------
| epoch  91 step    39250 |     10 batches | lr 0.000272 | ms/batch 440.00 | loss  4.62 | ppl   101.724
| epoch  91 step    39300 |     60 batches | lr 0.000271 | ms/batch 314.02 | loss  4.58 | ppl    97.728
| epoch  91 step    39350 |    110 batches | lr 0.000271 | ms/batch 313.64 | loss  4.60 | ppl    99.391
| epoch  91 step    39400 |    160 batches | lr 0.000271 | ms/batch 313.66 | loss  4.56 | ppl    95.583
| epoch  91 step    39450 |    210 batches | lr 0.000271 | ms/batch 313.59 | loss  4.59 | ppl    98.456
| epoch  91 step    39500 |    260 batches | lr 0.000271 | ms/batch 315.66 | loss  4.59 | ppl    98.818
| epoch  91 step    39550 |    310 batches | lr 0.00027 | ms/batch 313.79 | loss  4.63 | ppl   102.972
| epoch  91 step    39600 |    360 batches | lr 0.00027 | ms/batch 313.64 | loss  4.55 | ppl    94.219
----------------------------------------------------------------------------------------------------
| Eval  99 at step    39600 | time: 130.25s | valid loss  4.35 | valid ppl    77.706
----------------------------------------------------------------------------------------------------
| epoch  91 step    39650 |    410 batches | lr 0.00027 | ms/batch 445.53 | loss  4.61 | ppl   100.398
| epoch  92 step    39700 |     24 batches | lr 0.00027 | ms/batch 307.85 | loss  4.64 | ppl   103.100
| epoch  92 step    39750 |     74 batches | lr 0.00027 | ms/batch 314.66 | loss  4.54 | ppl    93.464
| epoch  92 step    39800 |    124 batches | lr 0.00027 | ms/batch 315.48 | loss  4.57 | ppl    96.115
| epoch  92 step    39850 |    174 batches | lr 0.000269 | ms/batch 315.37 | loss  4.58 | ppl    97.575
| epoch  92 step    39900 |    224 batches | lr 0.000269 | ms/batch 316.91 | loss  4.61 | ppl   100.468
| epoch  92 step    39950 |    274 batches | lr 0.000269 | ms/batch 317.23 | loss  4.60 | ppl    99.166
| epoch  92 step    40000 |    324 batches | lr 0.000269 | ms/batch 316.08 | loss  4.55 | ppl    94.721
----------------------------------------------------------------------------------------------------
| Eval 100 at step    40000 | time: 130.84s | valid loss  4.36 | valid ppl    78.355
----------------------------------------------------------------------------------------------------
| epoch  92 step    40050 |    374 batches | lr 0.000269 | ms/batch 412.55 | loss  4.57 | ppl    96.258
| epoch  92 step    40100 |    424 batches | lr 0.000268 | ms/batch 313.18 | loss  4.57 | ppl    96.733
| epoch  93 step    40150 |     38 batches | lr 0.000268 | ms/batch 307.99 | loss  4.59 | ppl    98.919
| epoch  93 step    40200 |     88 batches | lr 0.000268 | ms/batch 313.66 | loss  4.55 | ppl    94.810
| epoch  93 step    40250 |    138 batches | lr 0.000268 | ms/batch 312.83 | loss  4.58 | ppl    97.400
| epoch  93 step    40300 |    188 batches | lr 0.000268 | ms/batch 313.42 | loss  4.58 | ppl    97.225
| epoch  93 step    40350 |    238 batches | lr 0.000267 | ms/batch 313.32 | loss  4.58 | ppl    97.850
| epoch  93 step    40400 |    288 batches | lr 0.000267 | ms/batch 312.68 | loss  4.63 | ppl   102.186
----------------------------------------------------------------------------------------------------
| Eval 101 at step    40400 | time: 129.97s | valid loss  4.36 | valid ppl    78.577
----------------------------------------------------------------------------------------------------
| epoch  93 step    40450 |    338 batches | lr 0.000267 | ms/batch 411.41 | loss  4.50 | ppl    90.271
| epoch  93 step    40500 |    388 batches | lr 0.000267 | ms/batch 313.50 | loss  4.59 | ppl    98.548
| epoch  94 step    40550 |      2 batches | lr 0.000267 | ms/batch 308.24 | loss  4.62 | ppl   101.288
| epoch  94 step    40600 |     52 batches | lr 0.000267 | ms/batch 313.43 | loss  4.56 | ppl    95.711
| epoch  94 step    40650 |    102 batches | lr 0.000266 | ms/batch 313.52 | loss  4.55 | ppl    94.640
| epoch  94 step    40700 |    152 batches | lr 0.000266 | ms/batch 313.99 | loss  4.58 | ppl    97.074
| epoch  94 step    40750 |    202 batches | lr 0.000266 | ms/batch 314.60 | loss  4.58 | ppl    97.911
| epoch  94 step    40800 |    252 batches | lr 0.000266 | ms/batch 313.21 | loss  4.57 | ppl    96.831
----------------------------------------------------------------------------------------------------
| Eval 102 at step    40800 | time: 130.09s | valid loss  4.35 | valid ppl    77.697
----------------------------------------------------------------------------------------------------
| epoch  94 step    40850 |    302 batches | lr 0.000266 | ms/batch 446.47 | loss  4.60 | ppl    99.368
| epoch  94 step    40900 |    352 batches | lr 0.000265 | ms/batch 316.82 | loss  4.51 | ppl    91.178
| epoch  94 step    40950 |    402 batches | lr 0.000265 | ms/batch 318.14 | loss  4.57 | ppl    96.983
| epoch  95 step    41000 |     16 batches | lr 0.000265 | ms/batch 311.32 | loss  4.62 | ppl   101.114
| epoch  95 step    41050 |     66 batches | lr 0.000265 | ms/batch 317.20 | loss  4.51 | ppl    90.539
| epoch  95 step    41100 |    116 batches | lr 0.000265 | ms/batch 316.46 | loss  4.54 | ppl    93.486
| epoch  95 step    41150 |    166 batches | lr 0.000264 | ms/batch 315.94 | loss  4.59 | ppl    98.764
| epoch  95 step    41200 |    216 batches | lr 0.000264 | ms/batch 313.64 | loss  4.61 | ppl   100.751
----------------------------------------------------------------------------------------------------
| Eval 103 at step    41200 | time: 131.15s | valid loss  4.35 | valid ppl    77.547
----------------------------------------------------------------------------------------------------
| epoch  95 step    41250 |    266 batches | lr 0.000264 | ms/batch 444.91 | loss  4.60 | ppl    99.251
| epoch  95 step    41300 |    316 batches | lr 0.000264 | ms/batch 314.42 | loss  4.61 | ppl   100.586
| epoch  95 step    41350 |    366 batches | lr 0.000264 | ms/batch 313.41 | loss  4.52 | ppl    91.922
| epoch  95 step    41400 |    416 batches | lr 0.000264 | ms/batch 313.63 | loss  4.58 | ppl    97.370
| epoch  96 step    41450 |     30 batches | lr 0.000263 | ms/batch 308.80 | loss  4.60 | ppl    99.718
| epoch  96 step    41500 |     80 batches | lr 0.000263 | ms/batch 313.81 | loss  4.52 | ppl    91.771
| epoch  96 step    41550 |    130 batches | lr 0.000263 | ms/batch 313.86 | loss  4.55 | ppl    94.854
| epoch  96 step    41600 |    180 batches | lr 0.000263 | ms/batch 314.03 | loss  4.57 | ppl    96.446
----------------------------------------------------------------------------------------------------
| Eval 104 at step    41600 | time: 130.24s | valid loss  4.36 | valid ppl    77.922
----------------------------------------------------------------------------------------------------
| epoch  96 step    41650 |    230 batches | lr 0.000263 | ms/batch 412.57 | loss  4.61 | ppl   100.594
| epoch  96 step    41700 |    280 batches | lr 0.000262 | ms/batch 317.23 | loss  4.63 | ppl   102.122
| epoch  96 step    41750 |    330 batches | lr 0.000262 | ms/batch 314.13 | loss  4.53 | ppl    93.042
| epoch  96 step    41800 |    380 batches | lr 0.000262 | ms/batch 316.11 | loss  4.55 | ppl    94.485
| epoch  96 step    41850 |    430 batches | lr 0.000262 | ms/batch 315.59 | loss  4.59 | ppl    98.541
| epoch  97 step    41900 |     44 batches | lr 0.000262 | ms/batch 306.26 | loss  4.54 | ppl    93.632
| epoch  97 step    41950 |     94 batches | lr 0.000261 | ms/batch 313.99 | loss  4.51 | ppl    91.128
| epoch  97 step    42000 |    144 batches | lr 0.000261 | ms/batch 314.49 | loss  4.59 | ppl    98.895
----------------------------------------------------------------------------------------------------
| Eval 105 at step    42000 | time: 130.51s | valid loss  4.36 | valid ppl    78.186
----------------------------------------------------------------------------------------------------
| epoch  97 step    42050 |    194 batches | lr 0.000261 | ms/batch 412.01 | loss  4.59 | ppl    98.302
| epoch  97 step    42100 |    244 batches | lr 0.000261 | ms/batch 313.20 | loss  4.58 | ppl    97.552
| epoch  97 step    42150 |    294 batches | lr 0.000261 | ms/batch 313.71 | loss  4.60 | ppl    99.648
| epoch  97 step    42200 |    344 batches | lr 0.00026 | ms/batch 314.10 | loss  4.47 | ppl    87.699
| epoch  97 step    42250 |    394 batches | lr 0.00026 | ms/batch 314.96 | loss  4.58 | ppl    97.142
| epoch  98 step    42300 |      8 batches | lr 0.00026 | ms/batch 307.77 | loss  4.59 | ppl    98.187
| epoch  98 step    42350 |     58 batches | lr 0.00026 | ms/batch 314.42 | loss  4.52 | ppl    91.821
| epoch  98 step    42400 |    108 batches | lr 0.00026 | ms/batch 314.35 | loss  4.53 | ppl    93.143
----------------------------------------------------------------------------------------------------
| Eval 106 at step    42400 | time: 130.30s | valid loss  4.35 | valid ppl    77.213
----------------------------------------------------------------------------------------------------
| epoch  98 step    42450 |    158 batches | lr 0.000259 | ms/batch 475.81 | loss  4.57 | ppl    96.356
| epoch  98 step    42500 |    208 batches | lr 0.000259 | ms/batch 316.90 | loss  4.58 | ppl    97.988
| epoch  98 step    42550 |    258 batches | lr 0.000259 | ms/batch 317.50 | loss  4.57 | ppl    96.998
| epoch  98 step    42600 |    308 batches | lr 0.000259 | ms/batch 317.03 | loss  4.60 | ppl    99.819
| epoch  98 step    42650 |    358 batches | lr 0.000259 | ms/batch 316.65 | loss  4.50 | ppl    89.701
| epoch  98 step    42700 |    408 batches | lr 0.000259 | ms/batch 317.04 | loss  4.54 | ppl    93.713
| epoch  99 step    42750 |     22 batches | lr 0.000258 | ms/batch 310.90 | loss  4.61 | ppl   100.351
| epoch  99 step    42800 |     72 batches | lr 0.000258 | ms/batch 316.18 | loss  4.54 | ppl    93.224
----------------------------------------------------------------------------------------------------
| Eval 107 at step    42800 | time: 131.43s | valid loss  4.34 | valid ppl    76.804
----------------------------------------------------------------------------------------------------
| epoch  99 step    42850 |    122 batches | lr 0.000258 | ms/batch 505.55 | loss  4.56 | ppl    95.868
| epoch  99 step    42900 |    172 batches | lr 0.000258 | ms/batch 332.32 | loss  4.55 | ppl    94.869
| epoch  99 step    42950 |    222 batches | lr 0.000258 | ms/batch 333.20 | loss  4.59 | ppl    98.780
| epoch  99 step    43000 |    272 batches | lr 0.000257 | ms/batch 317.40 | loss  4.57 | ppl    96.763
| epoch  99 step    43050 |    322 batches | lr 0.000257 | ms/batch 319.68 | loss  4.57 | ppl    96.078
| epoch  99 step    43100 |    372 batches | lr 0.000257 | ms/batch 317.30 | loss  4.53 | ppl    93.129
| epoch  99 step    43150 |    422 batches | lr 0.000257 | ms/batch 315.67 | loss  4.54 | ppl    93.618
| epoch 100 step    43200 |     36 batches | lr 0.000257 | ms/batch 309.05 | loss  4.58 | ppl    97.298
----------------------------------------------------------------------------------------------------
| Eval 108 at step    43200 | time: 133.68s | valid loss  4.34 | valid ppl    76.994
----------------------------------------------------------------------------------------------------
| epoch 100 step    43250 |     86 batches | lr 0.000256 | ms/batch 413.35 | loss  4.51 | ppl    91.192
| epoch 100 step    43300 |    136 batches | lr 0.000256 | ms/batch 317.12 | loss  4.57 | ppl    96.160
| epoch 100 step    43350 |    186 batches | lr 0.000256 | ms/batch 315.08 | loss  4.57 | ppl    96.341
| epoch 100 step    43400 |    236 batches | lr 0.000256 | ms/batch 314.81 | loss  4.59 | ppl    98.156
| epoch 100 step    43450 |    286 batches | lr 0.000256 | ms/batch 314.79 | loss  4.60 | ppl    99.921
| epoch 100 step    43500 |    336 batches | lr 0.000255 | ms/batch 314.51 | loss  4.50 | ppl    90.200
| epoch 100 step    43550 |    386 batches | lr 0.000255 | ms/batch 325.56 | loss  4.55 | ppl    94.610
| epoch 100 step    43600 |    436 batches | lr 0.000255 | ms/batch 312.96 | loss  4.57 | ppl    96.431
----------------------------------------------------------------------------------------------------
| Eval 109 at step    43600 | time: 131.45s | valid loss  4.34 | valid ppl    76.722
----------------------------------------------------------------------------------------------------
| epoch 101 step    43650 |     50 batches | lr 0.000255 | ms/batch 446.83 | loss  4.52 | ppl    92.116
| epoch 101 step    43700 |    100 batches | lr 0.000255 | ms/batch 317.06 | loss  4.52 | ppl    91.943
| epoch 101 step    43750 |    150 batches | lr 0.000254 | ms/batch 316.64 | loss  4.56 | ppl    95.248
| epoch 101 step    43800 |    200 batches | lr 0.000254 | ms/batch 316.74 | loss  4.58 | ppl    97.058
| epoch 101 step    43850 |    250 batches | lr 0.000254 | ms/batch 315.07 | loss  4.56 | ppl    95.883
| epoch 101 step    43900 |    300 batches | lr 0.000254 | ms/batch 314.98 | loss  4.58 | ppl    97.758
| epoch 101 step    43950 |    350 batches | lr 0.000254 | ms/batch 319.04 | loss  4.45 | ppl    85.848
| epoch 101 step    44000 |    400 batches | lr 0.000253 | ms/batch 318.07 | loss  4.55 | ppl    94.345
----------------------------------------------------------------------------------------------------
| Eval 110 at step    44000 | time: 131.60s | valid loss  4.34 | valid ppl    76.474
----------------------------------------------------------------------------------------------------
| epoch 102 step    44050 |     14 batches | lr 0.000253 | ms/batch 445.33 | loss  4.57 | ppl    96.537
| epoch 102 step    44100 |     64 batches | lr 0.000253 | ms/batch 318.08 | loss  4.51 | ppl    90.908
| epoch 102 step    44150 |    114 batches | lr 0.000253 | ms/batch 322.07 | loss  4.52 | ppl    92.253
| epoch 102 step    44200 |    164 batches | lr 0.000253 | ms/batch 319.90 | loss  4.55 | ppl    94.315
| epoch 102 step    44250 |    214 batches | lr 0.000252 | ms/batch 315.17 | loss  4.58 | ppl    97.560
| epoch 102 step    44300 |    264 batches | lr 0.000252 | ms/batch 314.10 | loss  4.54 | ppl    93.764
| epoch 102 step    44350 |    314 batches | lr 0.000252 | ms/batch 313.21 | loss  4.57 | ppl    96.521
| epoch 102 step    44400 |    364 batches | lr 0.000252 | ms/batch 314.61 | loss  4.49 | ppl    89.414
----------------------------------------------------------------------------------------------------
| Eval 111 at step    44400 | time: 131.36s | valid loss  4.33 | valid ppl    76.035
----------------------------------------------------------------------------------------------------
| epoch 102 step    44450 |    414 batches | lr 0.000252 | ms/batch 447.28 | loss  4.54 | ppl    93.596
| epoch 103 step    44500 |     28 batches | lr 0.000251 | ms/batch 313.63 | loss  4.59 | ppl    98.433
| epoch 103 step    44550 |     78 batches | lr 0.000251 | ms/batch 331.86 | loss  4.49 | ppl    89.484
| epoch 103 step    44600 |    128 batches | lr 0.000251 | ms/batch 316.16 | loss  4.52 | ppl    91.606
| epoch 103 step    44650 |    178 batches | lr 0.000251 | ms/batch 316.27 | loss  4.55 | ppl    95.003
| epoch 103 step    44700 |    228 batches | lr 0.000251 | ms/batch 316.87 | loss  4.60 | ppl    99.414
| epoch 103 step    44750 |    278 batches | lr 0.000251 | ms/batch 315.42 | loss  4.56 | ppl    95.860
| epoch 103 step    44800 |    328 batches | lr 0.00025 | ms/batch 317.20 | loss  4.51 | ppl    90.532
----------------------------------------------------------------------------------------------------
| Eval 112 at step    44800 | time: 132.05s | valid loss  4.35 | valid ppl    77.194
----------------------------------------------------------------------------------------------------
| epoch 103 step    44850 |    378 batches | lr 0.00025 | ms/batch 413.10 | loss  4.50 | ppl    90.461
| epoch 103 step    44900 |    428 batches | lr 0.00025 | ms/batch 313.19 | loss  4.58 | ppl    97.066
| epoch 104 step    44950 |     42 batches | lr 0.00025 | ms/batch 308.34 | loss  4.53 | ppl    92.592
| epoch 104 step    45000 |     92 batches | lr 0.00025 | ms/batch 313.65 | loss  4.49 | ppl    89.177
| epoch 104 step    45050 |    142 batches | lr 0.000249 | ms/batch 313.26 | loss  4.54 | ppl    93.311
| epoch 104 step    45100 |    192 batches | lr 0.000249 | ms/batch 313.69 | loss  4.57 | ppl    96.393
| epoch 104 step    45150 |    242 batches | lr 0.000249 | ms/batch 313.59 | loss  4.60 | ppl    99.151
| epoch 104 step    45200 |    292 batches | lr 0.000249 | ms/batch 314.86 | loss  4.57 | ppl    97.020
----------------------------------------------------------------------------------------------------
| Eval 113 at step    45200 | time: 130.23s | valid loss  4.33 | valid ppl    76.319
----------------------------------------------------------------------------------------------------
| epoch 104 step    45250 |    342 batches | lr 0.000249 | ms/batch 415.29 | loss  4.44 | ppl    85.040
| epoch 104 step    45300 |    392 batches | lr 0.000248 | ms/batch 315.12 | loss  4.55 | ppl    94.640
| epoch 105 step    45350 |      6 batches | lr 0.000248 | ms/batch 311.13 | loss  4.57 | ppl    96.153
| epoch 105 step    45400 |     56 batches | lr 0.000248 | ms/batch 316.22 | loss  4.49 | ppl    88.698
| epoch 105 step    45450 |    106 batches | lr 0.000248 | ms/batch 315.27 | loss  4.51 | ppl    91.263
| epoch 105 step    45500 |    156 batches | lr 0.000248 | ms/batch 316.43 | loss  4.55 | ppl    94.529
| epoch 105 step    45550 |    206 batches | lr 0.000247 | ms/batch 316.13 | loss  4.54 | ppl    94.094
| epoch 105 step    45600 |    256 batches | lr 0.000247 | ms/batch 315.88 | loss  4.57 | ppl    96.363
----------------------------------------------------------------------------------------------------
| Eval 114 at step    45600 | time: 131.07s | valid loss  4.33 | valid ppl    75.883
----------------------------------------------------------------------------------------------------
| epoch 105 step    45650 |    306 batches | lr 0.000247 | ms/batch 447.44 | loss  4.59 | ppl    98.410
| epoch 105 step    45700 |    356 batches | lr 0.000247 | ms/batch 313.45 | loss  4.47 | ppl    87.685
| epoch 105 step    45750 |    406 batches | lr 0.000247 | ms/batch 314.32 | loss  4.53 | ppl    93.180
| epoch 106 step    45800 |     20 batches | lr 0.000246 | ms/batch 308.98 | loss  4.57 | ppl    96.070
| epoch 106 step    45850 |     70 batches | lr 0.000246 | ms/batch 312.84 | loss  4.47 | ppl    87.329
| epoch 106 step    45900 |    120 batches | lr 0.000246 | ms/batch 319.96 | loss  4.51 | ppl    91.378
| epoch 106 step    45950 |    170 batches | lr 0.000246 | ms/batch 318.15 | loss  4.52 | ppl    91.592
| epoch 106 step    46000 |    220 batches | lr 0.000246 | ms/batch 314.22 | loss  4.58 | ppl    97.051
----------------------------------------------------------------------------------------------------
| Eval 115 at step    46000 | time: 130.73s | valid loss  4.33 | valid ppl    76.059
----------------------------------------------------------------------------------------------------
| epoch 106 step    46050 |    270 batches | lr 0.000245 | ms/batch 414.09 | loss  4.55 | ppl    94.463
| epoch 106 step    46100 |    320 batches | lr 0.000245 | ms/batch 316.54 | loss  4.50 | ppl    89.975
| epoch 106 step    46150 |    370 batches | lr 0.000245 | ms/batch 316.28 | loss  4.50 | ppl    89.624
| epoch 106 step    46200 |    420 batches | lr 0.000245 | ms/batch 315.47 | loss  4.53 | ppl    92.563
| epoch 107 step    46250 |     34 batches | lr 0.000245 | ms/batch 310.99 | loss  4.55 | ppl    94.832
| epoch 107 step    46300 |     84 batches | lr 0.000244 | ms/batch 314.66 | loss  4.47 | ppl    87.261
| epoch 107 step    46350 |    134 batches | lr 0.000244 | ms/batch 314.62 | loss  4.53 | ppl    93.114
| epoch 107 step    46400 |    184 batches | lr 0.000244 | ms/batch 313.92 | loss  4.52 | ppl    91.649
----------------------------------------------------------------------------------------------------
| Eval 116 at step    46400 | time: 130.82s | valid loss  4.33 | valid ppl    76.148
----------------------------------------------------------------------------------------------------
| epoch 107 step    46450 |    234 batches | lr 0.000244 | ms/batch 414.27 | loss  4.56 | ppl    95.218
| epoch 107 step    46500 |    284 batches | lr 0.000243 | ms/batch 316.17 | loss  4.55 | ppl    94.455
| epoch 107 step    46550 |    334 batches | lr 0.000243 | ms/batch 316.53 | loss  4.45 | ppl    86.023
| epoch 107 step    46600 |    384 batches | lr 0.000243 | ms/batch 317.22 | loss  4.51 | ppl    91.092
| epoch 107 step    46650 |    434 batches | lr 0.000243 | ms/batch 316.03 | loss  4.54 | ppl    93.435
| epoch 108 step    46700 |     48 batches | lr 0.000243 | ms/batch 310.53 | loss  4.49 | ppl    89.470
| epoch 108 step    46750 |     98 batches | lr 0.000242 | ms/batch 316.88 | loss  4.48 | ppl    88.049
| epoch 108 step    46800 |    148 batches | lr 0.000242 | ms/batch 315.16 | loss  4.50 | ppl    90.348
----------------------------------------------------------------------------------------------------
| Eval 117 at step    46800 | time: 131.18s | valid loss  4.33 | valid ppl    75.754
----------------------------------------------------------------------------------------------------
| epoch 108 step    46850 |    198 batches | lr 0.000242 | ms/batch 447.13 | loss  4.54 | ppl    93.940
| epoch 108 step    46900 |    248 batches | lr 0.000242 | ms/batch 314.15 | loss  4.57 | ppl    96.318
| epoch 108 step    46950 |    298 batches | lr 0.000242 | ms/batch 313.62 | loss  4.53 | ppl    93.143
| epoch 108 step    47000 |    348 batches | lr 0.000241 | ms/batch 313.64 | loss  4.43 | ppl    83.859
| epoch 108 step    47050 |    398 batches | lr 0.000241 | ms/batch 314.59 | loss  4.51 | ppl    90.801
| epoch 109 step    47100 |     12 batches | lr 0.000241 | ms/batch 308.44 | loss  4.56 | ppl    95.845
| epoch 109 step    47150 |     62 batches | lr 0.000241 | ms/batch 315.52 | loss  4.47 | ppl    87.159
| epoch 109 step    47200 |    112 batches | lr 0.000241 | ms/batch 315.59 | loss  4.53 | ppl    92.325
----------------------------------------------------------------------------------------------------
| Eval 118 at step    47200 | time: 130.45s | valid loss  4.32 | valid ppl    75.522
----------------------------------------------------------------------------------------------------
| epoch 109 step    47250 |    162 batches | lr 0.00024 | ms/batch 446.47 | loss  4.52 | ppl    91.742
| epoch 109 step    47300 |    212 batches | lr 0.00024 | ms/batch 313.49 | loss  4.54 | ppl    93.267
| epoch 109 step    47350 |    262 batches | lr 0.00024 | ms/batch 313.21 | loss  4.56 | ppl    95.748
| epoch 109 step    47400 |    312 batches | lr 0.00024 | ms/batch 314.15 | loss  4.52 | ppl    92.267
| epoch 109 step    47450 |    362 batches | lr 0.00024 | ms/batch 313.88 | loss  4.46 | ppl    86.305
| epoch 109 step    47500 |    412 batches | lr 0.000239 | ms/batch 313.11 | loss  4.49 | ppl    89.205
| epoch 110 step    47550 |     26 batches | lr 0.000239 | ms/batch 307.22 | loss  4.59 | ppl    98.772
| epoch 110 step    47600 |     76 batches | lr 0.000239 | ms/batch 313.83 | loss  4.48 | ppl    88.414
----------------------------------------------------------------------------------------------------
| Eval 119 at step    47600 | time: 130.07s | valid loss  4.33 | valid ppl    76.102
----------------------------------------------------------------------------------------------------
| epoch 110 step    47650 |    126 batches | lr 0.000239 | ms/batch 413.17 | loss  4.49 | ppl    89.317
| epoch 110 step    47700 |    176 batches | lr 0.000239 | ms/batch 316.54 | loss  4.50 | ppl    90.320
| epoch 110 step    47750 |    226 batches | lr 0.000238 | ms/batch 316.26 | loss  4.52 | ppl    91.735
| epoch 110 step    47800 |    276 batches | lr 0.000238 | ms/batch 315.93 | loss  4.53 | ppl    93.093
| epoch 110 step    47850 |    326 batches | lr 0.000238 | ms/batch 316.49 | loss  4.47 | ppl    87.486
| epoch 110 step    47900 |    376 batches | lr 0.000238 | ms/batch 315.67 | loss  4.50 | ppl    90.433
| epoch 110 step    47950 |    426 batches | lr 0.000238 | ms/batch 316.39 | loss  4.53 | ppl    92.296
| epoch 111 step    48000 |     40 batches | lr 0.000237 | ms/batch 309.30 | loss  4.51 | ppl    90.482
----------------------------------------------------------------------------------------------------
| Eval 120 at step    48000 | time: 131.02s | valid loss  4.32 | valid ppl    75.460
----------------------------------------------------------------------------------------------------
| epoch 111 step    48050 |     90 batches | lr 0.000237 | ms/batch 465.16 | loss  4.45 | ppl    85.962
| epoch 111 step    48100 |    140 batches | lr 0.000237 | ms/batch 330.81 | loss  4.51 | ppl    90.865
| epoch 111 step    48150 |    190 batches | lr 0.000237 | ms/batch 329.01 | loss  4.55 | ppl    94.367
| epoch 111 step    48200 |    240 batches | lr 0.000237 | ms/batch 313.80 | loss  4.53 | ppl    92.462
| epoch 111 step    48250 |    290 batches | lr 0.000236 | ms/batch 314.29 | loss  4.59 | ppl    98.295
| epoch 111 step    48300 |    340 batches | lr 0.000236 | ms/batch 313.93 | loss  4.43 | ppl    83.591
| epoch 111 step    48350 |    390 batches | lr 0.000236 | ms/batch 315.04 | loss  4.49 | ppl    89.003
| epoch 112 step    48400 |      4 batches | lr 0.000236 | ms/batch 309.21 | loss  4.54 | ppl    94.138
----------------------------------------------------------------------------------------------------
| Eval 121 at step    48400 | time: 132.81s | valid loss  4.33 | valid ppl    75.781
----------------------------------------------------------------------------------------------------
| epoch 112 step    48450 |     54 batches | lr 0.000236 | ms/batch 413.86 | loss  4.50 | ppl    89.596
| epoch 112 step    48500 |    104 batches | lr 0.000235 | ms/batch 315.46 | loss  4.47 | ppl    87.658
| epoch 112 step    48550 |    154 batches | lr 0.000235 | ms/batch 315.33 | loss  4.52 | ppl    92.030
| epoch 112 step    48600 |    204 batches | lr 0.000235 | ms/batch 318.92 | loss  4.53 | ppl    92.614
| epoch 112 step    48650 |    254 batches | lr 0.000235 | ms/batch 321.12 | loss  4.51 | ppl    91.292
| epoch 112 step    48700 |    304 batches | lr 0.000234 | ms/batch 321.55 | loss  4.57 | ppl    96.235
| epoch 112 step    48750 |    354 batches | lr 0.000234 | ms/batch 322.41 | loss  4.42 | ppl    83.317
| epoch 112 step    48800 |    404 batches | lr 0.000234 | ms/batch 321.05 | loss  4.48 | ppl    88.386
----------------------------------------------------------------------------------------------------
| Eval 122 at step    48800 | time: 132.49s | valid loss  4.33 | valid ppl    75.580
----------------------------------------------------------------------------------------------------
| epoch 113 step    48850 |     18 batches | lr 0.000234 | ms/batch 409.83 | loss  4.56 | ppl    95.782
| epoch 113 step    48900 |     68 batches | lr 0.000234 | ms/batch 317.60 | loss  4.48 | ppl    88.594
| epoch 113 step    48950 |    118 batches | lr 0.000233 | ms/batch 316.84 | loss  4.49 | ppl    88.760
| epoch 113 step    49000 |    168 batches | lr 0.000233 | ms/batch 313.85 | loss  4.49 | ppl    89.456
| epoch 113 step    49050 |    218 batches | lr 0.000233 | ms/batch 316.97 | loss  4.52 | ppl    92.044
| epoch 113 step    49100 |    268 batches | lr 0.000233 | ms/batch 316.81 | loss  4.49 | ppl    89.547
| epoch 113 step    49150 |    318 batches | lr 0.000233 | ms/batch 316.04 | loss  4.48 | ppl    88.476
| epoch 113 step    49200 |    368 batches | lr 0.000232 | ms/batch 315.38 | loss  4.47 | ppl    87.507
----------------------------------------------------------------------------------------------------
| Eval 123 at step    49200 | time: 131.14s | valid loss  4.32 | valid ppl    75.428
----------------------------------------------------------------------------------------------------
| epoch 113 step    49250 |    418 batches | lr 0.000232 | ms/batch 445.30 | loss  4.49 | ppl    88.829
| epoch 114 step    49300 |     32 batches | lr 0.000232 | ms/batch 307.50 | loss  4.53 | ppl    92.570
| epoch 114 step    49350 |     82 batches | lr 0.000232 | ms/batch 314.43 | loss  4.43 | ppl    83.840
| epoch 114 step    49400 |    132 batches | lr 0.000232 | ms/batch 315.14 | loss  4.50 | ppl    90.031
| epoch 114 step    49450 |    182 batches | lr 0.000231 | ms/batch 316.56 | loss  4.51 | ppl    90.808
| epoch 114 step    49500 |    232 batches | lr 0.000231 | ms/batch 315.45 | loss  4.54 | ppl    93.742
| epoch 114 step    49550 |    282 batches | lr 0.000231 | ms/batch 314.30 | loss  4.52 | ppl    91.406
| epoch 114 step    49600 |    332 batches | lr 0.000231 | ms/batch 315.29 | loss  4.42 | ppl    82.941
----------------------------------------------------------------------------------------------------
| Eval 124 at step    49600 | time: 130.57s | valid loss  4.32 | valid ppl    75.086
----------------------------------------------------------------------------------------------------
| epoch 114 step    49650 |    382 batches | lr 0.000231 | ms/batch 446.65 | loss  4.46 | ppl    86.569
| epoch 114 step    49700 |    432 batches | lr 0.00023 | ms/batch 315.41 | loss  4.53 | ppl    92.650
| epoch 115 step    49750 |     46 batches | lr 0.00023 | ms/batch 309.76 | loss  4.49 | ppl    88.850
| epoch 115 step    49800 |     96 batches | lr 0.00023 | ms/batch 315.56 | loss  4.45 | ppl    85.654
| epoch 115 step    49850 |    146 batches | lr 0.00023 | ms/batch 316.36 | loss  4.49 | ppl    88.788
| epoch 115 step    49900 |    196 batches | lr 0.000229 | ms/batch 314.18 | loss  4.52 | ppl    91.742
| epoch 115 step    49950 |    246 batches | lr 0.000229 | ms/batch 313.04 | loss  4.50 | ppl    90.257
| epoch 115 step    50000 |    296 batches | lr 0.000229 | ms/batch 313.77 | loss  4.56 | ppl    95.464
----------------------------------------------------------------------------------------------------
| Eval 125 at step    50000 | time: 130.51s | valid loss  4.33 | valid ppl    75.864
----------------------------------------------------------------------------------------------------
| epoch 115 step    50050 |    346 batches | lr 0.000229 | ms/batch 411.38 | loss  4.40 | ppl    81.165
| epoch 115 step    50100 |    396 batches | lr 0.000229 | ms/batch 312.56 | loss  4.50 | ppl    90.109
| epoch 116 step    50150 |     10 batches | lr 0.000228 | ms/batch 307.45 | loss  4.56 | ppl    95.352
| epoch 116 step    50200 |     60 batches | lr 0.000228 | ms/batch 313.73 | loss  4.44 | ppl    84.590
| epoch 116 step    50250 |    110 batches | lr 0.000228 | ms/batch 315.35 | loss  4.45 | ppl    85.213
| epoch 116 step    50300 |    160 batches | lr 0.000228 | ms/batch 315.92 | loss  4.51 | ppl    90.773
| epoch 116 step    50350 |    210 batches | lr 0.000228 | ms/batch 315.96 | loss  4.49 | ppl    89.094
| epoch 116 step    50400 |    260 batches | lr 0.000227 | ms/batch 314.48 | loss  4.50 | ppl    90.123
----------------------------------------------------------------------------------------------------
| Eval 126 at step    50400 | time: 130.37s | valid loss  4.32 | valid ppl    75.104
----------------------------------------------------------------------------------------------------
| epoch 116 step    50450 |    310 batches | lr 0.000227 | ms/batch 412.26 | loss  4.53 | ppl    93.085
| epoch 116 step    50500 |    360 batches | lr 0.000227 | ms/batch 313.18 | loss  4.46 | ppl    86.123
| epoch 116 step    50550 |    410 batches | lr 0.000227 | ms/batch 313.95 | loss  4.46 | ppl    86.460
| epoch 117 step    50600 |     24 batches | lr 0.000227 | ms/batch 308.49 | loss  4.51 | ppl    90.957
| epoch 117 step    50650 |     74 batches | lr 0.000226 | ms/batch 312.85 | loss  4.44 | ppl    84.841
| epoch 117 step    50700 |    124 batches | lr 0.000226 | ms/batch 315.12 | loss  4.47 | ppl    87.234
| epoch 117 step    50750 |    174 batches | lr 0.000226 | ms/batch 314.80 | loss  4.49 | ppl    89.352
| epoch 117 step    50800 |    224 batches | lr 0.000226 | ms/batch 313.44 | loss  4.52 | ppl    92.231
----------------------------------------------------------------------------------------------------
| Eval 127 at step    50800 | time: 130.19s | valid loss  4.33 | valid ppl    75.608
----------------------------------------------------------------------------------------------------
| epoch 117 step    50850 |    274 batches | lr 0.000226 | ms/batch 412.21 | loss  4.53 | ppl    92.621
| epoch 117 step    50900 |    324 batches | lr 0.000225 | ms/batch 314.99 | loss  4.46 | ppl    86.921
| epoch 117 step    50950 |    374 batches | lr 0.000225 | ms/batch 316.19 | loss  4.47 | ppl    87.747
| epoch 117 step    51000 |    424 batches | lr 0.000225 | ms/batch 316.35 | loss  4.47 | ppl    87.569
| epoch 118 step    51050 |     38 batches | lr 0.000225 | ms/batch 310.36 | loss  4.51 | ppl    90.638
| epoch 118 step    51100 |     88 batches | lr 0.000224 | ms/batch 316.46 | loss  4.43 | ppl    84.352
| epoch 118 step    51150 |    138 batches | lr 0.000224 | ms/batch 316.09 | loss  4.48 | ppl    88.497
| epoch 118 step    51200 |    188 batches | lr 0.000224 | ms/batch 313.39 | loss  4.50 | ppl    89.975
----------------------------------------------------------------------------------------------------
| Eval 128 at step    51200 | time: 130.81s | valid loss  4.32 | valid ppl    75.543
----------------------------------------------------------------------------------------------------
| epoch 118 step    51250 |    238 batches | lr 0.000224 | ms/batch 412.50 | loss  4.50 | ppl    89.933
| epoch 118 step    51300 |    288 batches | lr 0.000224 | ms/batch 314.26 | loss  4.54 | ppl    93.581
| epoch 118 step    51350 |    338 batches | lr 0.000223 | ms/batch 312.95 | loss  4.42 | ppl    82.753
| epoch 118 step    51400 |    388 batches | lr 0.000223 | ms/batch 313.40 | loss  4.50 | ppl    89.891
| epoch 119 step    51450 |      2 batches | lr 0.000223 | ms/batch 306.78 | loss  4.53 | ppl    92.683
| epoch 119 step    51500 |     52 batches | lr 0.000223 | ms/batch 314.94 | loss  4.43 | ppl    84.247
| epoch 119 step    51550 |    102 batches | lr 0.000223 | ms/batch 315.61 | loss  4.44 | ppl    84.961
| epoch 119 step    51600 |    152 batches | lr 0.000222 | ms/batch 313.97 | loss  4.49 | ppl    88.954
----------------------------------------------------------------------------------------------------
| Eval 129 at step    51600 | time: 130.21s | valid loss  4.31 | valid ppl    74.660
----------------------------------------------------------------------------------------------------
| epoch 119 step    51650 |    202 batches | lr 0.000222 | ms/batch 443.92 | loss  4.50 | ppl    89.715
| epoch 119 step    51700 |    252 batches | lr 0.000222 | ms/batch 315.60 | loss  4.50 | ppl    90.151
| epoch 119 step    51750 |    302 batches | lr 0.000222 | ms/batch 313.86 | loss  4.52 | ppl    91.528
| epoch 119 step    51800 |    352 batches | lr 0.000221 | ms/batch 315.24 | loss  4.40 | ppl    81.738
| epoch 119 step    51850 |    402 batches | lr 0.000221 | ms/batch 315.24 | loss  4.48 | ppl    88.035
| epoch 120 step    51900 |     16 batches | lr 0.000221 | ms/batch 308.74 | loss  4.53 | ppl    92.476
| epoch 120 step    51950 |     66 batches | lr 0.000221 | ms/batch 315.07 | loss  4.43 | ppl    84.109
| epoch 120 step    52000 |    116 batches | lr 0.000221 | ms/batch 316.35 | loss  4.45 | ppl    85.828
----------------------------------------------------------------------------------------------------
| Eval 130 at step    52000 | time: 130.63s | valid loss  4.30 | valid ppl    73.995
----------------------------------------------------------------------------------------------------
| epoch 120 step    52050 |    166 batches | lr 0.00022 | ms/batch 443.66 | loss  4.48 | ppl    88.069
| epoch 120 step    52100 |    216 batches | lr 0.00022 | ms/batch 313.09 | loss  4.52 | ppl    91.707
| epoch 120 step    52150 |    266 batches | lr 0.00022 | ms/batch 312.89 | loss  4.48 | ppl    88.649
| epoch 120 step    52200 |    316 batches | lr 0.00022 | ms/batch 314.01 | loss  4.49 | ppl    88.691
| epoch 120 step    52250 |    366 batches | lr 0.00022 | ms/batch 315.18 | loss  4.42 | ppl    82.701
| epoch 120 step    52300 |    416 batches | lr 0.000219 | ms/batch 315.32 | loss  4.48 | ppl    88.352
| epoch 121 step    52350 |     30 batches | lr 0.000219 | ms/batch 308.12 | loss  4.50 | ppl    90.165
| epoch 121 step    52400 |     80 batches | lr 0.000219 | ms/batch 314.01 | loss  4.46 | ppl    86.874
----------------------------------------------------------------------------------------------------
| Eval 131 at step    52400 | time: 130.24s | valid loss  4.32 | valid ppl    74.885
----------------------------------------------------------------------------------------------------
| epoch 121 step    52450 |    130 batches | lr 0.000219 | ms/batch 412.30 | loss  4.47 | ppl    87.289
| epoch 121 step    52500 |    180 batches | lr 0.000219 | ms/batch 315.19 | loss  4.46 | ppl    86.880
| epoch 121 step    52550 |    230 batches | lr 0.000218 | ms/batch 316.20 | loss  4.51 | ppl    90.830
| epoch 121 step    52600 |    280 batches | lr 0.000218 | ms/batch 315.34 | loss  4.50 | ppl    90.405
| epoch 121 step    52650 |    330 batches | lr 0.000218 | ms/batch 315.50 | loss  4.44 | ppl    84.841
| epoch 121 step    52700 |    380 batches | lr 0.000218 | ms/batch 314.10 | loss  4.47 | ppl    87.466
| epoch 121 step    52750 |    430 batches | lr 0.000217 | ms/batch 315.68 | loss  4.48 | ppl    87.815
| epoch 122 step    52800 |     44 batches | lr 0.000217 | ms/batch 313.65 | loss  4.46 | ppl    86.630
----------------------------------------------------------------------------------------------------
| Eval 132 at step    52800 | time: 130.93s | valid loss  4.30 | valid ppl    74.048
----------------------------------------------------------------------------------------------------
| epoch 122 step    52850 |     94 batches | lr 0.000217 | ms/batch 413.48 | loss  4.41 | ppl    82.456
| epoch 122 step    52900 |    144 batches | lr 0.000217 | ms/batch 315.15 | loss  4.47 | ppl    87.658
| epoch 122 step    52950 |    194 batches | lr 0.000217 | ms/batch 314.03 | loss  4.49 | ppl    89.365
| epoch 122 step    53000 |    244 batches | lr 0.000216 | ms/batch 314.71 | loss  4.50 | ppl    90.235
| epoch 122 step    53050 |    294 batches | lr 0.000216 | ms/batch 314.58 | loss  4.51 | ppl    90.751
| epoch 122 step    53100 |    344 batches | lr 0.000216 | ms/batch 314.78 | loss  4.41 | ppl    82.314
| epoch 122 step    53150 |    394 batches | lr 0.000216 | ms/batch 316.01 | loss  4.50 | ppl    89.666
| epoch 123 step    53200 |      8 batches | lr 0.000216 | ms/batch 309.65 | loss  4.51 | ppl    91.228
----------------------------------------------------------------------------------------------------
| Eval 133 at step    53200 | time: 130.63s | valid loss  4.31 | valid ppl    74.098
----------------------------------------------------------------------------------------------------
| epoch 123 step    53250 |     58 batches | lr 0.000215 | ms/batch 414.24 | loss  4.45 | ppl    85.527
| epoch 123 step    53300 |    108 batches | lr 0.000215 | ms/batch 314.98 | loss  4.44 | ppl    84.563
| epoch 123 step    53350 |    158 batches | lr 0.000215 | ms/batch 314.86 | loss  4.50 | ppl    89.680
| epoch 123 step    53400 |    208 batches | lr 0.000215 | ms/batch 314.30 | loss  4.51 | ppl    90.497
| epoch 123 step    53450 |    258 batches | lr 0.000214 | ms/batch 314.87 | loss  4.50 | ppl    89.905
| epoch 123 step    53500 |    308 batches | lr 0.000214 | ms/batch 315.38 | loss  4.51 | ppl    90.546
| epoch 123 step    53550 |    358 batches | lr 0.000214 | ms/batch 315.06 | loss  4.39 | ppl    80.830
| epoch 123 step    53600 |    408 batches | lr 0.000214 | ms/batch 316.54 | loss  4.46 | ppl    86.427
----------------------------------------------------------------------------------------------------
| Eval 134 at step    53600 | time: 131.01s | valid loss  4.31 | valid ppl    74.260
----------------------------------------------------------------------------------------------------
| epoch 124 step    53650 |     22 batches | lr 0.000214 | ms/batch 407.82 | loss  4.52 | ppl    91.392
| epoch 124 step    53700 |     72 batches | lr 0.000213 | ms/batch 316.25 | loss  4.43 | ppl    83.886
| epoch 124 step    53750 |    122 batches | lr 0.000213 | ms/batch 315.51 | loss  4.48 | ppl    88.255
| epoch 124 step    53800 |    172 batches | lr 0.000213 | ms/batch 313.80 | loss  4.45 | ppl    86.016
| epoch 124 step    53850 |    222 batches | lr 0.000213 | ms/batch 316.21 | loss  4.49 | ppl    89.128
| epoch 124 step    53900 |    272 batches | lr 0.000213 | ms/batch 314.35 | loss  4.47 | ppl    87.473
| epoch 124 step    53950 |    322 batches | lr 0.000212 | ms/batch 314.20 | loss  4.43 | ppl    83.853
| epoch 124 step    54000 |    372 batches | lr 0.000212 | ms/batch 313.98 | loss  4.47 | ppl    87.207
----------------------------------------------------------------------------------------------------
| Eval 135 at step    54000 | time: 130.58s | valid loss  4.30 | valid ppl    73.914
----------------------------------------------------------------------------------------------------
| epoch 124 step    54050 |    422 batches | lr 0.000212 | ms/batch 448.34 | loss  4.48 | ppl    88.304
| epoch 125 step    54100 |     36 batches | lr 0.000212 | ms/batch 307.65 | loss  4.50 | ppl    89.827
| epoch 125 step    54150 |     86 batches | lr 0.000211 | ms/batch 313.53 | loss  4.42 | ppl    83.226
| epoch 125 step    54200 |    136 batches | lr 0.000211 | ms/batch 313.74 | loss  4.46 | ppl    86.299
| epoch 125 step    54250 |    186 batches | lr 0.000211 | ms/batch 314.12 | loss  4.46 | ppl    86.596
| epoch 125 step    54300 |    236 batches | lr 0.000211 | ms/batch 313.26 | loss  4.48 | ppl    88.552
| epoch 125 step    54350 |    286 batches | lr 0.000211 | ms/batch 312.59 | loss  4.51 | ppl    91.285
| epoch 125 step    54400 |    336 batches | lr 0.00021 | ms/batch 314.91 | loss  4.36 | ppl    78.318
----------------------------------------------------------------------------------------------------
| Eval 136 at step    54400 | time: 130.16s | valid loss  4.30 | valid ppl    73.993
----------------------------------------------------------------------------------------------------
| epoch 125 step    54450 |    386 batches | lr 0.00021 | ms/batch 412.68 | loss  4.48 | ppl    88.304
| epoch 125 step    54500 |    436 batches | lr 0.00021 | ms/batch 309.25 | loss  4.48 | ppl    88.608
| epoch 126 step    54550 |     50 batches | lr 0.00021 | ms/batch 313.74 | loss  4.44 | ppl    84.974
| epoch 126 step    54600 |    100 batches | lr 0.00021 | ms/batch 316.08 | loss  4.43 | ppl    84.214
| epoch 126 step    54650 |    150 batches | lr 0.000209 | ms/batch 316.25 | loss  4.45 | ppl    85.707
| epoch 126 step    54700 |    200 batches | lr 0.000209 | ms/batch 313.64 | loss  4.48 | ppl    88.317
| epoch 126 step    54750 |    250 batches | lr 0.000209 | ms/batch 314.17 | loss  4.48 | ppl    88.262
| epoch 126 step    54800 |    300 batches | lr 0.000209 | ms/batch 313.74 | loss  4.50 | ppl    89.982
----------------------------------------------------------------------------------------------------
| Eval 137 at step    54800 | time: 130.49s | valid loss  4.30 | valid ppl    73.866
----------------------------------------------------------------------------------------------------
| epoch 126 step    54850 |    350 batches | lr 0.000208 | ms/batch 444.81 | loss  4.40 | ppl    81.553
| epoch 126 step    54900 |    400 batches | lr 0.000208 | ms/batch 314.87 | loss  4.48 | ppl    87.939
| epoch 127 step    54950 |     14 batches | lr 0.000208 | ms/batch 307.36 | loss  4.50 | ppl    90.232
| epoch 127 step    55000 |     64 batches | lr 0.000208 | ms/batch 314.69 | loss  4.40 | ppl    81.054
| epoch 127 step    55050 |    114 batches | lr 0.000208 | ms/batch 313.02 | loss  4.44 | ppl    84.405
| epoch 127 step    55100 |    164 batches | lr 0.000207 | ms/batch 314.90 | loss  4.45 | ppl    85.976
| epoch 127 step    55150 |    214 batches | lr 0.000207 | ms/batch 315.89 | loss  4.45 | ppl    85.734
| epoch 127 step    55200 |    264 batches | lr 0.000207 | ms/batch 314.22 | loss  4.46 | ppl    86.339
----------------------------------------------------------------------------------------------------
| Eval 138 at step    55200 | time: 130.39s | valid loss  4.30 | valid ppl    74.048
----------------------------------------------------------------------------------------------------
| epoch 127 step    55250 |    314 batches | lr 0.000207 | ms/batch 411.91 | loss  4.45 | ppl    85.460
| epoch 127 step    55300 |    364 batches | lr 0.000206 | ms/batch 314.12 | loss  4.42 | ppl    82.967
| epoch 127 step    55350 |    414 batches | lr 0.000206 | ms/batch 314.57 | loss  4.46 | ppl    86.393
| epoch 128 step    55400 |     28 batches | lr 0.000206 | ms/batch 309.63 | loss  4.50 | ppl    90.137
| epoch 128 step    55450 |     78 batches | lr 0.000206 | ms/batch 313.56 | loss  4.41 | ppl    82.579
| epoch 128 step    55500 |    128 batches | lr 0.000206 | ms/batch 315.12 | loss  4.45 | ppl    85.614
| epoch 128 step    55550 |    178 batches | lr 0.000205 | ms/batch 314.79 | loss  4.46 | ppl    86.602
| epoch 128 step    55600 |    228 batches | lr 0.000205 | ms/batch 314.21 | loss  4.50 | ppl    89.617
----------------------------------------------------------------------------------------------------
| Eval 139 at step    55600 | time: 130.43s | valid loss  4.31 | valid ppl    74.175
----------------------------------------------------------------------------------------------------
| epoch 128 step    55650 |    278 batches | lr 0.000205 | ms/batch 413.78 | loss  4.50 | ppl    89.701
| epoch 128 step    55700 |    328 batches | lr 0.000205 | ms/batch 314.61 | loss  4.40 | ppl    81.699
| epoch 128 step    55750 |    378 batches | lr 0.000205 | ms/batch 314.59 | loss  4.41 | ppl    82.173
| epoch 128 step    55800 |    428 batches | lr 0.000204 | ms/batch 314.27 | loss  4.45 | ppl    85.882
| epoch 129 step    55850 |     42 batches | lr 0.000204 | ms/batch 310.11 | loss  4.45 | ppl    85.935
| epoch 129 step    55900 |     92 batches | lr 0.000204 | ms/batch 316.75 | loss  4.39 | ppl    80.326
| epoch 129 step    55950 |    142 batches | lr 0.000204 | ms/batch 316.32 | loss  4.45 | ppl    85.533
| epoch 129 step    56000 |    192 batches | lr 0.000203 | ms/batch 316.22 | loss  4.51 | ppl    90.709
----------------------------------------------------------------------------------------------------
| Eval 140 at step    56000 | time: 130.81s | valid loss  4.31 | valid ppl    74.160
----------------------------------------------------------------------------------------------------
| epoch 129 step    56050 |    242 batches | lr 0.000203 | ms/batch 415.14 | loss  4.47 | ppl    87.699
| epoch 129 step    56100 |    292 batches | lr 0.000203 | ms/batch 316.63 | loss  4.47 | ppl    87.637
| epoch 129 step    56150 |    342 batches | lr 0.000203 | ms/batch 315.50 | loss  4.35 | ppl    77.533
| epoch 129 step    56200 |    392 batches | lr 0.000203 | ms/batch 315.86 | loss  4.46 | ppl    86.379
| epoch 130 step    56250 |      6 batches | lr 0.000202 | ms/batch 310.88 | loss  4.46 | ppl    86.846
| epoch 130 step    56300 |     56 batches | lr 0.000202 | ms/batch 315.41 | loss  4.41 | ppl    82.604
| epoch 130 step    56350 |    106 batches | lr 0.000202 | ms/batch 314.37 | loss  4.41 | ppl    82.173
| epoch 130 step    56400 |    156 batches | lr 0.000202 | ms/batch 313.91 | loss  4.44 | ppl    85.140
----------------------------------------------------------------------------------------------------
| Eval 141 at step    56400 | time: 130.86s | valid loss  4.30 | valid ppl    73.580
----------------------------------------------------------------------------------------------------
| epoch 130 step    56450 |    206 batches | lr 0.000202 | ms/batch 446.96 | loss  4.46 | ppl    86.623
| epoch 130 step    56500 |    256 batches | lr 0.000201 | ms/batch 317.34 | loss  4.49 | ppl    88.809
| epoch 130 step    56550 |    306 batches | lr 0.000201 | ms/batch 315.00 | loss  4.47 | ppl    87.248
| epoch 130 step    56600 |    356 batches | lr 0.000201 | ms/batch 316.03 | loss  4.38 | ppl    80.163
| epoch 130 step    56650 |    406 batches | lr 0.000201 | ms/batch 316.44 | loss  4.42 | ppl    82.876
| epoch 131 step    56700 |     20 batches | lr 0.0002 | ms/batch 310.70 | loss  4.48 | ppl    88.476
| epoch 131 step    56750 |     70 batches | lr 0.0002 | ms/batch 316.72 | loss  4.39 | ppl    80.767
| epoch 131 step    56800 |    120 batches | lr 0.0002 | ms/batch 314.88 | loss  4.43 | ppl    83.709
----------------------------------------------------------------------------------------------------
| Eval 142 at step    56800 | time: 131.06s | valid loss  4.29 | valid ppl    72.900
----------------------------------------------------------------------------------------------------
| epoch 131 step    56850 |    170 batches | lr 0.0002 | ms/batch 448.52 | loss  4.42 | ppl    83.038
| epoch 131 step    56900 |    220 batches | lr 0.0002 | ms/batch 314.84 | loss  4.47 | ppl    87.357
| epoch 131 step    56950 |    270 batches | lr 0.000199 | ms/batch 313.44 | loss  4.44 | ppl    84.801
| epoch 131 step    57000 |    320 batches | lr 0.000199 | ms/batch 314.91 | loss  4.44 | ppl    84.696
| epoch 131 step    57050 |    370 batches | lr 0.000199 | ms/batch 314.96 | loss  4.37 | ppl    79.310
| epoch 131 step    57100 |    420 batches | lr 0.000199 | ms/batch 313.83 | loss  4.45 | ppl    85.246
| epoch 132 step    57150 |     34 batches | lr 0.000198 | ms/batch 310.16 | loss  4.48 | ppl    88.069
| epoch 132 step    57200 |     84 batches | lr 0.000198 | ms/batch 315.23 | loss  4.39 | ppl    80.886
----------------------------------------------------------------------------------------------------
| Eval 143 at step    57200 | time: 130.59s | valid loss  4.31 | valid ppl    74.250
----------------------------------------------------------------------------------------------------
| epoch 132 step    57250 |    134 batches | lr 0.000198 | ms/batch 414.75 | loss  4.44 | ppl    84.576
| epoch 132 step    57300 |    184 batches | lr 0.000198 | ms/batch 315.86 | loss  4.42 | ppl    82.843
| epoch 132 step    57350 |    234 batches | lr 0.000198 | ms/batch 314.55 | loss  4.46 | ppl    86.589
| epoch 132 step    57400 |    284 batches | lr 0.000197 | ms/batch 314.41 | loss  4.49 | ppl    89.205
| epoch 132 step    57450 |    334 batches | lr 0.000197 | ms/batch 314.54 | loss  4.37 | ppl    79.081
| epoch 132 step    57500 |    384 batches | lr 0.000197 | ms/batch 314.08 | loss  4.42 | ppl    82.902
| epoch 132 step    57550 |    434 batches | lr 0.000197 | ms/batch 315.63 | loss  4.46 | ppl    86.792
| epoch 133 step    57600 |     48 batches | lr 0.000196 | ms/batch 310.13 | loss  4.41 | ppl    82.086
----------------------------------------------------------------------------------------------------
| Eval 144 at step    57600 | time: 130.73s | valid loss  4.29 | valid ppl    73.302
----------------------------------------------------------------------------------------------------
| epoch 133 step    57650 |     98 batches | lr 0.000196 | ms/batch 415.69 | loss  4.40 | ppl    81.241
| epoch 133 step    57700 |    148 batches | lr 0.000196 | ms/batch 317.34 | loss  4.42 | ppl    83.265
| epoch 133 step    57750 |    198 batches | lr 0.000196 | ms/batch 316.65 | loss  4.45 | ppl    85.527
| epoch 133 step    57800 |    248 batches | lr 0.000196 | ms/batch 315.90 | loss  4.47 | ppl    87.214
| epoch 133 step    57850 |    298 batches | lr 0.000195 | ms/batch 317.63 | loss  4.48 | ppl    88.214
| epoch 133 step    57900 |    348 batches | lr 0.000195 | ms/batch 317.24 | loss  4.34 | ppl    76.516
| epoch 133 step    57950 |    398 batches | lr 0.000195 | ms/batch 317.77 | loss  4.44 | ppl    84.669
| epoch 134 step    58000 |     12 batches | lr 0.000195 | ms/batch 309.88 | loss  4.46 | ppl    86.751
----------------------------------------------------------------------------------------------------
| Eval 145 at step    58000 | time: 131.42s | valid loss  4.29 | valid ppl    72.952
----------------------------------------------------------------------------------------------------
| epoch 134 step    58050 |     62 batches | lr 0.000195 | ms/batch 414.05 | loss  4.39 | ppl    81.038
| epoch 134 step    58100 |    112 batches | lr 0.000194 | ms/batch 315.78 | loss  4.41 | ppl    82.032
| epoch 134 step    58150 |    162 batches | lr 0.000194 | ms/batch 315.88 | loss  4.44 | ppl    84.451
| epoch 134 step    58200 |    212 batches | lr 0.000194 | ms/batch 316.00 | loss  4.44 | ppl    85.074
| epoch 134 step    58250 |    262 batches | lr 0.000194 | ms/batch 316.67 | loss  4.45 | ppl    85.741
| epoch 134 step    58300 |    312 batches | lr 0.000193 | ms/batch 317.57 | loss  4.45 | ppl    85.848
| epoch 134 step    58350 |    362 batches | lr 0.000193 | ms/batch 316.35 | loss  4.39 | ppl    80.773
| epoch 134 step    58400 |    412 batches | lr 0.000193 | ms/batch 316.69 | loss  4.40 | ppl    81.636
----------------------------------------------------------------------------------------------------
| Eval 146 at step    58400 | time: 131.48s | valid loss  4.29 | valid ppl    73.288
----------------------------------------------------------------------------------------------------
| epoch 135 step    58450 |     26 batches | lr 0.000193 | ms/batch 411.45 | loss  4.46 | ppl    86.870
| epoch 135 step    58500 |     76 batches | lr 0.000193 | ms/batch 318.08 | loss  4.38 | ppl    79.558
| epoch 135 step    58550 |    126 batches | lr 0.000192 | ms/batch 316.90 | loss  4.40 | ppl    81.668
| epoch 135 step    58600 |    176 batches | lr 0.000192 | ms/batch 315.81 | loss  4.40 | ppl    81.355
| epoch 135 step    58650 |    226 batches | lr 0.000192 | ms/batch 317.09 | loss  4.45 | ppl    85.480
| epoch 135 step    58700 |    276 batches | lr 0.000192 | ms/batch 315.59 | loss  4.46 | ppl    86.070
| epoch 135 step    58750 |    326 batches | lr 0.000191 | ms/batch 314.98 | loss  4.38 | ppl    79.502
| epoch 135 step    58800 |    376 batches | lr 0.000191 | ms/batch 318.83 | loss  4.40 | ppl    81.814
----------------------------------------------------------------------------------------------------
| Eval 147 at step    58800 | time: 131.48s | valid loss  4.29 | valid ppl    72.628
----------------------------------------------------------------------------------------------------
| epoch 135 step    58850 |    426 batches | lr 0.000191 | ms/batch 451.13 | loss  4.43 | ppl    83.670
| epoch 136 step    58900 |     40 batches | lr 0.000191 | ms/batch 310.66 | loss  4.44 | ppl    85.020
| epoch 136 step    58950 |     90 batches | lr 0.000191 | ms/batch 320.86 | loss  4.36 | ppl    78.471
| epoch 136 step    59000 |    140 batches | lr 0.00019 | ms/batch 314.52 | loss  4.42 | ppl    82.902
| epoch 136 step    59050 |    190 batches | lr 0.00019 | ms/batch 315.76 | loss  4.45 | ppl    85.540
| epoch 136 step    59100 |    240 batches | lr 0.00019 | ms/batch 314.61 | loss  4.47 | ppl    87.343
| epoch 136 step    59150 |    290 batches | lr 0.00019 | ms/batch 326.20 | loss  4.48 | ppl    88.421
| epoch 136 step    59200 |    340 batches | lr 0.000189 | ms/batch 328.67 | loss  4.33 | ppl    75.855
----------------------------------------------------------------------------------------------------
| Eval 148 at step    59200 | time: 132.48s | valid loss  4.28 | valid ppl    72.597
----------------------------------------------------------------------------------------------------
| epoch 136 step    59250 |    390 batches | lr 0.000189 | ms/batch 444.89 | loss  4.42 | ppl    82.759
| epoch 137 step    59300 |      4 batches | lr 0.000189 | ms/batch 309.07 | loss  4.46 | ppl    86.093
| epoch 137 step    59350 |     54 batches | lr 0.000189 | ms/batch 315.33 | loss  4.39 | ppl    80.295
| epoch 137 step    59400 |    104 batches | lr 0.000189 | ms/batch 314.91 | loss  4.39 | ppl    80.710
| epoch 137 step    59450 |    154 batches | lr 0.000188 | ms/batch 314.18 | loss  4.42 | ppl    83.506
| epoch 137 step    59500 |    204 batches | lr 0.000188 | ms/batch 316.03 | loss  4.45 | ppl    85.487
| epoch 137 step    59550 |    254 batches | lr 0.000188 | ms/batch 321.87 | loss  4.43 | ppl    83.971
| epoch 137 step    59600 |    304 batches | lr 0.000188 | ms/batch 331.93 | loss  4.47 | ppl    87.459
----------------------------------------------------------------------------------------------------
| Eval 149 at step    59600 | time: 131.82s | valid loss  4.29 | valid ppl    72.856
----------------------------------------------------------------------------------------------------
| epoch 137 step    59650 |    354 batches | lr 0.000188 | ms/batch 415.11 | loss  4.36 | ppl    77.879
| epoch 137 step    59700 |    404 batches | lr 0.000187 | ms/batch 316.30 | loss  4.42 | ppl    83.350
| epoch 138 step    59750 |     18 batches | lr 0.000187 | ms/batch 310.32 | loss  4.46 | ppl    86.117
| epoch 138 step    59800 |     68 batches | lr 0.000187 | ms/batch 315.61 | loss  4.37 | ppl    79.344
| epoch 138 step    59850 |    118 batches | lr 0.000187 | ms/batch 316.31 | loss  4.44 | ppl    84.576
| epoch 138 step    59900 |    168 batches | lr 0.000186 | ms/batch 313.84 | loss  4.42 | ppl    83.038
| epoch 138 step    59950 |    218 batches | lr 0.000186 | ms/batch 314.21 | loss  4.43 | ppl    84.214
| epoch 138 step    60000 |    268 batches | lr 0.000186 | ms/batch 314.73 | loss  4.44 | ppl    84.392
----------------------------------------------------------------------------------------------------
| Eval 150 at step    60000 | time: 130.77s | valid loss  4.29 | valid ppl    72.715
----------------------------------------------------------------------------------------------------
| epoch 138 step    60050 |    318 batches | lr 0.000186 | ms/batch 412.51 | loss  4.41 | ppl    82.630
| epoch 138 step    60100 |    368 batches | lr 0.000186 | ms/batch 313.97 | loss  4.35 | ppl    77.201
| epoch 138 step    60150 |    418 batches | lr 0.000185 | ms/batch 313.86 | loss  4.44 | ppl    84.451
| epoch 139 step    60200 |     32 batches | lr 0.000185 | ms/batch 309.08 | loss  4.46 | ppl    86.521
| epoch 139 step    60250 |     82 batches | lr 0.000185 | ms/batch 313.46 | loss  4.39 | ppl    80.653
| epoch 139 step    60300 |    132 batches | lr 0.000185 | ms/batch 315.63 | loss  4.41 | ppl    82.224
| epoch 139 step    60350 |    182 batches | lr 0.000184 | ms/batch 316.35 | loss  4.43 | ppl    83.598
| epoch 139 step    60400 |    232 batches | lr 0.000184 | ms/batch 316.35 | loss  4.44 | ppl    84.524
----------------------------------------------------------------------------------------------------
| Eval 151 at step    60400 | time: 130.60s | valid loss  4.28 | valid ppl    72.474
----------------------------------------------------------------------------------------------------
| epoch 139 step    60450 |    282 batches | lr 0.000184 | ms/batch 449.45 | loss  4.44 | ppl    84.596
| epoch 139 step    60500 |    332 batches | lr 0.000184 | ms/batch 317.07 | loss  4.36 | ppl    78.062
| epoch 139 step    60550 |    382 batches | lr 0.000184 | ms/batch 315.75 | loss  4.40 | ppl    81.738
| epoch 139 step    60600 |    432 batches | lr 0.000183 | ms/batch 314.89 | loss  4.43 | ppl    83.925
| epoch 140 step    60650 |     46 batches | lr 0.000183 | ms/batch 307.73 | loss  4.39 | ppl    80.452
| epoch 140 step    60700 |     96 batches | lr 0.000183 | ms/batch 313.86 | loss  4.35 | ppl    77.855
| epoch 140 step    60750 |    146 batches | lr 0.000183 | ms/batch 314.47 | loss  4.39 | ppl    80.357
| epoch 140 step    60800 |    196 batches | lr 0.000182 | ms/batch 314.26 | loss  4.43 | ppl    83.656
----------------------------------------------------------------------------------------------------
| Eval 152 at step    60800 | time: 130.71s | valid loss  4.29 | valid ppl    72.944
----------------------------------------------------------------------------------------------------
| epoch 140 step    60850 |    246 batches | lr 0.000182 | ms/batch 413.89 | loss  4.41 | ppl    82.521
| epoch 140 step    60900 |    296 batches | lr 0.000182 | ms/batch 314.06 | loss  4.46 | ppl    86.860
| epoch 140 step    60950 |    346 batches | lr 0.000182 | ms/batch 315.66 | loss  4.31 | ppl    74.720
| epoch 140 step    61000 |    396 batches | lr 0.000182 | ms/batch 315.36 | loss  4.41 | ppl    82.675
| epoch 141 step    61050 |     10 batches | lr 0.000181 | ms/batch 308.43 | loss  4.45 | ppl    85.871
| epoch 141 step    61100 |     60 batches | lr 0.000181 | ms/batch 316.37 | loss  4.35 | ppl    77.491
| epoch 141 step    61150 |    110 batches | lr 0.000181 | ms/batch 315.55 | loss  4.39 | ppl    80.546
| epoch 141 step    61200 |    160 batches | lr 0.000181 | ms/batch 313.27 | loss  4.40 | ppl    81.381
----------------------------------------------------------------------------------------------------
| Eval 153 at step    61200 | time: 130.59s | valid loss  4.27 | valid ppl    71.661
----------------------------------------------------------------------------------------------------
| epoch 141 step    61250 |    210 batches | lr 0.00018 | ms/batch 452.06 | loss  4.43 | ppl    83.827
| epoch 141 step    61300 |    260 batches | lr 0.00018 | ms/batch 316.30 | loss  4.43 | ppl    83.545
| epoch 141 step    61350 |    310 batches | lr 0.00018 | ms/batch 317.18 | loss  4.43 | ppl    83.794
| epoch 141 step    61400 |    360 batches | lr 0.00018 | ms/batch 316.03 | loss  4.34 | ppl    76.776
| epoch 141 step    61450 |    410 batches | lr 0.00018 | ms/batch 315.55 | loss  4.38 | ppl    79.701
| epoch 142 step    61500 |     24 batches | lr 0.000179 | ms/batch 310.51 | loss  4.46 | ppl    86.525
| epoch 142 step    61550 |     74 batches | lr 0.000179 | ms/batch 315.97 | loss  4.34 | ppl    76.972
| epoch 142 step    61600 |    124 batches | lr 0.000179 | ms/batch 315.01 | loss  4.40 | ppl    81.661
----------------------------------------------------------------------------------------------------
| Eval 154 at step    61600 | time: 131.24s | valid loss  4.28 | valid ppl    72.374
----------------------------------------------------------------------------------------------------
| epoch 142 step    61650 |    174 batches | lr 0.000179 | ms/batch 415.47 | loss  4.39 | ppl    80.615
| epoch 142 step    61700 |    224 batches | lr 0.000179 | ms/batch 314.32 | loss  4.41 | ppl    82.347
| epoch 142 step    61750 |    274 batches | lr 0.000178 | ms/batch 317.57 | loss  4.43 | ppl    83.820
| epoch 142 step    61800 |    324 batches | lr 0.000178 | ms/batch 320.24 | loss  4.37 | ppl    78.846
| epoch 142 step    61850 |    374 batches | lr 0.000178 | ms/batch 315.47 | loss  4.38 | ppl    79.982
| epoch 142 step    61900 |    424 batches | lr 0.000178 | ms/batch 317.12 | loss  4.43 | ppl    83.892
| epoch 143 step    61950 |     38 batches | lr 0.000177 | ms/batch 311.20 | loss  4.42 | ppl    83.138
| epoch 143 step    62000 |     88 batches | lr 0.000177 | ms/batch 315.17 | loss  4.33 | ppl    76.313
----------------------------------------------------------------------------------------------------
| Eval 155 at step    62000 | time: 131.34s | valid loss  4.29 | valid ppl    72.945
----------------------------------------------------------------------------------------------------
| epoch 143 step    62050 |    138 batches | lr 0.000177 | ms/batch 415.70 | loss  4.39 | ppl    80.515
| epoch 143 step    62100 |    188 batches | lr 0.000177 | ms/batch 324.88 | loss  4.40 | ppl    81.674
| epoch 143 step    62150 |    238 batches | lr 0.000177 | ms/batch 318.42 | loss  4.42 | ppl    83.233
| epoch 143 step    62200 |    288 batches | lr 0.000176 | ms/batch 314.56 | loss  4.45 | ppl    85.607
| epoch 143 step    62250 |    338 batches | lr 0.000176 | ms/batch 314.06 | loss  4.31 | ppl    74.156
| epoch 143 step    62300 |    388 batches | lr 0.000176 | ms/batch 314.08 | loss  4.41 | ppl    82.289
| epoch 144 step    62350 |      2 batches | lr 0.000176 | ms/batch 309.65 | loss  4.44 | ppl    84.825
| epoch 144 step    62400 |     52 batches | lr 0.000175 | ms/batch 319.26 | loss  4.37 | ppl    79.310
----------------------------------------------------------------------------------------------------
| Eval 156 at step    62400 | time: 131.49s | valid loss  4.27 | valid ppl    71.639
----------------------------------------------------------------------------------------------------
| epoch 144 step    62450 |    102 batches | lr 0.000175 | ms/batch 451.29 | loss  4.38 | ppl    79.483
| epoch 144 step    62500 |    152 batches | lr 0.000175 | ms/batch 317.25 | loss  4.38 | ppl    79.844
| epoch 144 step    62550 |    202 batches | lr 0.000175 | ms/batch 316.00 | loss  4.44 | ppl    85.147
| epoch 144 step    62600 |    252 batches | lr 0.000175 | ms/batch 317.56 | loss  4.43 | ppl    83.925
| epoch 144 step    62650 |    302 batches | lr 0.000174 | ms/batch 316.73 | loss  4.45 | ppl    85.888
| epoch 144 step    62700 |    352 batches | lr 0.000174 | ms/batch 316.22 | loss  4.29 | ppl    73.321
| epoch 144 step    62750 |    402 batches | lr 0.000174 | ms/batch 316.25 | loss  4.43 | ppl    84.174
| epoch 145 step    62800 |     16 batches | lr 0.000174 | ms/batch 309.75 | loss  4.44 | ppl    85.054
----------------------------------------------------------------------------------------------------
| Eval 157 at step    62800 | time: 131.37s | valid loss  4.28 | valid ppl    72.160
----------------------------------------------------------------------------------------------------
| epoch 145 step    62850 |     66 batches | lr 0.000173 | ms/batch 411.00 | loss  4.32 | ppl    75.524
| epoch 145 step    62900 |    116 batches | lr 0.000173 | ms/batch 314.46 | loss  4.39 | ppl    80.345
| epoch 145 step    62950 |    166 batches | lr 0.000173 | ms/batch 314.58 | loss  4.37 | ppl    79.037
| epoch 145 step    63000 |    216 batches | lr 0.000173 | ms/batch 315.58 | loss  4.43 | ppl    83.735
| epoch 145 step    63050 |    266 batches | lr 0.000173 | ms/batch 315.35 | loss  4.44 | ppl    84.524
| epoch 145 step    63100 |    316 batches | lr 0.000172 | ms/batch 314.53 | loss  4.40 | ppl    81.610
| epoch 145 step    63150 |    366 batches | lr 0.000172 | ms/batch 315.70 | loss  4.36 | ppl    78.294
| epoch 145 step    63200 |    416 batches | lr 0.000172 | ms/batch 315.74 | loss  4.38 | ppl    79.533
----------------------------------------------------------------------------------------------------
| Eval 158 at step    63200 | time: 130.89s | valid loss  4.28 | valid ppl    72.109
----------------------------------------------------------------------------------------------------
| epoch 146 step    63250 |     30 batches | lr 0.000172 | ms/batch 407.89 | loss  4.43 | ppl    84.122
| epoch 146 step    63300 |     80 batches | lr 0.000171 | ms/batch 314.64 | loss  4.34 | ppl    76.936
| epoch 146 step    63350 |    130 batches | lr 0.000171 | ms/batch 315.34 | loss  4.39 | ppl    80.577
| epoch 146 step    63400 |    180 batches | lr 0.000171 | ms/batch 315.90 | loss  4.36 | ppl    78.606
| epoch 146 step    63450 |    230 batches | lr 0.000171 | ms/batch 314.30 | loss  4.40 | ppl    81.655
| epoch 146 step    63500 |    280 batches | lr 0.000171 | ms/batch 315.87 | loss  4.43 | ppl    83.899
| epoch 146 step    63550 |    330 batches | lr 0.00017 | ms/batch 315.50 | loss  4.35 | ppl    77.182
| epoch 146 step    63600 |    380 batches | lr 0.00017 | ms/batch 314.35 | loss  4.38 | ppl    80.038
----------------------------------------------------------------------------------------------------
| Eval 159 at step    63600 | time: 130.66s | valid loss  4.27 | valid ppl    71.734
----------------------------------------------------------------------------------------------------
| epoch 146 step    63650 |    430 batches | lr 0.00017 | ms/batch 411.76 | loss  4.40 | ppl    81.228
| epoch 147 step    63700 |     44 batches | lr 0.00017 | ms/batch 308.85 | loss  4.40 | ppl    81.076
| epoch 147 step    63750 |     94 batches | lr 0.00017 | ms/batch 315.18 | loss  4.32 | ppl    75.489
| epoch 147 step    63800 |    144 batches | lr 0.000169 | ms/batch 313.72 | loss  4.40 | ppl    81.375
| epoch 147 step    63850 |    194 batches | lr 0.000169 | ms/batch 313.41 | loss  4.42 | ppl    83.480
| epoch 147 step    63900 |    244 batches | lr 0.000169 | ms/batch 315.79 | loss  4.42 | ppl    82.967
| epoch 147 step    63950 |    294 batches | lr 0.000169 | ms/batch 328.47 | loss  4.43 | ppl    83.971
| epoch 147 step    64000 |    344 batches | lr 0.000168 | ms/batch 330.96 | loss  4.29 | ppl    73.143
----------------------------------------------------------------------------------------------------
| Eval 160 at step    64000 | time: 132.19s | valid loss  4.28 | valid ppl    72.188
----------------------------------------------------------------------------------------------------
| epoch 147 step    64050 |    394 batches | lr 0.000168 | ms/batch 434.52 | loss  4.37 | ppl    79.266
| epoch 148 step    64100 |      8 batches | lr 0.000168 | ms/batch 325.79 | loss  4.40 | ppl    81.448
| epoch 148 step    64150 |     58 batches | lr 0.000168 | ms/batch 332.14 | loss  4.35 | ppl    77.104
| epoch 148 step    64200 |    108 batches | lr 0.000168 | ms/batch 331.98 | loss  4.34 | ppl    76.852
| epoch 148 step    64250 |    158 batches | lr 0.000167 | ms/batch 332.09 | loss  4.39 | ppl    80.326
| epoch 148 step    64300 |    208 batches | lr 0.000167 | ms/batch 331.02 | loss  4.40 | ppl    81.623
| epoch 148 step    64350 |    258 batches | lr 0.000167 | ms/batch 332.42 | loss  4.42 | ppl    82.759
| epoch 148 step    64400 |    308 batches | lr 0.000167 | ms/batch 331.81 | loss  4.43 | ppl    84.017
----------------------------------------------------------------------------------------------------
| Eval 161 at step    64400 | time: 137.59s | valid loss  4.27 | valid ppl    71.872
----------------------------------------------------------------------------------------------------
| epoch 148 step    64450 |    358 batches | lr 0.000166 | ms/batch 435.06 | loss  4.33 | ppl    75.873
| epoch 148 step    64500 |    408 batches | lr 0.000166 | ms/batch 331.76 | loss  4.39 | ppl    80.351
| epoch 149 step    64550 |     22 batches | lr 0.000166 | ms/batch 324.99 | loss  4.42 | ppl    83.246
| epoch 149 step    64600 |     72 batches | lr 0.000166 | ms/batch 330.83 | loss  4.31 | ppl    74.744
| epoch 149 step    64650 |    122 batches | lr 0.000166 | ms/batch 330.27 | loss  4.36 | ppl    78.282
| epoch 149 step    64700 |    172 batches | lr 0.000165 | ms/batch 314.63 | loss  4.40 | ppl    81.222
| epoch 149 step    64750 |    222 batches | lr 0.000165 | ms/batch 315.43 | loss  4.41 | ppl    82.128
| epoch 149 step    64800 |    272 batches | lr 0.000165 | ms/batch 314.92 | loss  4.45 | ppl    85.286
----------------------------------------------------------------------------------------------------
| Eval 162 at step    64800 | time: 134.63s | valid loss  4.27 | valid ppl    71.665
----------------------------------------------------------------------------------------------------
| epoch 149 step    64850 |    322 batches | lr 0.000165 | ms/batch 413.61 | loss  4.38 | ppl    79.539
| epoch 149 step    64900 |    372 batches | lr 0.000164 | ms/batch 314.72 | loss  4.35 | ppl    77.806
| epoch 149 step    64950 |    422 batches | lr 0.000164 | ms/batch 322.96 | loss  4.37 | ppl    78.889
| epoch 150 step    65000 |     36 batches | lr 0.000164 | ms/batch 319.47 | loss  4.43 | ppl    83.872
| epoch 150 step    65050 |     86 batches | lr 0.000164 | ms/batch 325.30 | loss  4.33 | ppl    75.613
| epoch 150 step    65100 |    136 batches | lr 0.000164 | ms/batch 324.57 | loss  4.39 | ppl    80.483
| epoch 150 step    65150 |    186 batches | lr 0.000163 | ms/batch 325.31 | loss  4.41 | ppl    82.128
| epoch 150 step    65200 |    236 batches | lr 0.000163 | ms/batch 321.27 | loss  4.38 | ppl    79.919
----------------------------------------------------------------------------------------------------
| Eval 163 at step    65200 | time: 133.46s | valid loss  4.28 | valid ppl    71.921
----------------------------------------------------------------------------------------------------
| epoch 150 step    65250 |    286 batches | lr 0.000163 | ms/batch 431.42 | loss  4.41 | ppl    82.212
| epoch 150 step    65300 |    336 batches | lr 0.000163 | ms/batch 317.07 | loss  4.29 | ppl    72.881
| epoch 150 step    65350 |    386 batches | lr 0.000162 | ms/batch 315.85 | loss  4.39 | ppl    80.389
| epoch 150 step    65400 |    436 batches | lr 0.000162 | ms/batch 311.16 | loss  4.41 | ppl    82.019
| epoch 151 step    65450 |     50 batches | lr 0.000162 | ms/batch 312.23 | loss  4.38 | ppl    79.446
| epoch 151 step    65500 |    100 batches | lr 0.000162 | ms/batch 313.67 | loss  4.32 | ppl    75.395
| epoch 151 step    65550 |    150 batches | lr 0.000162 | ms/batch 314.08 | loss  4.36 | ppl    78.545
| epoch 151 step    65600 |    200 batches | lr 0.000161 | ms/batch 313.90 | loss  4.41 | ppl    82.424
----------------------------------------------------------------------------------------------------
| Eval 164 at step    65600 | time: 131.36s | valid loss  4.27 | valid ppl    71.811
----------------------------------------------------------------------------------------------------
| epoch 151 step    65650 |    250 batches | lr 0.000161 | ms/batch 422.64 | loss  4.41 | ppl    82.160
| epoch 151 step    65700 |    300 batches | lr 0.000161 | ms/batch 314.94 | loss  4.43 | ppl    83.827
| epoch 151 step    65750 |    350 batches | lr 0.000161 | ms/batch 315.13 | loss  4.30 | ppl    73.476
| epoch 151 step    65800 |    400 batches | lr 0.000161 | ms/batch 313.87 | loss  4.38 | ppl    79.608
| epoch 152 step    65850 |     14 batches | lr 0.00016 | ms/batch 308.44 | loss  4.42 | ppl    82.688
| epoch 152 step    65900 |     64 batches | lr 0.00016 | ms/batch 313.32 | loss  4.31 | ppl    74.717
| epoch 152 step    65950 |    114 batches | lr 0.00016 | ms/batch 313.49 | loss  4.35 | ppl    77.400
| epoch 152 step    66000 |    164 batches | lr 0.00016 | ms/batch 314.02 | loss  4.37 | ppl    79.266
----------------------------------------------------------------------------------------------------
| Eval 165 at step    66000 | time: 130.85s | valid loss  4.26 | valid ppl    71.097
----------------------------------------------------------------------------------------------------
| epoch 152 step    66050 |    214 batches | lr 0.000159 | ms/batch 448.69 | loss  4.41 | ppl    81.955
| epoch 152 step    66100 |    264 batches | lr 0.000159 | ms/batch 316.25 | loss  4.39 | ppl    80.710
| epoch 152 step    66150 |    314 batches | lr 0.000159 | ms/batch 316.31 | loss  4.38 | ppl    79.620
| epoch 152 step    66200 |    364 batches | lr 0.000159 | ms/batch 314.90 | loss  4.30 | ppl    73.378
| epoch 152 step    66250 |    414 batches | lr 0.000159 | ms/batch 316.01 | loss  4.37 | ppl    78.760
| epoch 153 step    66300 |     28 batches | lr 0.000158 | ms/batch 310.39 | loss  4.39 | ppl    80.386
| epoch 153 step    66350 |     78 batches | lr 0.000158 | ms/batch 313.22 | loss  4.34 | ppl    76.498
| epoch 153 step    66400 |    128 batches | lr 0.000158 | ms/batch 315.22 | loss  4.34 | ppl    76.618
----------------------------------------------------------------------------------------------------
| Eval 166 at step    66400 | time: 130.81s | valid loss  4.26 | valid ppl    70.838
----------------------------------------------------------------------------------------------------
| epoch 153 step    66450 |    178 batches | lr 0.000158 | ms/batch 445.27 | loss  4.39 | ppl    80.546
| epoch 153 step    66500 |    228 batches | lr 0.000157 | ms/batch 313.48 | loss  4.39 | ppl    80.622
| epoch 153 step    66550 |    278 batches | lr 0.000157 | ms/batch 313.88 | loss  4.39 | ppl    80.458
| epoch 153 step    66600 |    328 batches | lr 0.000157 | ms/batch 312.82 | loss  4.31 | ppl    74.621
| epoch 153 step    66650 |    378 batches | lr 0.000157 | ms/batch 316.45 | loss  4.34 | ppl    76.755
| epoch 153 step    66700 |    428 batches | lr 0.000157 | ms/batch 317.29 | loss  4.40 | ppl    81.699
| epoch 154 step    66750 |     42 batches | lr 0.000156 | ms/batch 308.67 | loss  4.36 | ppl    78.404
| epoch 154 step    66800 |     92 batches | lr 0.000156 | ms/batch 315.79 | loss  4.29 | ppl    72.932
----------------------------------------------------------------------------------------------------
| Eval 167 at step    66800 | time: 130.56s | valid loss  4.27 | valid ppl    71.828
----------------------------------------------------------------------------------------------------
| epoch 154 step    66850 |    142 batches | lr 0.000156 | ms/batch 416.44 | loss  4.36 | ppl    77.873
| epoch 154 step    66900 |    192 batches | lr 0.000156 | ms/batch 314.30 | loss  4.41 | ppl    81.872
| epoch 154 step    66950 |    242 batches | lr 0.000155 | ms/batch 313.96 | loss  4.41 | ppl    81.962
| epoch 154 step    67000 |    292 batches | lr 0.000155 | ms/batch 313.97 | loss  4.42 | ppl    83.012
| epoch 154 step    67050 |    342 batches | lr 0.000155 | ms/batch 315.38 | loss  4.28 | ppl    72.060
| epoch 154 step    67100 |    392 batches | lr 0.000155 | ms/batch 316.57 | loss  4.37 | ppl    79.062
| epoch 155 step    67150 |      6 batches | lr 0.000155 | ms/batch 313.98 | loss  4.37 | ppl    79.375
| epoch 155 step    67200 |     56 batches | lr 0.000154 | ms/batch 321.78 | loss  4.34 | ppl    76.702
----------------------------------------------------------------------------------------------------
| Eval 168 at step    67200 | time: 131.35s | valid loss  4.28 | valid ppl    71.928
----------------------------------------------------------------------------------------------------
| epoch 155 step    67250 |    106 batches | lr 0.000154 | ms/batch 415.81 | loss  4.32 | ppl    75.489
| epoch 155 step    67300 |    156 batches | lr 0.000154 | ms/batch 323.28 | loss  4.38 | ppl    79.570
| epoch 155 step    67350 |    206 batches | lr 0.000154 | ms/batch 330.34 | loss  4.37 | ppl    78.945
| epoch 155 step    67400 |    256 batches | lr 0.000154 | ms/batch 330.09 | loss  4.40 | ppl    81.553
| epoch 155 step    67450 |    306 batches | lr 0.000153 | ms/batch 331.21 | loss  4.40 | ppl    81.719
| epoch 155 step    67500 |    356 batches | lr 0.000153 | ms/batch 324.50 | loss  4.31 | ppl    74.107
| epoch 155 step    67550 |    406 batches | lr 0.000153 | ms/batch 315.16 | loss  4.32 | ppl    75.400
| epoch 156 step    67600 |     20 batches | lr 0.000153 | ms/batch 309.52 | loss  4.42 | ppl    83.428
----------------------------------------------------------------------------------------------------
| Eval 169 at step    67600 | time: 133.94s | valid loss  4.27 | valid ppl    71.195
----------------------------------------------------------------------------------------------------
| epoch 156 step    67650 |     70 batches | lr 0.000152 | ms/batch 411.86 | loss  4.32 | ppl    75.533
| epoch 156 step    67700 |    120 batches | lr 0.000152 | ms/batch 312.75 | loss  4.34 | ppl    76.403
| epoch 156 step    67750 |    170 batches | lr 0.000152 | ms/batch 313.61 | loss  4.37 | ppl    79.427
| epoch 156 step    67800 |    220 batches | lr 0.000152 | ms/batch 314.12 | loss  4.40 | ppl    81.757
| epoch 156 step    67850 |    270 batches | lr 0.000152 | ms/batch 313.18 | loss  4.38 | ppl    79.975
| epoch 156 step    67900 |    320 batches | lr 0.000151 | ms/batch 313.96 | loss  4.36 | ppl    78.111
| epoch 156 step    67950 |    370 batches | lr 0.000151 | ms/batch 313.98 | loss  4.31 | ppl    74.784
| epoch 156 step    68000 |    420 batches | lr 0.000151 | ms/batch 312.90 | loss  4.37 | ppl    79.087
----------------------------------------------------------------------------------------------------
| Eval 170 at step    68000 | time: 130.31s | valid loss  4.26 | valid ppl    70.850
----------------------------------------------------------------------------------------------------
| epoch 157 step    68050 |     34 batches | lr 0.000151 | ms/batch 406.33 | loss  4.36 | ppl    78.117
| epoch 157 step    68100 |     84 batches | lr 0.00015 | ms/batch 313.19 | loss  4.30 | ppl    73.516
| epoch 157 step    68150 |    134 batches | lr 0.00015 | ms/batch 314.48 | loss  4.35 | ppl    77.333
| epoch 157 step    68200 |    184 batches | lr 0.00015 | ms/batch 314.29 | loss  4.38 | ppl    79.932
| epoch 157 step    68250 |    234 batches | lr 0.00015 | ms/batch 313.04 | loss  4.40 | ppl    81.553
| epoch 157 step    68300 |    284 batches | lr 0.00015 | ms/batch 318.93 | loss  4.38 | ppl    79.701
| epoch 157 step    68350 |    334 batches | lr 0.000149 | ms/batch 316.14 | loss  4.29 | ppl    73.218
| epoch 157 step    68400 |    384 batches | lr 0.000149 | ms/batch 315.18 | loss  4.36 | ppl    77.928
----------------------------------------------------------------------------------------------------
| Eval 171 at step    68400 | time: 130.63s | valid loss  4.27 | valid ppl    71.275
----------------------------------------------------------------------------------------------------
| epoch 157 step    68450 |    434 batches | lr 0.000149 | ms/batch 427.32 | loss  4.38 | ppl    79.975
| epoch 158 step    68500 |     48 batches | lr 0.000149 | ms/batch 314.49 | loss  4.33 | ppl    76.117
| epoch 158 step    68550 |     98 batches | lr 0.000148 | ms/batch 313.89 | loss  4.31 | ppl    74.650
| epoch 158 step    68600 |    148 batches | lr 0.000148 | ms/batch 313.68 | loss  4.35 | ppl    77.612
| epoch 158 step    68650 |    198 batches | lr 0.000148 | ms/batch 315.01 | loss  4.37 | ppl    79.204
| epoch 158 step    68700 |    248 batches | lr 0.000148 | ms/batch 315.69 | loss  4.37 | ppl    78.668
| epoch 158 step    68750 |    298 batches | lr 0.000148 | ms/batch 317.42 | loss  4.42 | ppl    82.714
| epoch 158 step    68800 |    348 batches | lr 0.000147 | ms/batch 315.88 | loss  4.25 | ppl    70.427
----------------------------------------------------------------------------------------------------
| Eval 172 at step    68800 | time: 131.66s | valid loss  4.26 | valid ppl    70.884
----------------------------------------------------------------------------------------------------
| epoch 158 step    68850 |    398 batches | lr 0.000147 | ms/batch 415.20 | loss  4.38 | ppl    79.726
| epoch 159 step    68900 |     12 batches | lr 0.000147 | ms/batch 310.09 | loss  4.39 | ppl    80.644
| epoch 159 step    68950 |     62 batches | lr 0.000147 | ms/batch 315.50 | loss  4.32 | ppl    75.483
| epoch 159 step    69000 |    112 batches | lr 0.000147 | ms/batch 316.61 | loss  4.35 | ppl    77.285
| epoch 159 step    69050 |    162 batches | lr 0.000146 | ms/batch 323.64 | loss  4.34 | ppl    76.361
| epoch 159 step    69100 |    212 batches | lr 0.000146 | ms/batch 331.36 | loss  4.36 | ppl    78.520
| epoch 159 step    69150 |    262 batches | lr 0.000146 | ms/batch 330.07 | loss  4.36 | ppl    78.269
| epoch 159 step    69200 |    312 batches | lr 0.000146 | ms/batch 331.15 | loss  4.34 | ppl    77.071
----------------------------------------------------------------------------------------------------
| Eval 173 at step    69200 | time: 133.92s | valid loss  4.26 | valid ppl    70.649
----------------------------------------------------------------------------------------------------
| epoch 159 step    69250 |    362 batches | lr 0.000145 | ms/batch 459.74 | loss  4.28 | ppl    72.433
| epoch 159 step    69300 |    412 batches | lr 0.000145 | ms/batch 314.92 | loss  4.33 | ppl    76.170
| epoch 160 step    69350 |     26 batches | lr 0.000145 | ms/batch 308.39 | loss  4.39 | ppl    80.722
| epoch 160 step    69400 |     76 batches | lr 0.000145 | ms/batch 314.82 | loss  4.27 | ppl    71.818
| epoch 160 step    69450 |    126 batches | lr 0.000145 | ms/batch 315.11 | loss  4.36 | ppl    77.989
| epoch 160 step    69500 |    176 batches | lr 0.000144 | ms/batch 315.01 | loss  4.35 | ppl    77.478
| epoch 160 step    69550 |    226 batches | lr 0.000144 | ms/batch 315.93 | loss  4.36 | ppl    78.471
| epoch 160 step    69600 |    276 batches | lr 0.000144 | ms/batch 314.52 | loss  4.37 | ppl    79.434
----------------------------------------------------------------------------------------------------
| Eval 174 at step    69600 | time: 130.90s | valid loss  4.26 | valid ppl    70.658
----------------------------------------------------------------------------------------------------
| epoch 160 step    69650 |    326 batches | lr 0.000144 | ms/batch 414.05 | loss  4.31 | ppl    74.101
| epoch 160 step    69700 |    376 batches | lr 0.000144 | ms/batch 314.86 | loss  4.32 | ppl    75.095
| epoch 160 step    69750 |    426 batches | lr 0.000143 | ms/batch 313.72 | loss  4.34 | ppl    77.056
| epoch 161 step    69800 |     40 batches | lr 0.000143 | ms/batch 309.59 | loss  4.36 | ppl    77.952
| epoch 161 step    69850 |     90 batches | lr 0.000143 | ms/batch 316.18 | loss  4.28 | ppl    72.257
| epoch 161 step    69900 |    140 batches | lr 0.000143 | ms/batch 315.22 | loss  4.34 | ppl    76.630
| epoch 161 step    69950 |    190 batches | lr 0.000142 | ms/batch 315.48 | loss  4.37 | ppl    79.019
| epoch 161 step    70000 |    240 batches | lr 0.000142 | ms/batch 314.98 | loss  4.38 | ppl    80.013
----------------------------------------------------------------------------------------------------
| Eval 175 at step    70000 | time: 130.71s | valid loss  4.26 | valid ppl    70.712
----------------------------------------------------------------------------------------------------
| epoch 161 step    70050 |    290 batches | lr 0.000142 | ms/batch 412.30 | loss  4.42 | ppl    83.057
| epoch 161 step    70100 |    340 batches | lr 0.000142 | ms/batch 313.42 | loss  4.27 | ppl    71.810
| epoch 161 step    70150 |    390 batches | lr 0.000142 | ms/batch 313.25 | loss  4.35 | ppl    77.454
| epoch 162 step    70200 |      4 batches | lr 0.000141 | ms/batch 308.15 | loss  4.38 | ppl    79.496
| epoch 162 step    70250 |     54 batches | lr 0.000141 | ms/batch 314.74 | loss  4.31 | ppl    74.574
| epoch 162 step    70300 |    104 batches | lr 0.000141 | ms/batch 316.10 | loss  4.29 | ppl    72.909
| epoch 162 step    70350 |    154 batches | lr 0.000141 | ms/batch 316.89 | loss  4.35 | ppl    77.345
| epoch 162 step    70400 |    204 batches | lr 0.00014 | ms/batch 317.91 | loss  4.37 | ppl    78.742
----------------------------------------------------------------------------------------------------
| Eval 176 at step    70400 | time: 130.65s | valid loss  4.26 | valid ppl    71.092
----------------------------------------------------------------------------------------------------
| epoch 162 step    70450 |    254 batches | lr 0.00014 | ms/batch 414.79 | loss  4.37 | ppl    79.142
| epoch 162 step    70500 |    304 batches | lr 0.00014 | ms/batch 316.36 | loss  4.41 | ppl    81.891
| epoch 162 step    70550 |    354 batches | lr 0.00014 | ms/batch 316.21 | loss  4.24 | ppl    69.438
| epoch 162 step    70600 |    404 batches | lr 0.00014 | ms/batch 317.01 | loss  4.34 | ppl    76.888
| epoch 163 step    70650 |     18 batches | lr 0.000139 | ms/batch 309.48 | loss  4.37 | ppl    78.662
| epoch 163 step    70700 |     68 batches | lr 0.000139 | ms/batch 316.00 | loss  4.29 | ppl    73.158
| epoch 163 step    70750 |    118 batches | lr 0.000139 | ms/batch 316.36 | loss  4.34 | ppl    76.600
| epoch 163 step    70800 |    168 batches | lr 0.000139 | ms/batch 315.49 | loss  4.33 | ppl    75.672
----------------------------------------------------------------------------------------------------
| Eval 177 at step    70800 | time: 131.07s | valid loss  4.25 | valid ppl    70.302
----------------------------------------------------------------------------------------------------
| epoch 163 step    70850 |    218 batches | lr 0.000139 | ms/batch 459.46 | loss  4.38 | ppl    79.813
| epoch 163 step    70900 |    268 batches | lr 0.000138 | ms/batch 315.51 | loss  4.36 | ppl    78.306
| epoch 163 step    70950 |    318 batches | lr 0.000138 | ms/batch 316.10 | loss  4.33 | ppl    76.236
| epoch 163 step    71000 |    368 batches | lr 0.000138 | ms/batch 316.61 | loss  4.29 | ppl    73.138
| epoch 163 step    71050 |    418 batches | lr 0.000138 | ms/batch 317.86 | loss  4.34 | ppl    76.432
| epoch 164 step    71100 |     32 batches | lr 0.000137 | ms/batch 323.85 | loss  4.41 | ppl    81.920
| epoch 164 step    71150 |     82 batches | lr 0.000137 | ms/batch 333.84 | loss  4.27 | ppl    71.293
| epoch 164 step    71200 |    132 batches | lr 0.000137 | ms/batch 332.31 | loss  4.33 | ppl    76.218
----------------------------------------------------------------------------------------------------
| Eval 178 at step    71200 | time: 133.56s | valid loss  4.25 | valid ppl    69.956
----------------------------------------------------------------------------------------------------
| epoch 164 step    71250 |    182 batches | lr 0.000137 | ms/batch 450.21 | loss  4.34 | ppl    76.630
| epoch 164 step    71300 |    232 batches | lr 0.000137 | ms/batch 316.47 | loss  4.36 | ppl    78.373
| epoch 164 step    71350 |    282 batches | lr 0.000136 | ms/batch 316.72 | loss  4.37 | ppl    78.939
| epoch 164 step    71400 |    332 batches | lr 0.000136 | ms/batch 317.08 | loss  4.30 | ppl    73.562
| epoch 164 step    71450 |    382 batches | lr 0.000136 | ms/batch 318.47 | loss  4.32 | ppl    75.465
| epoch 164 step    71500 |    432 batches | lr 0.000136 | ms/batch 316.78 | loss  4.36 | ppl    78.318
| epoch 165 step    71550 |     46 batches | lr 0.000136 | ms/batch 310.81 | loss  4.31 | ppl    74.767
| epoch 165 step    71600 |     96 batches | lr 0.000135 | ms/batch 316.30 | loss  4.30 | ppl    73.395
----------------------------------------------------------------------------------------------------
| Eval 179 at step    71600 | time: 131.42s | valid loss  4.26 | valid ppl    71.156
----------------------------------------------------------------------------------------------------
| epoch 165 step    71650 |    146 batches | lr 0.000135 | ms/batch 414.12 | loss  4.34 | ppl    76.779
| epoch 165 step    71700 |    196 batches | lr 0.000135 | ms/batch 316.84 | loss  4.33 | ppl    76.283
| epoch 165 step    71750 |    246 batches | lr 0.000135 | ms/batch 315.93 | loss  4.34 | ppl    76.912
| epoch 165 step    71800 |    296 batches | lr 0.000134 | ms/batch 316.48 | loss  4.40 | ppl    81.095
| epoch 165 step    71850 |    346 batches | lr 0.000134 | ms/batch 316.39 | loss  4.24 | ppl    69.164
| epoch 165 step    71900 |    396 batches | lr 0.000134 | ms/batch 316.96 | loss  4.32 | ppl    74.966
| epoch 166 step    71950 |     10 batches | lr 0.000134 | ms/batch 310.80 | loss  4.40 | ppl    81.098
| epoch 166 step    72000 |     60 batches | lr 0.000134 | ms/batch 316.23 | loss  4.29 | ppl    72.688
----------------------------------------------------------------------------------------------------
| Eval 180 at step    72000 | time: 131.19s | valid loss  4.26 | valid ppl    70.690
----------------------------------------------------------------------------------------------------
| epoch 166 step    72050 |    110 batches | lr 0.000133 | ms/batch 415.37 | loss  4.29 | ppl    72.648
| epoch 166 step    72100 |    160 batches | lr 0.000133 | ms/batch 316.75 | loss  4.31 | ppl    74.773
| epoch 166 step    72150 |    210 batches | lr 0.000133 | ms/batch 316.15 | loss  4.34 | ppl    76.576
| epoch 166 step    72200 |    260 batches | lr 0.000133 | ms/batch 315.44 | loss  4.32 | ppl    75.542
| epoch 166 step    72250 |    310 batches | lr 0.000133 | ms/batch 316.01 | loss  4.34 | ppl    76.708
| epoch 166 step    72300 |    360 batches | lr 0.000132 | ms/batch 315.95 | loss  4.27 | ppl    71.182
| epoch 166 step    72350 |    410 batches | lr 0.000132 | ms/batch 315.63 | loss  4.32 | ppl    75.189
| epoch 167 step    72400 |     24 batches | lr 0.000132 | ms/batch 311.09 | loss  4.40 | ppl    81.108
----------------------------------------------------------------------------------------------------
| Eval 181 at step    72400 | time: 131.13s | valid loss  4.26 | valid ppl    70.886
----------------------------------------------------------------------------------------------------
| epoch 167 step    72450 |     74 batches | lr 0.000132 | ms/batch 429.59 | loss  4.28 | ppl    72.353
| epoch 167 step    72500 |    124 batches | lr 0.000131 | ms/batch 332.03 | loss  4.30 | ppl    73.501
| epoch 167 step    72550 |    174 batches | lr 0.000131 | ms/batch 328.94 | loss  4.32 | ppl    75.236
| epoch 167 step    72600 |    224 batches | lr 0.000131 | ms/batch 313.48 | loss  4.34 | ppl    76.504
| epoch 167 step    72650 |    274 batches | lr 0.000131 | ms/batch 314.66 | loss  4.30 | ppl    74.023
| epoch 167 step    72700 |    324 batches | lr 0.000131 | ms/batch 314.39 | loss  4.29 | ppl    72.620
| epoch 167 step    72750 |    374 batches | lr 0.00013 | ms/batch 313.68 | loss  4.30 | ppl    73.902
| epoch 167 step    72800 |    424 batches | lr 0.00013 | ms/batch 314.09 | loss  4.33 | ppl    75.843
----------------------------------------------------------------------------------------------------
| Eval 182 at step    72800 | time: 132.99s | valid loss  4.25 | valid ppl    69.784
----------------------------------------------------------------------------------------------------
| epoch 168 step    72850 |     38 batches | lr 0.00013 | ms/batch 439.58 | loss  4.34 | ppl    76.773
| epoch 168 step    72900 |     88 batches | lr 0.00013 | ms/batch 313.84 | loss  4.27 | ppl    71.499
| epoch 168 step    72950 |    138 batches | lr 0.00013 | ms/batch 313.38 | loss  4.32 | ppl    75.042
| epoch 168 step    73000 |    188 batches | lr 0.000129 | ms/batch 314.58 | loss  4.35 | ppl    77.678
| epoch 168 step    73050 |    238 batches | lr 0.000129 | ms/batch 314.32 | loss  4.34 | ppl    76.827
| epoch 168 step    73100 |    288 batches | lr 0.000129 | ms/batch 313.38 | loss  4.35 | ppl    77.709
| epoch 168 step    73150 |    338 batches | lr 0.000129 | ms/batch 313.24 | loss  4.25 | ppl    69.898
| epoch 168 step    73200 |    388 batches | lr 0.000129 | ms/batch 314.11 | loss  4.32 | ppl    75.518
----------------------------------------------------------------------------------------------------
| Eval 183 at step    73200 | time: 130.12s | valid loss  4.25 | valid ppl    70.134
----------------------------------------------------------------------------------------------------
| epoch 169 step    73250 |      2 batches | lr 0.000128 | ms/batch 406.20 | loss  4.38 | ppl    79.654
| epoch 169 step    73300 |     52 batches | lr 0.000128 | ms/batch 313.87 | loss  4.26 | ppl    70.954
| epoch 169 step    73350 |    102 batches | lr 0.000128 | ms/batch 314.91 | loss  4.28 | ppl    72.257
| epoch 169 step    73400 |    152 batches | lr 0.000128 | ms/batch 315.12 | loss  4.33 | ppl    76.259
| epoch 169 step    73450 |    202 batches | lr 0.000127 | ms/batch 314.29 | loss  4.32 | ppl    75.524
| epoch 169 step    73500 |    252 batches | lr 0.000127 | ms/batch 315.19 | loss  4.35 | ppl    77.557
| epoch 169 step    73550 |    302 batches | lr 0.000127 | ms/batch 314.12 | loss  4.34 | ppl    76.839
| epoch 169 step    73600 |    352 batches | lr 0.000127 | ms/batch 313.67 | loss  4.25 | ppl    70.418
----------------------------------------------------------------------------------------------------
| Eval 184 at step    73600 | time: 130.37s | valid loss  4.25 | valid ppl    69.873
----------------------------------------------------------------------------------------------------
| epoch 169 step    73650 |    402 batches | lr 0.000127 | ms/batch 412.36 | loss  4.32 | ppl    75.489
| epoch 170 step    73700 |     16 batches | lr 0.000126 | ms/batch 307.36 | loss  4.35 | ppl    77.161
| epoch 170 step    73750 |     66 batches | lr 0.000126 | ms/batch 313.52 | loss  4.29 | ppl    72.784
| epoch 170 step    73800 |    116 batches | lr 0.000126 | ms/batch 314.13 | loss  4.30 | ppl    73.590
| epoch 170 step    73850 |    166 batches | lr 0.000126 | ms/batch 313.22 | loss  4.30 | ppl    73.902
| epoch 170 step    73900 |    216 batches | lr 0.000126 | ms/batch 316.80 | loss  4.32 | ppl    75.077
| epoch 170 step    73950 |    266 batches | lr 0.000125 | ms/batch 316.17 | loss  4.34 | ppl    77.032
| epoch 170 step    74000 |    316 batches | lr 0.000125 | ms/batch 315.91 | loss  4.33 | ppl    75.962
----------------------------------------------------------------------------------------------------
| Eval 185 at step    74000 | time: 130.52s | valid loss  4.25 | valid ppl    70.093
----------------------------------------------------------------------------------------------------
| epoch 170 step    74050 |    366 batches | lr 0.000125 | ms/batch 415.30 | loss  4.28 | ppl    72.280
| epoch 170 step    74100 |    416 batches | lr 0.000125 | ms/batch 315.89 | loss  4.32 | ppl    74.855
| epoch 171 step    74150 |     30 batches | lr 0.000124 | ms/batch 310.45 | loss  4.36 | ppl    78.120
| epoch 171 step    74200 |     80 batches | lr 0.000124 | ms/batch 315.72 | loss  4.29 | ppl    73.052
| epoch 171 step    74250 |    130 batches | lr 0.000124 | ms/batch 315.27 | loss  4.33 | ppl    76.230
| epoch 171 step    74300 |    180 batches | lr 0.000124 | ms/batch 316.31 | loss  4.31 | ppl    74.087
| epoch 171 step    74350 |    230 batches | lr 0.000124 | ms/batch 316.29 | loss  4.36 | ppl    78.019
| epoch 171 step    74400 |    280 batches | lr 0.000123 | ms/batch 315.46 | loss  4.33 | ppl    76.170
----------------------------------------------------------------------------------------------------
| Eval 186 at step    74400 | time: 131.03s | valid loss  4.25 | valid ppl    69.925
----------------------------------------------------------------------------------------------------
| epoch 171 step    74450 |    330 batches | lr 0.000123 | ms/batch 415.25 | loss  4.27 | ppl    71.329
| epoch 171 step    74500 |    380 batches | lr 0.000123 | ms/batch 316.25 | loss  4.33 | ppl    75.784
| epoch 171 step    74550 |    430 batches | lr 0.000123 | ms/batch 316.09 | loss  4.34 | ppl    77.038
| epoch 172 step    74600 |     44 batches | lr 0.000123 | ms/batch 310.62 | loss  4.32 | ppl    75.180
| epoch 172 step    74650 |     94 batches | lr 0.000122 | ms/batch 316.33 | loss  4.28 | ppl    72.156
| epoch 172 step    74700 |    144 batches | lr 0.000122 | ms/batch 316.07 | loss  4.33 | ppl    75.968
| epoch 172 step    74750 |    194 batches | lr 0.000122 | ms/batch 316.36 | loss  4.34 | ppl    76.696
| epoch 172 step    74800 |    244 batches | lr 0.000122 | ms/batch 314.77 | loss  4.35 | ppl    77.685
----------------------------------------------------------------------------------------------------
| Eval 187 at step    74800 | time: 131.05s | valid loss  4.25 | valid ppl    69.946
----------------------------------------------------------------------------------------------------
| epoch 172 step    74850 |    294 batches | lr 0.000122 | ms/batch 412.82 | loss  4.36 | ppl    78.496
| epoch 172 step    74900 |    344 batches | lr 0.000121 | ms/batch 315.08 | loss  4.24 | ppl    69.373
| epoch 172 step    74950 |    394 batches | lr 0.000121 | ms/batch 313.05 | loss  4.33 | ppl    75.672
| epoch 173 step    75000 |      8 batches | lr 0.000121 | ms/batch 307.39 | loss  4.35 | ppl    77.327
| epoch 173 step    75050 |     58 batches | lr 0.000121 | ms/batch 313.44 | loss  4.28 | ppl    72.102
| epoch 173 step    75100 |    108 batches | lr 0.000121 | ms/batch 313.84 | loss  4.29 | ppl    72.861
| epoch 173 step    75150 |    158 batches | lr 0.00012 | ms/batch 315.16 | loss  4.31 | ppl    74.301
| epoch 173 step    75200 |    208 batches | lr 0.00012 | ms/batch 313.66 | loss  4.31 | ppl    74.679
----------------------------------------------------------------------------------------------------
| Eval 188 at step    75200 | time: 130.21s | valid loss  4.26 | valid ppl    70.615
----------------------------------------------------------------------------------------------------
| epoch 173 step    75250 |    258 batches | lr 0.00012 | ms/batch 411.80 | loss  4.33 | ppl    76.307
| epoch 173 step    75300 |    308 batches | lr 0.00012 | ms/batch 313.16 | loss  4.32 | ppl    75.433
| epoch 173 step    75350 |    358 batches | lr 0.000119 | ms/batch 313.22 | loss  4.26 | ppl    70.622
| epoch 173 step    75400 |    408 batches | lr 0.000119 | ms/batch 314.71 | loss  4.30 | ppl    73.798
| epoch 174 step    75450 |     22 batches | lr 0.000119 | ms/batch 308.12 | loss  4.35 | ppl    77.639
| epoch 174 step    75500 |     72 batches | lr 0.000119 | ms/batch 314.07 | loss  4.25 | ppl    70.059
| epoch 174 step    75550 |    122 batches | lr 0.000119 | ms/batch 314.08 | loss  4.29 | ppl    73.126
| epoch 174 step    75600 |    172 batches | lr 0.000118 | ms/batch 314.07 | loss  4.32 | ppl    75.294
----------------------------------------------------------------------------------------------------
| Eval 189 at step    75600 | time: 130.17s | valid loss  4.24 | valid ppl    69.686
----------------------------------------------------------------------------------------------------
| epoch 174 step    75650 |    222 batches | lr 0.000118 | ms/batch 547.44 | loss  4.35 | ppl    77.182
| epoch 174 step    75700 |    272 batches | lr 0.000118 | ms/batch 313.22 | loss  4.34 | ppl    76.618
| epoch 174 step    75750 |    322 batches | lr 0.000118 | ms/batch 313.81 | loss  4.29 | ppl    73.163
| epoch 174 step    75800 |    372 batches | lr 0.000118 | ms/batch 314.28 | loss  4.27 | ppl    71.838
| epoch 174 step    75850 |    422 batches | lr 0.000117 | ms/batch 313.70 | loss  4.29 | ppl    73.309
| epoch 175 step    75900 |     36 batches | lr 0.000117 | ms/batch 307.43 | loss  4.32 | ppl    75.560
| epoch 175 step    75950 |     86 batches | lr 0.000117 | ms/batch 315.07 | loss  4.26 | ppl    70.713
| epoch 175 step    76000 |    136 batches | lr 0.000117 | ms/batch 316.60 | loss  4.31 | ppl    74.139
----------------------------------------------------------------------------------------------------
| Eval 190 at step    76000 | time: 130.48s | valid loss  4.24 | valid ppl    69.388
----------------------------------------------------------------------------------------------------
| epoch 175 step    76050 |    186 batches | lr 0.000117 | ms/batch 446.75 | loss  4.31 | ppl    74.287
| epoch 175 step    76100 |    236 batches | lr 0.000116 | ms/batch 315.47 | loss  4.33 | ppl    76.027
| epoch 175 step    76150 |    286 batches | lr 0.000116 | ms/batch 315.03 | loss  4.35 | ppl    77.849
| epoch 175 step    76200 |    336 batches | lr 0.000116 | ms/batch 316.04 | loss  4.24 | ppl    69.248
| epoch 175 step    76250 |    386 batches | lr 0.000116 | ms/batch 315.63 | loss  4.32 | ppl    75.001
| epoch 175 step    76300 |    436 batches | lr 0.000116 | ms/batch 311.13 | loss  4.33 | ppl    76.274
| epoch 176 step    76350 |     50 batches | lr 0.000115 | ms/batch 312.82 | loss  4.30 | ppl    73.683
| epoch 176 step    76400 |    100 batches | lr 0.000115 | ms/batch 314.03 | loss  4.25 | ppl    70.193
----------------------------------------------------------------------------------------------------
| Eval 191 at step    76400 | time: 130.69s | valid loss  4.25 | valid ppl    70.110
----------------------------------------------------------------------------------------------------
| epoch 176 step    76450 |    150 batches | lr 0.000115 | ms/batch 410.86 | loss  4.28 | ppl    71.936
| epoch 176 step    76500 |    200 batches | lr 0.000115 | ms/batch 315.03 | loss  4.33 | ppl    75.784
| epoch 176 step    76550 |    250 batches | lr 0.000114 | ms/batch 315.66 | loss  4.32 | ppl    75.371
| epoch 176 step    76600 |    300 batches | lr 0.000114 | ms/batch 315.47 | loss  4.35 | ppl    77.140
| epoch 176 step    76650 |    350 batches | lr 0.000114 | ms/batch 315.90 | loss  4.23 | ppl    68.827
| epoch 176 step    76700 |    400 batches | lr 0.000114 | ms/batch 316.14 | loss  4.29 | ppl    72.932
| epoch 177 step    76750 |     14 batches | lr 0.000114 | ms/batch 310.23 | loss  4.35 | ppl    77.642
| epoch 177 step    76800 |     64 batches | lr 0.000113 | ms/batch 316.23 | loss  4.25 | ppl    70.270
----------------------------------------------------------------------------------------------------
| Eval 192 at step    76800 | time: 130.85s | valid loss  4.25 | valid ppl    69.897
----------------------------------------------------------------------------------------------------
| epoch 177 step    76850 |    114 batches | lr 0.000113 | ms/batch 413.52 | loss  4.27 | ppl    71.193
| epoch 177 step    76900 |    164 batches | lr 0.000113 | ms/batch 313.90 | loss  4.29 | ppl    73.332
| epoch 177 step    76950 |    214 batches | lr 0.000113 | ms/batch 314.00 | loss  4.33 | ppl    75.820
| epoch 177 step    77000 |    264 batches | lr 0.000113 | ms/batch 313.98 | loss  4.31 | ppl    74.150
| epoch 177 step    77050 |    314 batches | lr 0.000112 | ms/batch 314.24 | loss  4.34 | ppl    76.942
| epoch 177 step    77100 |    364 batches | lr 0.000112 | ms/batch 313.08 | loss  4.24 | ppl    69.408
| epoch 177 step    77150 |    414 batches | lr 0.000112 | ms/batch 312.83 | loss  4.29 | ppl    72.668
| epoch 178 step    77200 |     28 batches | lr 0.000112 | ms/batch 307.89 | loss  4.33 | ppl    76.209
----------------------------------------------------------------------------------------------------
| Eval 193 at step    77200 | time: 130.12s | valid loss  4.25 | valid ppl    70.071
----------------------------------------------------------------------------------------------------
| epoch 178 step    77250 |     78 batches | lr 0.000112 | ms/batch 413.24 | loss  4.25 | ppl    70.141
| epoch 178 step    77300 |    128 batches | lr 0.000111 | ms/batch 315.89 | loss  4.29 | ppl    73.218
| epoch 178 step    77350 |    178 batches | lr 0.000111 | ms/batch 315.84 | loss  4.27 | ppl    71.796
| epoch 178 step    77400 |    228 batches | lr 0.000111 | ms/batch 313.35 | loss  4.31 | ppl    74.162
| epoch 178 step    77450 |    278 batches | lr 0.000111 | ms/batch 313.44 | loss  4.34 | ppl    76.996
| epoch 178 step    77500 |    328 batches | lr 0.000111 | ms/batch 314.23 | loss  4.25 | ppl    70.319
| epoch 178 step    77550 |    378 batches | lr 0.00011 | ms/batch 313.37 | loss  4.30 | ppl    73.516
| epoch 178 step    77600 |    428 batches | lr 0.00011 | ms/batch 313.13 | loss  4.30 | ppl    74.006
----------------------------------------------------------------------------------------------------
| Eval 194 at step    77600 | time: 130.64s | valid loss  4.24 | valid ppl    69.187
----------------------------------------------------------------------------------------------------
| epoch 179 step    77650 |     42 batches | lr 0.00011 | ms/batch 441.96 | loss  4.31 | ppl    74.537
| epoch 179 step    77700 |     92 batches | lr 0.00011 | ms/batch 314.74 | loss  4.24 | ppl    69.734
| epoch 179 step    77750 |    142 batches | lr 0.00011 | ms/batch 315.04 | loss  4.32 | ppl    74.890
| epoch 179 step    77800 |    192 batches | lr 0.000109 | ms/batch 315.83 | loss  4.28 | ppl    72.393
| epoch 179 step    77850 |    242 batches | lr 0.000109 | ms/batch 317.00 | loss  4.34 | ppl    76.761
| epoch 179 step    77900 |    292 batches | lr 0.000109 | ms/batch 316.12 | loss  4.38 | ppl    79.645
| epoch 179 step    77950 |    342 batches | lr 0.000109 | ms/batch 315.80 | loss  4.20 | ppl    66.992
| epoch 179 step    78000 |    392 batches | lr 0.000109 | ms/batch 315.11 | loss  4.29 | ppl    73.066
----------------------------------------------------------------------------------------------------
| Eval 195 at step    78000 | time: 130.84s | valid loss  4.25 | valid ppl    69.984
----------------------------------------------------------------------------------------------------
| epoch 180 step    78050 |      6 batches | lr 0.000108 | ms/batch 405.69 | loss  4.32 | ppl    75.112
| epoch 180 step    78100 |     56 batches | lr 0.000108 | ms/batch 313.10 | loss  4.26 | ppl    70.871
| epoch 180 step    78150 |    106 batches | lr 0.000108 | ms/batch 313.87 | loss  4.29 | ppl    72.608
| epoch 180 step    78200 |    156 batches | lr 0.000108 | ms/batch 314.12 | loss  4.29 | ppl    72.614
| epoch 180 step    78250 |    206 batches | lr 0.000108 | ms/batch 313.33 | loss  4.29 | ppl    73.298
| epoch 180 step    78300 |    256 batches | lr 0.000107 | ms/batch 314.94 | loss  4.30 | ppl    73.700
| epoch 180 step    78350 |    306 batches | lr 0.000107 | ms/batch 313.85 | loss  4.32 | ppl    75.483
| epoch 180 step    78400 |    356 batches | lr 0.000107 | ms/batch 313.55 | loss  4.23 | ppl    68.696
----------------------------------------------------------------------------------------------------
| Eval 196 at step    78400 | time: 130.15s | valid loss  4.23 | valid ppl    68.988
----------------------------------------------------------------------------------------------------
| epoch 180 step    78450 |    406 batches | lr 0.000107 | ms/batch 449.10 | loss  4.28 | ppl    71.976
| epoch 181 step    78500 |     20 batches | lr 0.000107 | ms/batch 308.69 | loss  4.32 | ppl    74.989
| epoch 181 step    78550 |     70 batches | lr 0.000106 | ms/batch 317.39 | loss  4.25 | ppl    70.275
| epoch 181 step    78600 |    120 batches | lr 0.000106 | ms/batch 316.02 | loss  4.27 | ppl    71.762
| epoch 181 step    78650 |    170 batches | lr 0.000106 | ms/batch 315.36 | loss  4.24 | ppl    69.603
| epoch 181 step    78700 |    220 batches | lr 0.000106 | ms/batch 316.40 | loss  4.33 | ppl    76.313
| epoch 181 step    78750 |    270 batches | lr 0.000105 | ms/batch 315.03 | loss  4.31 | ppl    74.313
| epoch 181 step    78800 |    320 batches | lr 0.000105 | ms/batch 313.27 | loss  4.27 | ppl    71.248
----------------------------------------------------------------------------------------------------
| Eval 197 at step    78800 | time: 130.84s | valid loss  4.24 | valid ppl    69.526
----------------------------------------------------------------------------------------------------
| epoch 181 step    78850 |    370 batches | lr 0.000105 | ms/batch 412.23 | loss  4.23 | ppl    69.056
| epoch 181 step    78900 |    420 batches | lr 0.000105 | ms/batch 314.01 | loss  4.28 | ppl    72.552
| epoch 182 step    78950 |     34 batches | lr 0.000105 | ms/batch 308.39 | loss  4.30 | ppl    74.020
| epoch 182 step    79000 |     84 batches | lr 0.000104 | ms/batch 313.00 | loss  4.23 | ppl    68.766
| epoch 182 step    79050 |    134 batches | lr 0.000104 | ms/batch 313.60 | loss  4.29 | ppl    73.212
| epoch 182 step    79100 |    184 batches | lr 0.000104 | ms/batch 313.83 | loss  4.31 | ppl    74.087
| epoch 182 step    79150 |    234 batches | lr 0.000104 | ms/batch 314.20 | loss  4.33 | ppl    75.678
| epoch 182 step    79200 |    284 batches | lr 0.000104 | ms/batch 313.10 | loss  4.32 | ppl    75.542
----------------------------------------------------------------------------------------------------
| Eval 198 at step    79200 | time: 130.12s | valid loss  4.24 | valid ppl    69.186
----------------------------------------------------------------------------------------------------
| epoch 182 step    79250 |    334 batches | lr 0.000103 | ms/batch 412.37 | loss  4.20 | ppl    66.788
| epoch 182 step    79300 |    384 batches | lr 0.000103 | ms/batch 314.22 | loss  4.27 | ppl    71.751
| epoch 182 step    79350 |    434 batches | lr 0.000103 | ms/batch 314.36 | loss  4.33 | ppl    76.185
| epoch 183 step    79400 |     48 batches | lr 0.000103 | ms/batch 309.05 | loss  4.24 | ppl    69.723
| epoch 183 step    79450 |     98 batches | lr 0.000103 | ms/batch 315.05 | loss  4.25 | ppl    69.761
| epoch 183 step    79500 |    148 batches | lr 0.000102 | ms/batch 315.31 | loss  4.30 | ppl    73.832
| epoch 183 step    79550 |    198 batches | lr 0.000102 | ms/batch 314.61 | loss  4.31 | ppl    74.272
| epoch 183 step    79600 |    248 batches | lr 0.000102 | ms/batch 315.12 | loss  4.29 | ppl    73.158
----------------------------------------------------------------------------------------------------
| Eval 199 at step    79600 | time: 130.52s | valid loss  4.23 | valid ppl    69.060
----------------------------------------------------------------------------------------------------
| epoch 183 step    79650 |    298 batches | lr 0.000102 | ms/batch 413.30 | loss  4.32 | ppl    75.459
| epoch 183 step    79700 |    348 batches | lr 0.000102 | ms/batch 313.64 | loss  4.21 | ppl    67.236
| epoch 183 step    79750 |    398 batches | lr 0.000101 | ms/batch 313.46 | loss  4.28 | ppl    72.501
| epoch 184 step    79800 |     12 batches | lr 0.000101 | ms/batch 310.03 | loss  4.33 | ppl    76.224
| epoch 184 step    79850 |     62 batches | lr 0.000101 | ms/batch 315.87 | loss  4.23 | ppl    68.900
| epoch 184 step    79900 |    112 batches | lr 0.000101 | ms/batch 316.17 | loss  4.28 | ppl    72.032
| epoch 184 step    79950 |    162 batches | lr 0.000101 | ms/batch 312.85 | loss  4.27 | ppl    71.712
| epoch 184 step    80000 |    212 batches | lr 0.0001 | ms/batch 313.28 | loss  4.32 | ppl    75.424
----------------------------------------------------------------------------------------------------
| Eval 200 at step    80000 | time: 130.42s | valid loss  4.24 | valid ppl    69.316
----------------------------------------------------------------------------------------------------
| epoch 184 step    80050 |    262 batches | lr 0.0001 | ms/batch 412.18 | loss  4.31 | ppl    74.133
| epoch 184 step    80100 |    312 batches | lr 0.0001 | ms/batch 313.71 | loss  4.30 | ppl    73.375
| epoch 184 step    80150 |    362 batches | lr 9.99e-05 | ms/batch 312.87 | loss  4.22 | ppl    67.933
| epoch 184 step    80200 |    412 batches | lr 9.97e-05 | ms/batch 314.02 | loss  4.29 | ppl    72.699
| epoch 185 step    80250 |     26 batches | lr 9.95e-05 | ms/batch 307.99 | loss  4.32 | ppl    75.121
| epoch 185 step    80300 |     76 batches | lr 9.93e-05 | ms/batch 312.09 | loss  4.23 | ppl    68.433
| epoch 185 step    80350 |    126 batches | lr 9.91e-05 | ms/batch 313.56 | loss  4.31 | ppl    74.194
| epoch 185 step    80400 |    176 batches | lr 9.89e-05 | ms/batch 314.12 | loss  4.27 | ppl    71.774
----------------------------------------------------------------------------------------------------
| Eval 201 at step    80400 | time: 130.02s | valid loss  4.24 | valid ppl    69.702
----------------------------------------------------------------------------------------------------
| epoch 185 step    80450 |    226 batches | lr 9.87e-05 | ms/batch 413.32 | loss  4.31 | ppl    74.598
| epoch 185 step    80500 |    276 batches | lr 9.85e-05 | ms/batch 316.14 | loss  4.30 | ppl    73.879
| epoch 185 step    80550 |    326 batches | lr 9.83e-05 | ms/batch 316.60 | loss  4.23 | ppl    68.465
| epoch 185 step    80600 |    376 batches | lr 9.81e-05 | ms/batch 319.87 | loss  4.25 | ppl    70.056
| epoch 185 step    80650 |    426 batches | lr 9.79e-05 | ms/batch 326.50 | loss  4.28 | ppl    72.557
| epoch 186 step    80700 |     40 batches | lr 9.77e-05 | ms/batch 309.54 | loss  4.29 | ppl    72.984
| epoch 186 step    80750 |     90 batches | lr 9.75e-05 | ms/batch 314.34 | loss  4.21 | ppl    67.307
| epoch 186 step    80800 |    140 batches | lr 9.73e-05 | ms/batch 314.77 | loss  4.26 | ppl    70.854
----------------------------------------------------------------------------------------------------
| Eval 202 at step    80800 | time: 131.58s | valid loss  4.24 | valid ppl    69.393
----------------------------------------------------------------------------------------------------
| epoch 186 step    80850 |    190 batches | lr 9.71e-05 | ms/batch 413.72 | loss  4.28 | ppl    71.880
| epoch 186 step    80900 |    240 batches | lr 9.69e-05 | ms/batch 315.18 | loss  4.30 | ppl    74.000
| epoch 186 step    80950 |    290 batches | lr 9.67e-05 | ms/batch 315.45 | loss  4.32 | ppl    75.159
| epoch 186 step    81000 |    340 batches | lr 9.65e-05 | ms/batch 313.28 | loss  4.17 | ppl    64.690
| epoch 186 step    81050 |    390 batches | lr 9.63e-05 | ms/batch 316.33 | loss  4.29 | ppl    72.892
| epoch 187 step    81100 |      4 batches | lr 9.61e-05 | ms/batch 309.03 | loss  4.30 | ppl    73.752
| epoch 187 step    81150 |     54 batches | lr 9.59e-05 | ms/batch 314.86 | loss  4.24 | ppl    69.091
| epoch 187 step    81200 |    104 batches | lr 9.57e-05 | ms/batch 314.57 | loss  4.25 | ppl    70.264
----------------------------------------------------------------------------------------------------
| Eval 203 at step    81200 | time: 130.63s | valid loss  4.25 | valid ppl    70.395
----------------------------------------------------------------------------------------------------
| epoch 187 step    81250 |    154 batches | lr 9.56e-05 | ms/batch 412.64 | loss  4.29 | ppl    73.292
| epoch 187 step    81300 |    204 batches | lr 9.54e-05 | ms/batch 314.58 | loss  4.29 | ppl    73.304
| epoch 187 step    81350 |    254 batches | lr 9.52e-05 | ms/batch 313.89 | loss  4.32 | ppl    75.306
| epoch 187 step    81400 |    304 batches | lr 9.5e-05 | ms/batch 313.16 | loss  4.32 | ppl    75.560
| epoch 187 step    81450 |    354 batches | lr 9.48e-05 | ms/batch 314.72 | loss  4.20 | ppl    66.694
| epoch 187 step    81500 |    404 batches | lr 9.46e-05 | ms/batch 313.72 | loss  4.26 | ppl    70.810
| epoch 188 step    81550 |     18 batches | lr 9.44e-05 | ms/batch 306.64 | loss  4.31 | ppl    74.124
| epoch 188 step    81600 |     68 batches | lr 9.42e-05 | ms/batch 313.05 | loss  4.22 | ppl    67.893
----------------------------------------------------------------------------------------------------
| Eval 204 at step    81600 | time: 130.08s | valid loss  4.24 | valid ppl    69.390
----------------------------------------------------------------------------------------------------
| epoch 188 step    81650 |    118 batches | lr 9.4e-05 | ms/batch 411.94 | loss  4.29 | ppl    72.767
| epoch 188 step    81700 |    168 batches | lr 9.38e-05 | ms/batch 313.74 | loss  4.27 | ppl    71.404
| epoch 188 step    81750 |    218 batches | lr 9.36e-05 | ms/batch 313.38 | loss  4.28 | ppl    72.145
| epoch 188 step    81800 |    268 batches | lr 9.34e-05 | ms/batch 313.10 | loss  4.28 | ppl    72.566
| epoch 188 step    81850 |    318 batches | lr 9.32e-05 | ms/batch 314.77 | loss  4.27 | ppl    71.619
| epoch 188 step    81900 |    368 batches | lr 9.3e-05 | ms/batch 315.02 | loss  4.22 | ppl    68.262
| epoch 188 step    81950 |    418 batches | lr 9.28e-05 | ms/batch 314.39 | loss  4.31 | ppl    74.391
| epoch 189 step    82000 |     32 batches | lr 9.26e-05 | ms/batch 309.57 | loss  4.28 | ppl    71.880
----------------------------------------------------------------------------------------------------
| Eval 205 at step    82000 | time: 130.34s | valid loss  4.24 | valid ppl    69.610
----------------------------------------------------------------------------------------------------
| epoch 189 step    82050 |     82 batches | lr 9.24e-05 | ms/batch 413.33 | loss  4.23 | ppl    68.747
| epoch 189 step    82100 |    132 batches | lr 9.22e-05 | ms/batch 314.24 | loss  4.27 | ppl    71.243
| epoch 189 step    82150 |    182 batches | lr 9.2e-05 | ms/batch 315.46 | loss  4.26 | ppl    70.749
| epoch 189 step    82200 |    232 batches | lr 9.19e-05 | ms/batch 315.24 | loss  4.31 | ppl    74.133
| epoch 189 step    82250 |    282 batches | lr 9.17e-05 | ms/batch 314.99 | loss  4.30 | ppl    73.654
| epoch 189 step    82300 |    332 batches | lr 9.15e-05 | ms/batch 314.45 | loss  4.21 | ppl    67.280
| epoch 189 step    82350 |    382 batches | lr 9.13e-05 | ms/batch 314.81 | loss  4.24 | ppl    69.685
| epoch 189 step    82400 |    432 batches | lr 9.11e-05 | ms/batch 314.30 | loss  4.26 | ppl    71.104
----------------------------------------------------------------------------------------------------
| Eval 206 at step    82400 | time: 130.85s | valid loss  4.23 | valid ppl    68.853
----------------------------------------------------------------------------------------------------
| epoch 190 step    82450 |     46 batches | lr 9.09e-05 | ms/batch 440.33 | loss  4.25 | ppl    70.018
| epoch 190 step    82500 |     96 batches | lr 9.07e-05 | ms/batch 314.25 | loss  4.23 | ppl    68.439
| epoch 190 step    82550 |    146 batches | lr 9.05e-05 | ms/batch 315.05 | loss  4.27 | ppl    71.421
| epoch 190 step    82600 |    196 batches | lr 9.03e-05 | ms/batch 314.36 | loss  4.29 | ppl    72.909
| epoch 190 step    82650 |    246 batches | lr 9.01e-05 | ms/batch 314.05 | loss  4.31 | ppl    74.231
| epoch 190 step    82700 |    296 batches | lr 8.99e-05 | ms/batch 314.85 | loss  4.32 | ppl    75.289
| epoch 190 step    82750 |    346 batches | lr 8.97e-05 | ms/batch 314.70 | loss  4.19 | ppl    66.051
| epoch 190 step    82800 |    396 batches | lr 8.95e-05 | ms/batch 316.07 | loss  4.26 | ppl    70.921
----------------------------------------------------------------------------------------------------
| Eval 207 at step    82800 | time: 130.59s | valid loss  4.23 | valid ppl    69.058
----------------------------------------------------------------------------------------------------
| epoch 191 step    82850 |     10 batches | lr 8.93e-05 | ms/batch 408.91 | loss  4.31 | ppl    74.487
| epoch 191 step    82900 |     60 batches | lr 8.92e-05 | ms/batch 314.99 | loss  4.23 | ppl    68.674
| epoch 191 step    82950 |    110 batches | lr 8.9e-05 | ms/batch 316.24 | loss  4.23 | ppl    68.846
| epoch 191 step    83000 |    160 batches | lr 8.88e-05 | ms/batch 315.72 | loss  4.28 | ppl    72.066
| epoch 191 step    83050 |    210 batches | lr 8.86e-05 | ms/batch 314.94 | loss  4.28 | ppl    72.365
| epoch 191 step    83100 |    260 batches | lr 8.84e-05 | ms/batch 314.72 | loss  4.32 | ppl    74.895
| epoch 191 step    83150 |    310 batches | lr 8.82e-05 | ms/batch 313.61 | loss  4.28 | ppl    72.506
| epoch 191 step    83200 |    360 batches | lr 8.8e-05 | ms/batch 312.36 | loss  4.22 | ppl    68.026
----------------------------------------------------------------------------------------------------
| Eval 208 at step    83200 | time: 130.52s | valid loss  4.23 | valid ppl    69.018
----------------------------------------------------------------------------------------------------
| epoch 191 step    83250 |    410 batches | lr 8.78e-05 | ms/batch 411.22 | loss  4.24 | ppl    69.413
| epoch 192 step    83300 |     24 batches | lr 8.76e-05 | ms/batch 307.92 | loss  4.30 | ppl    73.415
| epoch 192 step    83350 |     74 batches | lr 8.74e-05 | ms/batch 313.02 | loss  4.23 | ppl    68.626
| epoch 192 step    83400 |    124 batches | lr 8.72e-05 | ms/batch 313.03 | loss  4.26 | ppl    70.506
| epoch 192 step    83450 |    174 batches | lr 8.71e-05 | ms/batch 313.10 | loss  4.27 | ppl    71.734
| epoch 192 step    83500 |    224 batches | lr 8.69e-05 | ms/batch 313.85 | loss  4.31 | ppl    74.255
| epoch 192 step    83550 |    274 batches | lr 8.67e-05 | ms/batch 314.93 | loss  4.28 | ppl    72.424
| epoch 192 step    83600 |    324 batches | lr 8.65e-05 | ms/batch 327.17 | loss  4.23 | ppl    68.860
----------------------------------------------------------------------------------------------------
| Eval 209 at step    83600 | time: 130.77s | valid loss  4.23 | valid ppl    69.043
----------------------------------------------------------------------------------------------------
| epoch 192 step    83650 |    374 batches | lr 8.63e-05 | ms/batch 414.38 | loss  4.25 | ppl    69.767
| epoch 192 step    83700 |    424 batches | lr 8.61e-05 | ms/batch 314.68 | loss  4.24 | ppl    69.145
| epoch 193 step    83750 |     38 batches | lr 8.59e-05 | ms/batch 308.83 | loss  4.25 | ppl    69.873
| epoch 193 step    83800 |     88 batches | lr 8.57e-05 | ms/batch 316.23 | loss  4.22 | ppl    67.983
| epoch 193 step    83850 |    138 batches | lr 8.55e-05 | ms/batch 315.55 | loss  4.31 | ppl    74.301
| epoch 193 step    83900 |    188 batches | lr 8.54e-05 | ms/batch 314.23 | loss  4.26 | ppl    70.600
| epoch 193 step    83950 |    238 batches | lr 8.52e-05 | ms/batch 314.49 | loss  4.29 | ppl    73.166
| epoch 193 step    84000 |    288 batches | lr 8.5e-05 | ms/batch 314.82 | loss  4.32 | ppl    75.554
----------------------------------------------------------------------------------------------------
| Eval 210 at step    84000 | time: 130.64s | valid loss  4.24 | valid ppl    69.079
----------------------------------------------------------------------------------------------------
| epoch 193 step    84050 |    338 batches | lr 8.48e-05 | ms/batch 414.45 | loss  4.20 | ppl    66.785
| epoch 193 step    84100 |    388 batches | lr 8.46e-05 | ms/batch 315.17 | loss  4.26 | ppl    70.710
| epoch 194 step    84150 |      2 batches | lr 8.44e-05 | ms/batch 307.71 | loss  4.28 | ppl    72.052
| epoch 194 step    84200 |     52 batches | lr 8.42e-05 | ms/batch 314.49 | loss  4.25 | ppl    70.056
| epoch 194 step    84250 |    102 batches | lr 8.4e-05 | ms/batch 315.91 | loss  4.20 | ppl    66.942
| epoch 194 step    84300 |    152 batches | lr 8.38e-05 | ms/batch 315.82 | loss  4.25 | ppl    69.878
| epoch 194 step    84350 |    202 batches | lr 8.37e-05 | ms/batch 316.47 | loss  4.30 | ppl    73.683
| epoch 194 step    84400 |    252 batches | lr 8.35e-05 | ms/batch 317.37 | loss  4.30 | ppl    73.407
----------------------------------------------------------------------------------------------------
| Eval 211 at step    84400 | time: 130.86s | valid loss  4.23 | valid ppl    68.825
----------------------------------------------------------------------------------------------------
| epoch 194 step    84450 |    302 batches | lr 8.33e-05 | ms/batch 447.34 | loss  4.28 | ppl    72.501
| epoch 194 step    84500 |    352 batches | lr 8.31e-05 | ms/batch 315.42 | loss  4.19 | ppl    66.250
| epoch 194 step    84550 |    402 batches | lr 8.29e-05 | ms/batch 313.79 | loss  4.24 | ppl    69.289
| epoch 195 step    84600 |     16 batches | lr 8.27e-05 | ms/batch 308.46 | loss  4.30 | ppl    73.904
| epoch 195 step    84650 |     66 batches | lr 8.25e-05 | ms/batch 313.78 | loss  4.22 | ppl    68.113
| epoch 195 step    84700 |    116 batches | lr 8.23e-05 | ms/batch 313.06 | loss  4.25 | ppl    70.264
| epoch 195 step    84750 |    166 batches | lr 8.22e-05 | ms/batch 315.53 | loss  4.25 | ppl    70.231
| epoch 195 step    84800 |    216 batches | lr 8.2e-05 | ms/batch 315.77 | loss  4.29 | ppl    72.756
----------------------------------------------------------------------------------------------------
| Eval 212 at step    84800 | time: 130.41s | valid loss  4.24 | valid ppl    69.191
----------------------------------------------------------------------------------------------------
| epoch 195 step    84850 |    266 batches | lr 8.18e-05 | ms/batch 412.09 | loss  4.25 | ppl    69.851
| epoch 195 step    84900 |    316 batches | lr 8.16e-05 | ms/batch 313.69 | loss  4.25 | ppl    69.971
| epoch 195 step    84950 |    366 batches | lr 8.14e-05 | ms/batch 312.93 | loss  4.22 | ppl    67.975
| epoch 195 step    85000 |    416 batches | lr 8.12e-05 | ms/batch 314.19 | loss  4.24 | ppl    69.259
| epoch 196 step    85050 |     30 batches | lr 8.1e-05 | ms/batch 308.23 | loss  4.29 | ppl    73.298
| epoch 196 step    85100 |     80 batches | lr 8.09e-05 | ms/batch 313.11 | loss  4.20 | ppl    66.887
| epoch 196 step    85150 |    130 batches | lr 8.07e-05 | ms/batch 314.17 | loss  4.23 | ppl    68.497
| epoch 196 step    85200 |    180 batches | lr 8.05e-05 | ms/batch 314.00 | loss  4.25 | ppl    70.122
----------------------------------------------------------------------------------------------------
| Eval 213 at step    85200 | time: 130.14s | valid loss  4.22 | valid ppl    68.337
----------------------------------------------------------------------------------------------------
| epoch 196 step    85250 |    230 batches | lr 8.03e-05 | ms/batch 444.68 | loss  4.27 | ppl    71.656
| epoch 196 step    85300 |    280 batches | lr 8.01e-05 | ms/batch 313.28 | loss  4.27 | ppl    71.204
| epoch 196 step    85350 |    330 batches | lr 7.99e-05 | ms/batch 314.10 | loss  4.21 | ppl    67.401
| epoch 196 step    85400 |    380 batches | lr 7.97e-05 | ms/batch 313.38 | loss  4.25 | ppl    69.988
| epoch 196 step    85450 |    430 batches | lr 7.96e-05 | ms/batch 313.89 | loss  4.29 | ppl    72.904
| epoch 197 step    85500 |     44 batches | lr 7.94e-05 | ms/batch 308.05 | loss  4.25 | ppl    70.374
| epoch 197 step    85550 |     94 batches | lr 7.92e-05 | ms/batch 315.03 | loss  4.20 | ppl    66.728
| epoch 197 step    85600 |    144 batches | lr 7.9e-05 | ms/batch 315.05 | loss  4.23 | ppl    68.868
----------------------------------------------------------------------------------------------------
| Eval 214 at step    85600 | time: 130.26s | valid loss  4.23 | valid ppl    68.549
----------------------------------------------------------------------------------------------------
| epoch 197 step    85650 |    194 batches | lr 7.88e-05 | ms/batch 411.66 | loss  4.24 | ppl    69.527
| epoch 197 step    85700 |    244 batches | lr 7.86e-05 | ms/batch 314.68 | loss  4.27 | ppl    71.760
| epoch 197 step    85750 |    294 batches | lr 7.85e-05 | ms/batch 314.40 | loss  4.31 | ppl    74.260
| epoch 197 step    85800 |    344 batches | lr 7.83e-05 | ms/batch 313.32 | loss  4.18 | ppl    65.254
| epoch 197 step    85850 |    394 batches | lr 7.81e-05 | ms/batch 314.26 | loss  4.23 | ppl    68.940
| epoch 198 step    85900 |      8 batches | lr 7.79e-05 | ms/batch 308.01 | loss  4.30 | ppl    73.352
| epoch 198 step    85950 |     58 batches | lr 7.77e-05 | ms/batch 314.00 | loss  4.22 | ppl    68.230
| epoch 198 step    86000 |    108 batches | lr 7.75e-05 | ms/batch 314.74 | loss  4.23 | ppl    68.669
----------------------------------------------------------------------------------------------------
| Eval 215 at step    86000 | time: 130.26s | valid loss  4.23 | valid ppl    68.967
----------------------------------------------------------------------------------------------------
| epoch 198 step    86050 |    158 batches | lr 7.74e-05 | ms/batch 412.52 | loss  4.26 | ppl    71.165
| epoch 198 step    86100 |    208 batches | lr 7.72e-05 | ms/batch 314.73 | loss  4.24 | ppl    69.251
| epoch 198 step    86150 |    258 batches | lr 7.7e-05 | ms/batch 314.50 | loss  4.29 | ppl    72.645
| epoch 198 step    86200 |    308 batches | lr 7.68e-05 | ms/batch 313.29 | loss  4.27 | ppl    71.485
| epoch 198 step    86250 |    358 batches | lr 7.66e-05 | ms/batch 314.54 | loss  4.21 | ppl    67.530
| epoch 198 step    86300 |    408 batches | lr 7.65e-05 | ms/batch 315.48 | loss  4.24 | ppl    69.408
| epoch 199 step    86350 |     22 batches | lr 7.63e-05 | ms/batch 307.38 | loss  4.27 | ppl    71.703
| epoch 199 step    86400 |     72 batches | lr 7.61e-05 | ms/batch 314.99 | loss  4.21 | ppl    67.623
----------------------------------------------------------------------------------------------------
| Eval 216 at step    86400 | time: 130.37s | valid loss  4.23 | valid ppl    68.691
----------------------------------------------------------------------------------------------------
| epoch 199 step    86450 |    122 batches | lr 7.59e-05 | ms/batch 414.09 | loss  4.26 | ppl    71.120
| epoch 199 step    86500 |    172 batches | lr 7.57e-05 | ms/batch 317.86 | loss  4.25 | ppl    70.223
| epoch 199 step    86550 |    222 batches | lr 7.55e-05 | ms/batch 321.68 | loss  4.26 | ppl    70.716
| epoch 199 step    86600 |    272 batches | lr 7.54e-05 | ms/batch 318.63 | loss  4.28 | ppl    71.889
| epoch 199 step    86650 |    322 batches | lr 7.52e-05 | ms/batch 320.31 | loss  4.24 | ppl    69.370
| epoch 199 step    86700 |    372 batches | lr 7.5e-05 | ms/batch 320.08 | loss  4.20 | ppl    66.611
| epoch 199 step    86750 |    422 batches | lr 7.48e-05 | ms/batch 319.34 | loss  4.23 | ppl    68.814
| epoch 200 step    86800 |     36 batches | lr 7.46e-05 | ms/batch 313.98 | loss  4.25 | ppl    70.196
----------------------------------------------------------------------------------------------------
| Eval 217 at step    86800 | time: 132.30s | valid loss  4.23 | valid ppl    68.744
----------------------------------------------------------------------------------------------------
| epoch 200 step    86850 |     86 batches | lr 7.45e-05 | ms/batch 414.46 | loss  4.20 | ppl    66.543
| epoch 200 step    86900 |    136 batches | lr 7.43e-05 | ms/batch 314.28 | loss  4.27 | ppl    71.505
| epoch 200 step    86950 |    186 batches | lr 7.41e-05 | ms/batch 317.42 | loss  4.24 | ppl    69.666
| epoch 200 step    87000 |    236 batches | lr 7.39e-05 | ms/batch 315.13 | loss  4.26 | ppl    70.697
| epoch 200 step    87050 |    286 batches | lr 7.37e-05 | ms/batch 316.31 | loss  4.28 | ppl    72.580
| epoch 200 step    87100 |    336 batches | lr 7.36e-05 | ms/batch 315.55 | loss  4.16 | ppl    63.912
| epoch 200 step    87150 |    386 batches | lr 7.34e-05 | ms/batch 316.79 | loss  4.25 | ppl    70.284
| epoch 200 step    87200 |    436 batches | lr 7.32e-05 | ms/batch 312.12 | loss  4.27 | ppl    71.438
----------------------------------------------------------------------------------------------------
| Eval 218 at step    87200 | time: 131.11s | valid loss  4.22 | valid ppl    67.951
----------------------------------------------------------------------------------------------------
| epoch 201 step    87250 |     50 batches | lr 7.3e-05 | ms/batch 461.41 | loss  4.20 | ppl    66.950
| epoch 201 step    87300 |    100 batches | lr 7.29e-05 | ms/batch 316.30 | loss  4.22 | ppl    67.991
| epoch 201 step    87350 |    150 batches | lr 7.27e-05 | ms/batch 316.45 | loss  4.25 | ppl    69.859
| epoch 201 step    87400 |    200 batches | lr 7.25e-05 | ms/batch 317.26 | loss  4.23 | ppl    68.717
| epoch 201 step    87450 |    250 batches | lr 7.23e-05 | ms/batch 316.48 | loss  4.28 | ppl    72.410
| epoch 201 step    87500 |    300 batches | lr 7.21e-05 | ms/batch 316.10 | loss  4.30 | ppl    73.757
| epoch 201 step    87550 |    350 batches | lr 7.2e-05 | ms/batch 317.60 | loss  4.17 | ppl    64.809
| epoch 201 step    87600 |    400 batches | lr 7.18e-05 | ms/batch 316.03 | loss  4.23 | ppl    68.959
----------------------------------------------------------------------------------------------------
| Eval 219 at step    87600 | time: 132.14s | valid loss  4.23 | valid ppl    68.585
----------------------------------------------------------------------------------------------------
| epoch 202 step    87650 |     14 batches | lr 7.16e-05 | ms/batch 410.36 | loss  4.27 | ppl    71.282
| epoch 202 step    87700 |     64 batches | lr 7.14e-05 | ms/batch 316.10 | loss  4.19 | ppl    66.180
| epoch 202 step    87750 |    114 batches | lr 7.13e-05 | ms/batch 316.22 | loss  4.25 | ppl    69.892
| epoch 202 step    87800 |    164 batches | lr 7.11e-05 | ms/batch 319.35 | loss  4.20 | ppl    66.582
| epoch 202 step    87850 |    214 batches | lr 7.09e-05 | ms/batch 314.11 | loss  4.27 | ppl    71.449
| epoch 202 step    87900 |    264 batches | lr 7.07e-05 | ms/batch 314.71 | loss  4.26 | ppl    70.899
| epoch 202 step    87950 |    314 batches | lr 7.05e-05 | ms/batch 317.37 | loss  4.22 | ppl    68.358
| epoch 202 step    88000 |    364 batches | lr 7.04e-05 | ms/batch 316.75 | loss  4.21 | ppl    67.128
----------------------------------------------------------------------------------------------------
| Eval 220 at step    88000 | time: 131.25s | valid loss  4.23 | valid ppl    68.479
----------------------------------------------------------------------------------------------------
| epoch 202 step    88050 |    414 batches | lr 7.02e-05 | ms/batch 415.09 | loss  4.23 | ppl    68.682
| epoch 203 step    88100 |     28 batches | lr 7e-05 | ms/batch 309.33 | loss  4.26 | ppl    70.960
| epoch 203 step    88150 |     78 batches | lr 6.98e-05 | ms/batch 316.19 | loss  4.21 | ppl    67.120
| epoch 203 step    88200 |    128 batches | lr 6.97e-05 | ms/batch 315.08 | loss  4.24 | ppl    69.299
| epoch 203 step    88250 |    178 batches | lr 6.95e-05 | ms/batch 318.18 | loss  4.24 | ppl    69.582
| epoch 203 step    88300 |    228 batches | lr 6.93e-05 | ms/batch 315.85 | loss  4.27 | ppl    71.187
| epoch 203 step    88350 |    278 batches | lr 6.91e-05 | ms/batch 316.82 | loss  4.24 | ppl    69.446
| epoch 203 step    88400 |    328 batches | lr 6.9e-05 | ms/batch 315.82 | loss  4.19 | ppl    66.351
----------------------------------------------------------------------------------------------------
| Eval 221 at step    88400 | time: 131.11s | valid loss  4.23 | valid ppl    68.463
----------------------------------------------------------------------------------------------------
| epoch 203 step    88450 |    378 batches | lr 6.88e-05 | ms/batch 414.12 | loss  4.21 | ppl    67.301
| epoch 203 step    88500 |    428 batches | lr 6.86e-05 | ms/batch 314.78 | loss  4.25 | ppl    70.363
| epoch 204 step    88550 |     42 batches | lr 6.84e-05 | ms/batch 309.11 | loss  4.26 | ppl    70.893
| epoch 204 step    88600 |     92 batches | lr 6.83e-05 | ms/batch 314.20 | loss  4.17 | ppl    64.468
| epoch 204 step    88650 |    142 batches | lr 6.81e-05 | ms/batch 314.41 | loss  4.24 | ppl    69.346
| epoch 204 step    88700 |    192 batches | lr 6.79e-05 | ms/batch 313.85 | loss  4.25 | ppl    70.089
| epoch 204 step    88750 |    242 batches | lr 6.77e-05 | ms/batch 314.15 | loss  4.27 | ppl    71.717
| epoch 204 step    88800 |    292 batches | lr 6.76e-05 | ms/batch 314.02 | loss  4.29 | ppl    72.776
----------------------------------------------------------------------------------------------------
| Eval 222 at step    88800 | time: 130.41s | valid loss  4.23 | valid ppl    68.498
----------------------------------------------------------------------------------------------------
| epoch 204 step    88850 |    342 batches | lr 6.74e-05 | ms/batch 414.13 | loss  4.15 | ppl    63.578
| epoch 204 step    88900 |    392 batches | lr 6.72e-05 | ms/batch 315.88 | loss  4.21 | ppl    67.128
| epoch 205 step    88950 |      6 batches | lr 6.7e-05 | ms/batch 307.97 | loss  4.27 | ppl    71.382
| epoch 205 step    89000 |     56 batches | lr 6.69e-05 | ms/batch 313.90 | loss  4.21 | ppl    67.278
| epoch 205 step    89050 |    106 batches | lr 6.67e-05 | ms/batch 313.67 | loss  4.20 | ppl    66.694
| epoch 205 step    89100 |    156 batches | lr 6.65e-05 | ms/batch 315.14 | loss  4.25 | ppl    69.843
| epoch 205 step    89150 |    206 batches | lr 6.64e-05 | ms/batch 316.38 | loss  4.24 | ppl    69.636
| epoch 205 step    89200 |    256 batches | lr 6.62e-05 | ms/batch 315.87 | loss  4.27 | ppl    71.209
----------------------------------------------------------------------------------------------------
| Eval 223 at step    89200 | time: 130.66s | valid loss  4.23 | valid ppl    68.451
----------------------------------------------------------------------------------------------------
| epoch 205 step    89250 |    306 batches | lr 6.6e-05 | ms/batch 414.13 | loss  4.29 | ppl    72.744
| epoch 205 step    89300 |    356 batches | lr 6.58e-05 | ms/batch 315.84 | loss  4.18 | ppl    65.363
| epoch 205 step    89350 |    406 batches | lr 6.57e-05 | ms/batch 315.58 | loss  4.21 | ppl    67.097
| epoch 206 step    89400 |     20 batches | lr 6.55e-05 | ms/batch 310.71 | loss  4.27 | ppl    71.446
| epoch 206 step    89450 |     70 batches | lr 6.53e-05 | ms/batch 316.53 | loss  4.17 | ppl    64.589
| epoch 206 step    89500 |    120 batches | lr 6.52e-05 | ms/batch 315.52 | loss  4.27 | ppl    71.182
| epoch 206 step    89550 |    170 batches | lr 6.5e-05 | ms/batch 315.70 | loss  4.24 | ppl    69.478
| epoch 206 step    89600 |    220 batches | lr 6.48e-05 | ms/batch 315.05 | loss  4.26 | ppl    70.849
----------------------------------------------------------------------------------------------------
| Eval 224 at step    89600 | time: 130.92s | valid loss  4.23 | valid ppl    68.443
----------------------------------------------------------------------------------------------------
| epoch 206 step    89650 |    270 batches | lr 6.46e-05 | ms/batch 411.22 | loss  4.27 | ppl    71.346
| epoch 206 step    89700 |    320 batches | lr 6.45e-05 | ms/batch 313.59 | loss  4.24 | ppl    69.658
| epoch 206 step    89750 |    370 batches | lr 6.43e-05 | ms/batch 313.35 | loss  4.22 | ppl    68.033
| epoch 206 step    89800 |    420 batches | lr 6.41e-05 | ms/batch 312.98 | loss  4.23 | ppl    69.010
| epoch 207 step    89850 |     34 batches | lr 6.4e-05 | ms/batch 306.48 | loss  4.28 | ppl    71.967
| epoch 207 step    89900 |     84 batches | lr 6.38e-05 | ms/batch 313.83 | loss  4.18 | ppl    65.170
| epoch 207 step    89950 |    134 batches | lr 6.36e-05 | ms/batch 313.29 | loss  4.25 | ppl    69.780
| epoch 207 step    90000 |    184 batches | lr 6.35e-05 | ms/batch 313.50 | loss  4.23 | ppl    69.054
----------------------------------------------------------------------------------------------------
| Eval 225 at step    90000 | time: 129.92s | valid loss  4.22 | valid ppl    67.912
----------------------------------------------------------------------------------------------------
| epoch 207 step    90050 |    234 batches | lr 6.33e-05 | ms/batch 446.39 | loss  4.27 | ppl    71.594
| epoch 207 step    90100 |    284 batches | lr 6.31e-05 | ms/batch 316.21 | loss  4.29 | ppl    72.631
| epoch 207 step    90150 |    334 batches | lr 6.29e-05 | ms/batch 315.81 | loss  4.16 | ppl    64.112
| epoch 207 step    90200 |    384 batches | lr 6.28e-05 | ms/batch 314.97 | loss  4.20 | ppl    66.738
| epoch 207 step    90250 |    434 batches | lr 6.26e-05 | ms/batch 315.69 | loss  4.26 | ppl    70.854
| epoch 208 step    90300 |     48 batches | lr 6.24e-05 | ms/batch 308.29 | loss  4.21 | ppl    67.215
| epoch 208 step    90350 |     98 batches | lr 6.23e-05 | ms/batch 312.78 | loss  4.20 | ppl    66.707
| epoch 208 step    90400 |    148 batches | lr 6.21e-05 | ms/batch 313.50 | loss  4.22 | ppl    67.983
----------------------------------------------------------------------------------------------------
| Eval 226 at step    90400 | time: 130.52s | valid loss  4.22 | valid ppl    68.012
----------------------------------------------------------------------------------------------------
| epoch 208 step    90450 |    198 batches | lr 6.19e-05 | ms/batch 412.48 | loss  4.24 | ppl    69.500
| epoch 208 step    90500 |    248 batches | lr 6.18e-05 | ms/batch 316.37 | loss  4.28 | ppl    72.427
| epoch 208 step    90550 |    298 batches | lr 6.16e-05 | ms/batch 314.61 | loss  4.29 | ppl    72.858
| epoch 208 step    90600 |    348 batches | lr 6.14e-05 | ms/batch 317.59 | loss  4.15 | ppl    63.167
| epoch 208 step    90650 |    398 batches | lr 6.13e-05 | ms/batch 315.48 | loss  4.24 | ppl    69.232
| epoch 209 step    90700 |     12 batches | lr 6.11e-05 | ms/batch 308.57 | loss  4.26 | ppl    70.813
| epoch 209 step    90750 |     62 batches | lr 6.09e-05 | ms/batch 313.94 | loss  4.16 | ppl    64.249
| epoch 209 step    90800 |    112 batches | lr 6.08e-05 | ms/batch 314.86 | loss  4.21 | ppl    67.507
----------------------------------------------------------------------------------------------------
| Eval 227 at step    90800 | time: 130.73s | valid loss  4.22 | valid ppl    68.293
----------------------------------------------------------------------------------------------------
| epoch 209 step    90850 |    162 batches | lr 6.06e-05 | ms/batch 414.71 | loss  4.24 | ppl    69.519
| epoch 209 step    90900 |    212 batches | lr 6.04e-05 | ms/batch 315.48 | loss  4.24 | ppl    69.590
| epoch 209 step    90950 |    262 batches | lr 6.03e-05 | ms/batch 315.07 | loss  4.25 | ppl    70.330
| epoch 209 step    91000 |    312 batches | lr 6.01e-05 | ms/batch 314.04 | loss  4.24 | ppl    69.731
| epoch 209 step    91050 |    362 batches | lr 5.99e-05 | ms/batch 313.94 | loss  4.18 | ppl    65.277
| epoch 209 step    91100 |    412 batches | lr 5.98e-05 | ms/batch 313.21 | loss  4.22 | ppl    67.890
| epoch 210 step    91150 |     26 batches | lr 5.96e-05 | ms/batch 307.26 | loss  4.27 | ppl    71.726
| epoch 210 step    91200 |     76 batches | lr 5.94e-05 | ms/batch 314.59 | loss  4.19 | ppl    66.074
----------------------------------------------------------------------------------------------------
| Eval 228 at step    91200 | time: 130.40s | valid loss  4.22 | valid ppl    68.198
----------------------------------------------------------------------------------------------------
| epoch 210 step    91250 |    126 batches | lr 5.93e-05 | ms/batch 412.77 | loss  4.19 | ppl    65.879
| epoch 210 step    91300 |    176 batches | lr 5.91e-05 | ms/batch 313.82 | loss  4.23 | ppl    68.543
| epoch 210 step    91350 |    226 batches | lr 5.89e-05 | ms/batch 313.49 | loss  4.23 | ppl    69.002
| epoch 210 step    91400 |    276 batches | lr 5.88e-05 | ms/batch 313.17 | loss  4.26 | ppl    70.501
| epoch 210 step    91450 |    326 batches | lr 5.86e-05 | ms/batch 314.46 | loss  4.20 | ppl    66.442
| epoch 210 step    91500 |    376 batches | lr 5.84e-05 | ms/batch 315.08 | loss  4.20 | ppl    66.817
| epoch 210 step    91550 |    426 batches | lr 5.83e-05 | ms/batch 313.58 | loss  4.21 | ppl    67.575
| epoch 211 step    91600 |     40 batches | lr 5.81e-05 | ms/batch 308.13 | loss  4.25 | ppl    69.791
----------------------------------------------------------------------------------------------------
| Eval 229 at step    91600 | time: 130.24s | valid loss  4.22 | valid ppl    68.311
----------------------------------------------------------------------------------------------------
| epoch 211 step    91650 |     90 batches | lr 5.8e-05 | ms/batch 413.54 | loss  4.17 | ppl    64.655
| epoch 211 step    91700 |    140 batches | lr 5.78e-05 | ms/batch 312.60 | loss  4.23 | ppl    68.736
| epoch 211 step    91750 |    190 batches | lr 5.76e-05 | ms/batch 313.35 | loss  4.25 | ppl    70.210
| epoch 211 step    91800 |    240 batches | lr 5.75e-05 | ms/batch 313.01 | loss  4.28 | ppl    72.311
| epoch 211 step    91850 |    290 batches | lr 5.73e-05 | ms/batch 312.78 | loss  4.29 | ppl    72.659
| epoch 211 step    91900 |    340 batches | lr 5.71e-05 | ms/batch 314.19 | loss  4.13 | ppl    62.326
| epoch 211 step    91950 |    390 batches | lr 5.7e-05 | ms/batch 314.07 | loss  4.24 | ppl    69.062
| epoch 212 step    92000 |      4 batches | lr 5.68e-05 | ms/batch 307.05 | loss  4.25 | ppl    70.092
----------------------------------------------------------------------------------------------------
| Eval 230 at step    92000 | time: 130.03s | valid loss  4.22 | valid ppl    67.871
----------------------------------------------------------------------------------------------------
| epoch 212 step    92050 |     54 batches | lr 5.67e-05 | ms/batch 449.78 | loss  4.19 | ppl    65.740
| epoch 212 step    92100 |    104 batches | lr 5.65e-05 | ms/batch 313.90 | loss  4.18 | ppl    65.348
| epoch 212 step    92150 |    154 batches | lr 5.63e-05 | ms/batch 315.54 | loss  4.23 | ppl    68.546
| epoch 212 step    92200 |    204 batches | lr 5.62e-05 | ms/batch 314.28 | loss  4.23 | ppl    68.948
| epoch 212 step    92250 |    254 batches | lr 5.6e-05 | ms/batch 313.69 | loss  4.26 | ppl    70.932
| epoch 212 step    92300 |    304 batches | lr 5.58e-05 | ms/batch 315.09 | loss  4.24 | ppl    69.506
| epoch 212 step    92350 |    354 batches | lr 5.57e-05 | ms/batch 315.14 | loss  4.16 | ppl    63.762
| epoch 212 step    92400 |    404 batches | lr 5.55e-05 | ms/batch 313.66 | loss  4.21 | ppl    67.139
----------------------------------------------------------------------------------------------------
| Eval 231 at step    92400 | time: 130.78s | valid loss  4.22 | valid ppl    67.869
----------------------------------------------------------------------------------------------------
| epoch 213 step    92450 |     18 batches | lr 5.54e-05 | ms/batch 440.95 | loss  4.29 | ppl    72.605
| epoch 213 step    92500 |     68 batches | lr 5.52e-05 | ms/batch 315.03 | loss  4.18 | ppl    65.358
| epoch 213 step    92550 |    118 batches | lr 5.5e-05 | ms/batch 314.72 | loss  4.22 | ppl    67.874
| epoch 213 step    92600 |    168 batches | lr 5.49e-05 | ms/batch 315.46 | loss  4.21 | ppl    67.430
| epoch 213 step    92650 |    218 batches | lr 5.47e-05 | ms/batch 315.03 | loss  4.23 | ppl    68.779
| epoch 213 step    92700 |    268 batches | lr 5.46e-05 | ms/batch 316.27 | loss  4.23 | ppl    68.908
| epoch 213 step    92750 |    318 batches | lr 5.44e-05 | ms/batch 314.66 | loss  4.24 | ppl    69.208
| epoch 213 step    92800 |    368 batches | lr 5.42e-05 | ms/batch 314.95 | loss  4.20 | ppl    66.359
----------------------------------------------------------------------------------------------------
| Eval 232 at step    92800 | time: 130.72s | valid loss  4.22 | valid ppl    67.820
----------------------------------------------------------------------------------------------------
| epoch 213 step    92850 |    418 batches | lr 5.41e-05 | ms/batch 446.74 | loss  4.22 | ppl    68.324
| epoch 214 step    92900 |     32 batches | lr 5.39e-05 | ms/batch 309.08 | loss  4.26 | ppl    71.037
| epoch 214 step    92950 |     82 batches | lr 5.38e-05 | ms/batch 313.90 | loss  4.17 | ppl    64.645
| epoch 214 step    93000 |    132 batches | lr 5.36e-05 | ms/batch 313.70 | loss  4.24 | ppl    69.137
| epoch 214 step    93050 |    182 batches | lr 5.35e-05 | ms/batch 313.67 | loss  4.21 | ppl    67.694
| epoch 214 step    93100 |    232 batches | lr 5.33e-05 | ms/batch 313.63 | loss  4.23 | ppl    68.524
| epoch 214 step    93150 |    282 batches | lr 5.31e-05 | ms/batch 313.95 | loss  4.29 | ppl    72.944
| epoch 214 step    93200 |    332 batches | lr 5.3e-05 | ms/batch 313.85 | loss  4.17 | ppl    64.645
----------------------------------------------------------------------------------------------------
| Eval 233 at step    93200 | time: 130.22s | valid loss  4.23 | valid ppl    68.425
----------------------------------------------------------------------------------------------------
| epoch 214 step    93250 |    382 batches | lr 5.28e-05 | ms/batch 414.86 | loss  4.22 | ppl    67.771
| epoch 214 step    93300 |    432 batches | lr 5.27e-05 | ms/batch 316.08 | loss  4.23 | ppl    68.798
| epoch 215 step    93350 |     46 batches | lr 5.25e-05 | ms/batch 309.47 | loss  4.20 | ppl    66.598
| epoch 215 step    93400 |     96 batches | lr 5.23e-05 | ms/batch 315.94 | loss  4.18 | ppl    65.658
| epoch 215 step    93450 |    146 batches | lr 5.22e-05 | ms/batch 315.05 | loss  4.23 | ppl    68.830
| epoch 215 step    93500 |    196 batches | lr 5.2e-05 | ms/batch 314.38 | loss  4.22 | ppl    67.943
| epoch 215 step    93550 |    246 batches | lr 5.19e-05 | ms/batch 314.88 | loss  4.25 | ppl    70.092
| epoch 215 step    93600 |    296 batches | lr 5.17e-05 | ms/batch 314.83 | loss  4.27 | ppl    71.198
----------------------------------------------------------------------------------------------------
| Eval 234 at step    93600 | time: 130.79s | valid loss  4.22 | valid ppl    67.958
----------------------------------------------------------------------------------------------------
| epoch 215 step    93650 |    346 batches | lr 5.16e-05 | ms/batch 413.29 | loss  4.15 | ppl    63.543
| epoch 215 step    93700 |    396 batches | lr 5.14e-05 | ms/batch 314.57 | loss  4.23 | ppl    68.992
| epoch 216 step    93750 |     10 batches | lr 5.13e-05 | ms/batch 308.21 | loss  4.25 | ppl    70.196
| epoch 216 step    93800 |     60 batches | lr 5.11e-05 | ms/batch 315.57 | loss  4.17 | ppl    64.551
| epoch 216 step    93850 |    110 batches | lr 5.09e-05 | ms/batch 315.20 | loss  4.20 | ppl    66.406
| epoch 216 step    93900 |    160 batches | lr 5.08e-05 | ms/batch 314.03 | loss  4.23 | ppl    68.415
| epoch 216 step    93950 |    210 batches | lr 5.06e-05 | ms/batch 315.54 | loss  4.22 | ppl    67.964
| epoch 216 step    94000 |    260 batches | lr 5.05e-05 | ms/batch 315.23 | loss  4.22 | ppl    68.174
----------------------------------------------------------------------------------------------------
| Eval 235 at step    94000 | time: 130.57s | valid loss  4.22 | valid ppl    68.020
----------------------------------------------------------------------------------------------------
| epoch 216 step    94050 |    310 batches | lr 5.03e-05 | ms/batch 412.51 | loss  4.23 | ppl    68.481
| epoch 216 step    94100 |    360 batches | lr 5.02e-05 | ms/batch 314.59 | loss  4.15 | ppl    63.251
| epoch 216 step    94150 |    410 batches | lr 5e-05 | ms/batch 314.38 | loss  4.18 | ppl    65.681
| epoch 217 step    94200 |     24 batches | lr 4.99e-05 | ms/batch 308.92 | loss  4.27 | ppl    71.240
| epoch 217 step    94250 |     74 batches | lr 4.97e-05 | ms/batch 314.23 | loss  4.16 | ppl    64.305
| epoch 217 step    94300 |    124 batches | lr 4.96e-05 | ms/batch 314.26 | loss  4.21 | ppl    67.573
| epoch 217 step    94350 |    174 batches | lr 4.94e-05 | ms/batch 313.14 | loss  4.18 | ppl    65.422
| epoch 217 step    94400 |    224 batches | lr 4.93e-05 | ms/batch 314.90 | loss  4.23 | ppl    68.792
----------------------------------------------------------------------------------------------------
| Eval 236 at step    94400 | time: 130.39s | valid loss  4.22 | valid ppl    67.945
----------------------------------------------------------------------------------------------------
| epoch 217 step    94450 |    274 batches | lr 4.91e-05 | ms/batch 412.60 | loss  4.25 | ppl    70.040
| epoch 217 step    94500 |    324 batches | lr 4.89e-05 | ms/batch 315.46 | loss  4.19 | ppl    65.884
| epoch 217 step    94550 |    374 batches | lr 4.88e-05 | ms/batch 314.22 | loss  4.17 | ppl    64.979
| epoch 217 step    94600 |    424 batches | lr 4.86e-05 | ms/batch 314.84 | loss  4.20 | ppl    66.767
| epoch 218 step    94650 |     38 batches | lr 4.85e-05 | ms/batch 310.84 | loss  4.22 | ppl    67.832
| epoch 218 step    94700 |     88 batches | lr 4.83e-05 | ms/batch 317.52 | loss  4.14 | ppl    63.078
| epoch 218 step    94750 |    138 batches | lr 4.82e-05 | ms/batch 316.33 | loss  4.22 | ppl    67.903
| epoch 218 step    94800 |    188 batches | lr 4.8e-05 | ms/batch 314.65 | loss  4.22 | ppl    67.715
----------------------------------------------------------------------------------------------------
| Eval 237 at step    94800 | time: 130.83s | valid loss  4.22 | valid ppl    68.142
----------------------------------------------------------------------------------------------------
| epoch 218 step    94850 |    238 batches | lr 4.79e-05 | ms/batch 411.30 | loss  4.23 | ppl    68.546
| epoch 218 step    94900 |    288 batches | lr 4.77e-05 | ms/batch 315.31 | loss  4.27 | ppl    71.628
| epoch 218 step    94950 |    338 batches | lr 4.76e-05 | ms/batch 315.14 | loss  4.14 | ppl    62.732
| epoch 218 step    95000 |    388 batches | lr 4.74e-05 | ms/batch 313.99 | loss  4.20 | ppl    66.699
| epoch 219 step    95050 |      2 batches | lr 4.73e-05 | ms/batch 308.18 | loss  4.25 | ppl    69.791
| epoch 219 step    95100 |     52 batches | lr 4.71e-05 | ms/batch 314.22 | loss  4.16 | ppl    64.327
| epoch 219 step    95150 |    102 batches | lr 4.7e-05 | ms/batch 315.05 | loss  4.17 | ppl    64.501
| epoch 219 step    95200 |    152 batches | lr 4.68e-05 | ms/batch 314.45 | loss  4.21 | ppl    67.559
----------------------------------------------------------------------------------------------------
| Eval 238 at step    95200 | time: 130.34s | valid loss  4.21 | valid ppl    67.448
----------------------------------------------------------------------------------------------------
| epoch 219 step    95250 |    202 batches | lr 4.67e-05 | ms/batch 445.07 | loss  4.22 | ppl    68.020
| epoch 219 step    95300 |    252 batches | lr 4.65e-05 | ms/batch 314.43 | loss  4.27 | ppl    71.622
| epoch 219 step    95350 |    302 batches | lr 4.64e-05 | ms/batch 315.63 | loss  4.26 | ppl    71.043
| epoch 219 step    95400 |    352 batches | lr 4.62e-05 | ms/batch 314.44 | loss  4.14 | ppl    63.026
| epoch 219 step    95450 |    402 batches | lr 4.61e-05 | ms/batch 315.80 | loss  4.22 | ppl    67.805
| epoch 220 step    95500 |     16 batches | lr 4.59e-05 | ms/batch 309.29 | loss  4.22 | ppl    68.204
| epoch 220 step    95550 |     66 batches | lr 4.58e-05 | ms/batch 314.87 | loss  4.16 | ppl    63.872
| epoch 220 step    95600 |    116 batches | lr 4.56e-05 | ms/batch 316.50 | loss  4.17 | ppl    64.619
----------------------------------------------------------------------------------------------------
| Eval 239 at step    95600 | time: 130.67s | valid loss  4.22 | valid ppl    67.864
----------------------------------------------------------------------------------------------------
| epoch 220 step    95650 |    166 batches | lr 4.55e-05 | ms/batch 414.83 | loss  4.20 | ppl    66.710
| epoch 220 step    95700 |    216 batches | lr 4.53e-05 | ms/batch 314.60 | loss  4.21 | ppl    67.388
| epoch 220 step    95750 |    266 batches | lr 4.52e-05 | ms/batch 315.31 | loss  4.25 | ppl    70.133
| epoch 220 step    95800 |    316 batches | lr 4.5e-05 | ms/batch 314.76 | loss  4.21 | ppl    67.275
| epoch 220 step    95850 |    366 batches | lr 4.49e-05 | ms/batch 318.94 | loss  4.18 | ppl    65.663
| epoch 220 step    95900 |    416 batches | lr 4.48e-05 | ms/batch 330.77 | loss  4.20 | ppl    66.932
| epoch 221 step    95950 |     30 batches | lr 4.46e-05 | ms/batch 325.13 | loss  4.26 | ppl    70.523
| epoch 221 step    96000 |     80 batches | lr 4.45e-05 | ms/batch 316.54 | loss  4.15 | ppl    63.715
----------------------------------------------------------------------------------------------------
| Eval 240 at step    96000 | time: 132.49s | valid loss  4.21 | valid ppl    67.626
----------------------------------------------------------------------------------------------------
| epoch 221 step    96050 |    130 batches | lr 4.43e-05 | ms/batch 411.78 | loss  4.20 | ppl    66.432
| epoch 221 step    96100 |    180 batches | lr 4.42e-05 | ms/batch 312.59 | loss  4.20 | ppl    66.356
| epoch 221 step    96150 |    230 batches | lr 4.4e-05 | ms/batch 314.58 | loss  4.22 | ppl    67.821
| epoch 221 step    96200 |    280 batches | lr 4.39e-05 | ms/batch 315.25 | loss  4.26 | ppl    70.702
| epoch 221 step    96250 |    330 batches | lr 4.37e-05 | ms/batch 316.23 | loss  4.17 | ppl    64.753
| epoch 221 step    96300 |    380 batches | lr 4.36e-05 | ms/batch 316.95 | loss  4.19 | ppl    65.863
| epoch 221 step    96350 |    430 batches | lr 4.34e-05 | ms/batch 315.42 | loss  4.22 | ppl    68.159
| epoch 222 step    96400 |     44 batches | lr 4.33e-05 | ms/batch 325.64 | loss  4.20 | ppl    66.699
----------------------------------------------------------------------------------------------------
| Eval 241 at step    96400 | time: 131.75s | valid loss  4.22 | valid ppl    67.861
----------------------------------------------------------------------------------------------------
| epoch 222 step    96450 |     94 batches | lr 4.32e-05 | ms/batch 435.71 | loss  4.15 | ppl    63.685
| epoch 222 step    96500 |    144 batches | lr 4.3e-05 | ms/batch 330.57 | loss  4.20 | ppl    66.512
| epoch 222 step    96550 |    194 batches | lr 4.29e-05 | ms/batch 331.31 | loss  4.20 | ppl    66.989
| epoch 222 step    96600 |    244 batches | lr 4.27e-05 | ms/batch 331.26 | loss  4.22 | ppl    68.148
| epoch 222 step    96650 |    294 batches | lr 4.26e-05 | ms/batch 330.39 | loss  4.24 | ppl    69.625
| epoch 222 step    96700 |    344 batches | lr 4.24e-05 | ms/batch 331.58 | loss  4.10 | ppl    60.317
| epoch 222 step    96750 |    394 batches | lr 4.23e-05 | ms/batch 331.22 | loss  4.22 | ppl    68.209
| epoch 223 step    96800 |      8 batches | lr 4.21e-05 | ms/batch 325.90 | loss  4.25 | ppl    70.347
----------------------------------------------------------------------------------------------------
| Eval 242 at step    96800 | time: 137.35s | valid loss  4.21 | valid ppl    67.243
----------------------------------------------------------------------------------------------------
| epoch 223 step    96850 |     58 batches | lr 4.2e-05 | ms/batch 466.86 | loss  4.19 | ppl    65.817
| epoch 223 step    96900 |    108 batches | lr 4.19e-05 | ms/batch 330.79 | loss  4.17 | ppl    64.954
| epoch 223 step    96950 |    158 batches | lr 4.17e-05 | ms/batch 330.42 | loss  4.20 | ppl    66.517
| epoch 223 step    97000 |    208 batches | lr 4.16e-05 | ms/batch 331.47 | loss  4.22 | ppl    67.864
| epoch 223 step    97050 |    258 batches | lr 4.14e-05 | ms/batch 331.13 | loss  4.23 | ppl    68.983
| epoch 223 step    97100 |    308 batches | lr 4.13e-05 | ms/batch 331.54 | loss  4.24 | ppl    69.362
| epoch 223 step    97150 |    358 batches | lr 4.11e-05 | ms/batch 314.84 | loss  4.15 | ppl    63.162
| epoch 223 step    97200 |    408 batches | lr 4.1e-05 | ms/batch 315.97 | loss  4.18 | ppl    65.251
----------------------------------------------------------------------------------------------------
| Eval 243 at step    97200 | time: 135.80s | valid loss  4.22 | valid ppl    67.724
----------------------------------------------------------------------------------------------------
| epoch 224 step    97250 |     22 batches | lr 4.09e-05 | ms/batch 408.54 | loss  4.27 | ppl    71.488
| epoch 224 step    97300 |     72 batches | lr 4.07e-05 | ms/batch 315.89 | loss  4.16 | ppl    64.024
| epoch 224 step    97350 |    122 batches | lr 4.06e-05 | ms/batch 316.06 | loss  4.20 | ppl    66.406
| epoch 224 step    97400 |    172 batches | lr 4.04e-05 | ms/batch 315.79 | loss  4.20 | ppl    66.468
| epoch 224 step    97450 |    222 batches | lr 4.03e-05 | ms/batch 315.19 | loss  4.25 | ppl    70.259
| epoch 224 step    97500 |    272 batches | lr 4.02e-05 | ms/batch 315.49 | loss  4.23 | ppl    68.838
| epoch 224 step    97550 |    322 batches | lr 4e-05 | ms/batch 315.61 | loss  4.17 | ppl    64.438
| epoch 224 step    97600 |    372 batches | lr 3.99e-05 | ms/batch 315.07 | loss  4.19 | ppl    66.305
----------------------------------------------------------------------------------------------------
| Eval 244 at step    97600 | time: 130.88s | valid loss  4.21 | valid ppl    67.511
----------------------------------------------------------------------------------------------------
| epoch 224 step    97650 |    422 batches | lr 3.97e-05 | ms/batch 413.18 | loss  4.20 | ppl    66.413
| epoch 225 step    97700 |     36 batches | lr 3.96e-05 | ms/batch 308.45 | loss  4.25 | ppl    70.037
| epoch 225 step    97750 |     86 batches | lr 3.95e-05 | ms/batch 314.54 | loss  4.13 | ppl    62.392
| epoch 225 step    97800 |    136 batches | lr 3.93e-05 | ms/batch 314.39 | loss  4.22 | ppl    67.819
| epoch 225 step    97850 |    186 batches | lr 3.92e-05 | ms/batch 314.79 | loss  4.24 | ppl    69.592
| epoch 225 step    97900 |    236 batches | lr 3.9e-05 | ms/batch 316.77 | loss  4.22 | ppl    67.766
| epoch 225 step    97950 |    286 batches | lr 3.89e-05 | ms/batch 315.66 | loss  4.26 | ppl    70.943
| epoch 225 step    98000 |    336 batches | lr 3.88e-05 | ms/batch 313.98 | loss  4.12 | ppl    61.595
----------------------------------------------------------------------------------------------------
| Eval 245 at step    98000 | time: 130.61s | valid loss  4.21 | valid ppl    67.660
----------------------------------------------------------------------------------------------------
| epoch 225 step    98050 |    386 batches | lr 3.86e-05 | ms/batch 415.80 | loss  4.21 | ppl    67.131
| epoch 225 step    98100 |    436 batches | lr 3.85e-05 | ms/batch 312.91 | loss  4.21 | ppl    67.573
| epoch 226 step    98150 |     50 batches | lr 3.84e-05 | ms/batch 319.15 | loss  4.21 | ppl    67.486
| epoch 226 step    98200 |    100 batches | lr 3.82e-05 | ms/batch 330.83 | loss  4.19 | ppl    65.964
| epoch 226 step    98250 |    150 batches | lr 3.81e-05 | ms/batch 314.05 | loss  4.20 | ppl    66.879
| epoch 226 step    98300 |    200 batches | lr 3.79e-05 | ms/batch 313.94 | loss  4.22 | ppl    67.880
| epoch 226 step    98350 |    250 batches | lr 3.78e-05 | ms/batch 314.89 | loss  4.21 | ppl    67.301
| epoch 226 step    98400 |    300 batches | lr 3.77e-05 | ms/batch 313.10 | loss  4.25 | ppl    70.391
----------------------------------------------------------------------------------------------------
| Eval 246 at step    98400 | time: 131.70s | valid loss  4.21 | valid ppl    67.436
----------------------------------------------------------------------------------------------------
| epoch 226 step    98450 |    350 batches | lr 3.75e-05 | ms/batch 412.37 | loss  4.12 | ppl    61.516
| epoch 226 step    98500 |    400 batches | lr 3.74e-05 | ms/batch 316.06 | loss  4.19 | ppl    66.206
| epoch 227 step    98550 |     14 batches | lr 3.73e-05 | ms/batch 310.88 | loss  4.21 | ppl    67.270
| epoch 227 step    98600 |     64 batches | lr 3.71e-05 | ms/batch 316.08 | loss  4.14 | ppl    62.985
| epoch 227 step    98650 |    114 batches | lr 3.7e-05 | ms/batch 314.78 | loss  4.18 | ppl    65.053
| epoch 227 step    98700 |    164 batches | lr 3.69e-05 | ms/batch 316.80 | loss  4.20 | ppl    66.746
| epoch 227 step    98750 |    214 batches | lr 3.67e-05 | ms/batch 316.15 | loss  4.22 | ppl    68.156
| epoch 227 step    98800 |    264 batches | lr 3.66e-05 | ms/batch 314.60 | loss  4.20 | ppl    66.629
----------------------------------------------------------------------------------------------------
| Eval 247 at step    98800 | time: 130.88s | valid loss  4.22 | valid ppl    67.797
----------------------------------------------------------------------------------------------------
| epoch 227 step    98850 |    314 batches | lr 3.65e-05 | ms/batch 413.15 | loss  4.21 | ppl    67.272
| epoch 227 step    98900 |    364 batches | lr 3.63e-05 | ms/batch 314.32 | loss  4.13 | ppl    61.916
| epoch 227 step    98950 |    414 batches | lr 3.62e-05 | ms/batch 313.28 | loss  4.18 | ppl    65.269
| epoch 228 step    99000 |     28 batches | lr 3.61e-05 | ms/batch 308.68 | loss  4.23 | ppl    68.514
| epoch 228 step    99050 |     78 batches | lr 3.59e-05 | ms/batch 313.43 | loss  4.16 | ppl    64.109
| epoch 228 step    99100 |    128 batches | lr 3.58e-05 | ms/batch 313.57 | loss  4.18 | ppl    65.663
| epoch 228 step    99150 |    178 batches | lr 3.57e-05 | ms/batch 314.07 | loss  4.20 | ppl    66.697
| epoch 228 step    99200 |    228 batches | lr 3.55e-05 | ms/batch 313.62 | loss  4.22 | ppl    67.999
----------------------------------------------------------------------------------------------------
| Eval 248 at step    99200 | time: 130.21s | valid loss  4.21 | valid ppl    67.623
----------------------------------------------------------------------------------------------------
| epoch 228 step    99250 |    278 batches | lr 3.54e-05 | ms/batch 414.26 | loss  4.22 | ppl    68.348
| epoch 228 step    99300 |    328 batches | lr 3.53e-05 | ms/batch 316.13 | loss  4.14 | ppl    62.497
| epoch 228 step    99350 |    378 batches | lr 3.51e-05 | ms/batch 315.38 | loss  4.18 | ppl    65.136
| epoch 228 step    99400 |    428 batches | lr 3.5e-05 | ms/batch 315.22 | loss  4.19 | ppl    66.198
| epoch 229 step    99450 |     42 batches | lr 3.49e-05 | ms/batch 311.52 | loss  4.21 | ppl    67.296
| epoch 229 step    99500 |     92 batches | lr 3.47e-05 | ms/batch 319.42 | loss  4.14 | ppl    62.538
| epoch 229 step    99550 |    142 batches | lr 3.46e-05 | ms/batch 318.20 | loss  4.20 | ppl    66.697
| epoch 229 step    99600 |    192 batches | lr 3.45e-05 | ms/batch 319.71 | loss  4.20 | ppl    66.725
----------------------------------------------------------------------------------------------------
| Eval 249 at step    99600 | time: 131.48s | valid loss  4.21 | valid ppl    67.164
----------------------------------------------------------------------------------------------------
| epoch 229 step    99650 |    242 batches | lr 3.43e-05 | ms/batch 446.90 | loss  4.20 | ppl    66.848
| epoch 229 step    99700 |    292 batches | lr 3.42e-05 | ms/batch 316.68 | loss  4.26 | ppl    71.090
| epoch 229 step    99750 |    342 batches | lr 3.41e-05 | ms/batch 315.43 | loss  4.09 | ppl    59.780
| epoch 229 step    99800 |    392 batches | lr 3.39e-05 | ms/batch 316.94 | loss  4.18 | ppl    65.121
| epoch 230 step    99850 |      6 batches | lr 3.38e-05 | ms/batch 310.23 | loss  4.24 | ppl    69.402
| epoch 230 step    99900 |     56 batches | lr 3.37e-05 | ms/batch 314.80 | loss  4.16 | ppl    64.034
| epoch 230 step    99950 |    106 batches | lr 3.36e-05 | ms/batch 315.71 | loss  4.16 | ppl    64.016
| epoch 230 step   100000 |    156 batches | lr 3.34e-05 | ms/batch 314.99 | loss  4.19 | ppl    66.025
----------------------------------------------------------------------------------------------------
| Eval 250 at step   100000 | time: 130.96s | valid loss  4.21 | valid ppl    67.104
----------------------------------------------------------------------------------------------------
| epoch 230 step   100050 |    206 batches | lr 3.33e-05 | ms/batch 445.59 | loss  4.21 | ppl    67.152
| epoch 230 step   100100 |    256 batches | lr 3.32e-05 | ms/batch 314.73 | loss  4.21 | ppl    67.057
| epoch 230 step   100150 |    306 batches | lr 3.3e-05 | ms/batch 313.73 | loss  4.25 | ppl    70.163
| epoch 230 step   100200 |    356 batches | lr 3.29e-05 | ms/batch 314.97 | loss  4.14 | ppl    62.710
| epoch 230 step   100250 |    406 batches | lr 3.28e-05 | ms/batch 315.61 | loss  4.18 | ppl    65.450
| epoch 231 step   100300 |     20 batches | lr 3.27e-05 | ms/batch 307.73 | loss  4.25 | ppl    69.780
| epoch 231 step   100350 |     70 batches | lr 3.25e-05 | ms/batch 314.45 | loss  4.14 | ppl    62.975
| epoch 231 step   100400 |    120 batches | lr 3.24e-05 | ms/batch 314.65 | loss  4.19 | ppl    66.330
----------------------------------------------------------------------------------------------------
| Eval 251 at step   100400 | time: 130.47s | valid loss  4.21 | valid ppl    67.456
----------------------------------------------------------------------------------------------------
| epoch 231 step   100450 |    170 batches | lr 3.23e-05 | ms/batch 411.52 | loss  4.21 | ppl    67.031
| epoch 231 step   100500 |    220 batches | lr 3.21e-05 | ms/batch 313.62 | loss  4.22 | ppl    68.321
| epoch 231 step   100550 |    270 batches | lr 3.2e-05 | ms/batch 314.87 | loss  4.22 | ppl    68.018
| epoch 231 step   100600 |    320 batches | lr 3.19e-05 | ms/batch 313.74 | loss  4.17 | ppl    64.941
| epoch 231 step   100650 |    370 batches | lr 3.18e-05 | ms/batch 313.41 | loss  4.14 | ppl    63.017
| epoch 231 step   100700 |    420 batches | lr 3.16e-05 | ms/batch 314.31 | loss  4.18 | ppl    65.686
| epoch 232 step   100750 |     34 batches | lr 3.15e-05 | ms/batch 307.84 | loss  4.24 | ppl    69.647
| epoch 232 step   100800 |     84 batches | lr 3.14e-05 | ms/batch 313.57 | loss  4.16 | ppl    63.774
----------------------------------------------------------------------------------------------------
| Eval 252 at step   100800 | time: 130.15s | valid loss  4.21 | valid ppl    67.507
----------------------------------------------------------------------------------------------------
| epoch 232 step   100850 |    134 batches | lr 3.13e-05 | ms/batch 411.75 | loss  4.19 | ppl    66.317
| epoch 232 step   100900 |    184 batches | lr 3.11e-05 | ms/batch 313.55 | loss  4.18 | ppl    65.404
| epoch 232 step   100950 |    234 batches | lr 3.1e-05 | ms/batch 313.16 | loss  4.23 | ppl    68.637
| epoch 232 step   101000 |    284 batches | lr 3.09e-05 | ms/batch 313.37 | loss  4.22 | ppl    68.375
| epoch 232 step   101050 |    334 batches | lr 3.08e-05 | ms/batch 314.29 | loss  4.13 | ppl    62.331
| epoch 232 step   101100 |    384 batches | lr 3.06e-05 | ms/batch 314.75 | loss  4.18 | ppl    65.053
| epoch 232 step   101150 |    434 batches | lr 3.05e-05 | ms/batch 313.61 | loss  4.24 | ppl    69.411
| epoch 233 step   101200 |     48 batches | lr 3.04e-05 | ms/batch 309.69 | loss  4.17 | ppl    64.872
----------------------------------------------------------------------------------------------------
| Eval 253 at step   101200 | time: 130.23s | valid loss  4.21 | valid ppl    67.515
----------------------------------------------------------------------------------------------------
| epoch 233 step   101250 |     98 batches | lr 3.03e-05 | ms/batch 413.30 | loss  4.17 | ppl    64.597
| epoch 233 step   101300 |    148 batches | lr 3.01e-05 | ms/batch 315.38 | loss  4.17 | ppl    64.617
| epoch 233 step   101350 |    198 batches | lr 3e-05 | ms/batch 315.37 | loss  4.19 | ppl    66.005
| epoch 233 step   101400 |    248 batches | lr 2.99e-05 | ms/batch 314.06 | loss  4.23 | ppl    68.704
| epoch 233 step   101450 |    298 batches | lr 2.98e-05 | ms/batch 315.82 | loss  4.24 | ppl    69.267
| epoch 233 step   101500 |    348 batches | lr 2.96e-05 | ms/batch 314.82 | loss  4.12 | ppl    61.384
| epoch 233 step   101550 |    398 batches | lr 2.95e-05 | ms/batch 314.38 | loss  4.18 | ppl    65.448
| epoch 234 step   101600 |     12 batches | lr 2.94e-05 | ms/batch 309.71 | loss  4.23 | ppl    68.618
----------------------------------------------------------------------------------------------------
| Eval 254 at step   101600 | time: 130.62s | valid loss  4.21 | valid ppl    67.296
----------------------------------------------------------------------------------------------------
| epoch 234 step   101650 |     62 batches | lr 2.93e-05 | ms/batch 414.27 | loss  4.13 | ppl    62.460
| epoch 234 step   101700 |    112 batches | lr 2.92e-05 | ms/batch 315.47 | loss  4.16 | ppl    64.312
| epoch 234 step   101750 |    162 batches | lr 2.9e-05 | ms/batch 314.48 | loss  4.20 | ppl    66.811
| epoch 234 step   101800 |    212 batches | lr 2.89e-05 | ms/batch 314.23 | loss  4.20 | ppl    66.981
| epoch 234 step   101850 |    262 batches | lr 2.88e-05 | ms/batch 314.84 | loss  4.22 | ppl    67.941
| epoch 234 step   101900 |    312 batches | lr 2.87e-05 | ms/batch 314.14 | loss  4.20 | ppl    67.018
| epoch 234 step   101950 |    362 batches | lr 2.86e-05 | ms/batch 312.91 | loss  4.15 | ppl    63.603
| epoch 234 step   102000 |    412 batches | lr 2.84e-05 | ms/batch 316.55 | loss  4.15 | ppl    63.580
----------------------------------------------------------------------------------------------------
| Eval 255 at step   102000 | time: 130.92s | valid loss  4.21 | valid ppl    67.161
----------------------------------------------------------------------------------------------------
| epoch 235 step   102050 |     26 batches | lr 2.83e-05 | ms/batch 408.94 | loss  4.23 | ppl    68.449
| epoch 235 step   102100 |     76 batches | lr 2.82e-05 | ms/batch 314.66 | loss  4.17 | ppl    64.609
| epoch 235 step   102150 |    126 batches | lr 2.81e-05 | ms/batch 314.94 | loss  4.19 | ppl    66.237
| epoch 235 step   102200 |    176 batches | lr 2.8e-05 | ms/batch 316.26 | loss  4.16 | ppl    64.199
| epoch 235 step   102250 |    226 batches | lr 2.78e-05 | ms/batch 315.12 | loss  4.25 | ppl    70.155
| epoch 235 step   102300 |    276 batches | lr 2.77e-05 | ms/batch 314.77 | loss  4.23 | ppl    68.607
| epoch 235 step   102350 |    326 batches | lr 2.76e-05 | ms/batch 315.12 | loss  4.16 | ppl    64.312
| epoch 235 step   102400 |    376 batches | lr 2.75e-05 | ms/batch 315.10 | loss  4.17 | ppl    64.453
----------------------------------------------------------------------------------------------------
| Eval 256 at step   102400 | time: 130.68s | valid loss  4.20 | valid ppl    67.004
----------------------------------------------------------------------------------------------------
| epoch 235 step   102450 |    426 batches | lr 2.74e-05 | ms/batch 445.04 | loss  4.20 | ppl    66.767
| epoch 236 step   102500 |     40 batches | lr 2.72e-05 | ms/batch 309.29 | loss  4.18 | ppl    65.473
| epoch 236 step   102550 |     90 batches | lr 2.71e-05 | ms/batch 314.84 | loss  4.12 | ppl    61.706
| epoch 236 step   102600 |    140 batches | lr 2.7e-05 | ms/batch 314.96 | loss  4.18 | ppl    65.560
| epoch 236 step   102650 |    190 batches | lr 2.69e-05 | ms/batch 314.55 | loss  4.22 | ppl    68.188
| epoch 236 step   102700 |    240 batches | lr 2.68e-05 | ms/batch 314.82 | loss  4.22 | ppl    67.729
| epoch 236 step   102750 |    290 batches | lr 2.67e-05 | ms/batch 313.81 | loss  4.23 | ppl    68.712
| epoch 236 step   102800 |    340 batches | lr 2.65e-05 | ms/batch 315.56 | loss  4.09 | ppl    59.500
----------------------------------------------------------------------------------------------------
| Eval 257 at step   102800 | time: 130.53s | valid loss  4.21 | valid ppl    67.357
----------------------------------------------------------------------------------------------------
| epoch 236 step   102850 |    390 batches | lr 2.64e-05 | ms/batch 413.73 | loss  4.19 | ppl    66.214
| epoch 237 step   102900 |      4 batches | lr 2.63e-05 | ms/batch 307.89 | loss  4.24 | ppl    69.086
| epoch 237 step   102950 |     54 batches | lr 2.62e-05 | ms/batch 315.97 | loss  4.16 | ppl    64.144
| epoch 237 step   103000 |    104 batches | lr 2.61e-05 | ms/batch 313.88 | loss  4.16 | ppl    64.044
| epoch 237 step   103050 |    154 batches | lr 2.6e-05 | ms/batch 313.74 | loss  4.18 | ppl    65.241
| epoch 237 step   103100 |    204 batches | lr 2.58e-05 | ms/batch 314.77 | loss  4.17 | ppl    64.862
| epoch 237 step   103150 |    254 batches | lr 2.57e-05 | ms/batch 314.57 | loss  4.20 | ppl    66.634
| epoch 237 step   103200 |    304 batches | lr 2.56e-05 | ms/batch 313.69 | loss  4.20 | ppl    67.013
----------------------------------------------------------------------------------------------------
| Eval 258 at step   103200 | time: 130.41s | valid loss  4.20 | valid ppl    66.974
----------------------------------------------------------------------------------------------------
| epoch 237 step   103250 |    354 batches | lr 2.55e-05 | ms/batch 446.33 | loss  4.10 | ppl    60.520
| epoch 237 step   103300 |    404 batches | lr 2.54e-05 | ms/batch 313.96 | loss  4.18 | ppl    65.558
| epoch 238 step   103350 |     18 batches | lr 2.53e-05 | ms/batch 309.20 | loss  4.21 | ppl    67.359
| epoch 238 step   103400 |     68 batches | lr 2.52e-05 | ms/batch 314.87 | loss  4.10 | ppl    60.475
| epoch 238 step   103450 |    118 batches | lr 2.5e-05 | ms/batch 313.54 | loss  4.16 | ppl    64.305
| epoch 238 step   103500 |    168 batches | lr 2.49e-05 | ms/batch 314.99 | loss  4.18 | ppl    65.389
| epoch 238 step   103550 |    218 batches | lr 2.48e-05 | ms/batch 315.54 | loss  4.21 | ppl    67.436
| epoch 238 step   103600 |    268 batches | lr 2.47e-05 | ms/batch 314.20 | loss  4.20 | ppl    66.473
----------------------------------------------------------------------------------------------------
| Eval 259 at step   103600 | time: 130.48s | valid loss  4.21 | valid ppl    67.269
----------------------------------------------------------------------------------------------------
| epoch 238 step   103650 |    318 batches | lr 2.46e-05 | ms/batch 414.91 | loss  4.18 | ppl    65.655
| epoch 238 step   103700 |    368 batches | lr 2.45e-05 | ms/batch 317.16 | loss  4.17 | ppl    64.443
| epoch 238 step   103750 |    418 batches | lr 2.44e-05 | ms/batch 316.52 | loss  4.20 | ppl    66.645
| epoch 239 step   103800 |     32 batches | lr 2.43e-05 | ms/batch 308.35 | loss  4.21 | ppl    67.270
| epoch 239 step   103850 |     82 batches | lr 2.41e-05 | ms/batch 313.63 | loss  4.13 | ppl    62.210
| epoch 239 step   103900 |    132 batches | lr 2.4e-05 | ms/batch 313.45 | loss  4.17 | ppl    64.954
| epoch 239 step   103950 |    182 batches | lr 2.39e-05 | ms/batch 316.62 | loss  4.19 | ppl    66.113
| epoch 239 step   104000 |    232 batches | lr 2.38e-05 | ms/batch 314.60 | loss  4.20 | ppl    66.973
----------------------------------------------------------------------------------------------------
| Eval 260 at step   104000 | time: 130.76s | valid loss  4.21 | valid ppl    67.310
----------------------------------------------------------------------------------------------------
| epoch 239 step   104050 |    282 batches | lr 2.37e-05 | ms/batch 415.29 | loss  4.23 | ppl    69.010
| epoch 239 step   104100 |    332 batches | lr 2.36e-05 | ms/batch 317.60 | loss  4.13 | ppl    61.974
| epoch 239 step   104150 |    382 batches | lr 2.35e-05 | ms/batch 315.77 | loss  4.15 | ppl    63.590
| epoch 239 step   104200 |    432 batches | lr 2.34e-05 | ms/batch 317.18 | loss  4.19 | ppl    66.242
| epoch 240 step   104250 |     46 batches | lr 2.33e-05 | ms/batch 310.11 | loss  4.17 | ppl    64.921
| epoch 240 step   104300 |     96 batches | lr 2.32e-05 | ms/batch 315.89 | loss  4.16 | ppl    63.759
| epoch 240 step   104350 |    146 batches | lr 2.3e-05 | ms/batch 316.23 | loss  4.18 | ppl    65.683
| epoch 240 step   104400 |    196 batches | lr 2.29e-05 | ms/batch 315.88 | loss  4.21 | ppl    67.596
----------------------------------------------------------------------------------------------------
| Eval 261 at step   104400 | time: 131.19s | valid loss  4.21 | valid ppl    67.035
----------------------------------------------------------------------------------------------------
| epoch 240 step   104450 |    246 batches | lr 2.28e-05 | ms/batch 415.44 | loss  4.21 | ppl    67.228
| epoch 240 step   104500 |    296 batches | lr 2.27e-05 | ms/batch 313.66 | loss  4.25 | ppl    69.988
| epoch 240 step   104550 |    346 batches | lr 2.26e-05 | ms/batch 313.48 | loss  4.10 | ppl    60.185
| epoch 240 step   104600 |    396 batches | lr 2.25e-05 | ms/batch 314.34 | loss  4.18 | ppl    65.277
| epoch 241 step   104650 |     10 batches | lr 2.24e-05 | ms/batch 307.81 | loss  4.22 | ppl    68.366
| epoch 241 step   104700 |     60 batches | lr 2.23e-05 | ms/batch 313.42 | loss  4.15 | ppl    63.125
| epoch 241 step   104750 |    110 batches | lr 2.22e-05 | ms/batch 313.68 | loss  4.17 | ppl    64.420
| epoch 241 step   104800 |    160 batches | lr 2.21e-05 | ms/batch 314.05 | loss  4.20 | ppl    66.697
----------------------------------------------------------------------------------------------------
| Eval 262 at step   104800 | time: 130.24s | valid loss  4.20 | valid ppl    66.857
----------------------------------------------------------------------------------------------------
| epoch 241 step   104850 |    210 batches | lr 2.2e-05 | ms/batch 446.74 | loss  4.19 | ppl    66.235
| epoch 241 step   104900 |    260 batches | lr 2.19e-05 | ms/batch 315.20 | loss  4.20 | ppl    66.452
| epoch 241 step   104950 |    310 batches | lr 2.18e-05 | ms/batch 313.39 | loss  4.19 | ppl    66.018
| epoch 241 step   105000 |    360 batches | lr 2.16e-05 | ms/batch 314.48 | loss  4.13 | ppl    61.979
| epoch 241 step   105050 |    410 batches | lr 2.15e-05 | ms/batch 315.40 | loss  4.19 | ppl    65.837
| epoch 242 step   105100 |     24 batches | lr 2.14e-05 | ms/batch 308.09 | loss  4.24 | ppl    69.135
| epoch 242 step   105150 |     74 batches | lr 2.13e-05 | ms/batch 314.79 | loss  4.13 | ppl    62.426
| epoch 242 step   105200 |    124 batches | lr 2.12e-05 | ms/batch 315.32 | loss  4.19 | ppl    65.935
----------------------------------------------------------------------------------------------------
| Eval 263 at step   105200 | time: 130.51s | valid loss  4.21 | valid ppl    67.100
----------------------------------------------------------------------------------------------------
| epoch 242 step   105250 |    174 batches | lr 2.11e-05 | ms/batch 413.02 | loss  4.18 | ppl    65.287
| epoch 242 step   105300 |    224 batches | lr 2.1e-05 | ms/batch 314.22 | loss  4.22 | ppl    67.967
| epoch 242 step   105350 |    274 batches | lr 2.09e-05 | ms/batch 313.90 | loss  4.21 | ppl    67.604
| epoch 242 step   105400 |    324 batches | lr 2.08e-05 | ms/batch 314.89 | loss  4.17 | ppl    64.941
| epoch 242 step   105450 |    374 batches | lr 2.07e-05 | ms/batch 314.75 | loss  4.17 | ppl    64.486
| epoch 242 step   105500 |    424 batches | lr 2.06e-05 | ms/batch 314.71 | loss  4.19 | ppl    66.196
| epoch 243 step   105550 |     38 batches | lr 2.05e-05 | ms/batch 309.20 | loss  4.17 | ppl    64.956
| epoch 243 step   105600 |     88 batches | lr 2.04e-05 | ms/batch 315.76 | loss  4.14 | ppl    62.595
----------------------------------------------------------------------------------------------------
| Eval 264 at step   105600 | time: 130.55s | valid loss  4.21 | valid ppl    67.068
----------------------------------------------------------------------------------------------------
| epoch 243 step   105650 |    138 batches | lr 2.03e-05 | ms/batch 413.25 | loss  4.18 | ppl    65.330
| epoch 243 step   105700 |    188 batches | lr 2.02e-05 | ms/batch 313.92 | loss  4.19 | ppl    65.902
| epoch 243 step   105750 |    238 batches | lr 2.01e-05 | ms/batch 314.94 | loss  4.20 | ppl    66.419
| epoch 243 step   105800 |    288 batches | lr 2e-05 | ms/batch 315.02 | loss  4.23 | ppl    68.685
| epoch 243 step   105850 |    338 batches | lr 1.99e-05 | ms/batch 313.13 | loss  4.09 | ppl    59.899
| epoch 243 step   105900 |    388 batches | lr 1.98e-05 | ms/batch 313.15 | loss  4.19 | ppl    66.031
| epoch 244 step   105950 |      2 batches | lr 1.97e-05 | ms/batch 306.77 | loss  4.19 | ppl    65.778
| epoch 244 step   106000 |     52 batches | lr 1.96e-05 | ms/batch 313.26 | loss  4.14 | ppl    62.631
----------------------------------------------------------------------------------------------------
| Eval 265 at step   106000 | time: 130.15s | valid loss  4.21 | valid ppl    67.182
----------------------------------------------------------------------------------------------------
| epoch 244 step   106050 |    102 batches | lr 1.95e-05 | ms/batch 411.09 | loss  4.13 | ppl    62.049
| epoch 244 step   106100 |    152 batches | lr 1.94e-05 | ms/batch 314.20 | loss  4.20 | ppl    66.359
| epoch 244 step   106150 |    202 batches | lr 1.93e-05 | ms/batch 313.38 | loss  4.18 | ppl    65.471
| epoch 244 step   106200 |    252 batches | lr 1.92e-05 | ms/batch 312.76 | loss  4.22 | ppl    68.089
| epoch 244 step   106250 |    302 batches | lr 1.91e-05 | ms/batch 315.10 | loss  4.22 | ppl    68.182
| epoch 244 step   106300 |    352 batches | lr 1.9e-05 | ms/batch 312.73 | loss  4.11 | ppl    60.899
| epoch 244 step   106350 |    402 batches | lr 1.89e-05 | ms/batch 315.16 | loss  4.20 | ppl    66.463
| epoch 245 step   106400 |     16 batches | lr 1.88e-05 | ms/batch 308.92 | loss  4.24 | ppl    69.086
----------------------------------------------------------------------------------------------------
| Eval 266 at step   106400 | time: 130.17s | valid loss  4.20 | valid ppl    66.862
----------------------------------------------------------------------------------------------------
| epoch 245 step   106450 |     66 batches | lr 1.87e-05 | ms/batch 414.02 | loss  4.10 | ppl    60.598
| epoch 245 step   106500 |    116 batches | lr 1.86e-05 | ms/batch 314.26 | loss  4.18 | ppl    65.081
| epoch 245 step   106550 |    166 batches | lr 1.85e-05 | ms/batch 313.64 | loss  4.20 | ppl    66.398
| epoch 245 step   106600 |    216 batches | lr 1.84e-05 | ms/batch 315.74 | loss  4.19 | ppl    66.260
| epoch 245 step   106650 |    266 batches | lr 1.83e-05 | ms/batch 314.74 | loss  4.18 | ppl    65.581
| epoch 245 step   106700 |    316 batches | lr 1.82e-05 | ms/batch 315.28 | loss  4.20 | ppl    66.822
| epoch 245 step   106750 |    366 batches | lr 1.81e-05 | ms/batch 313.03 | loss  4.15 | ppl    63.337
| epoch 245 step   106800 |    416 batches | lr 1.8e-05 | ms/batch 315.07 | loss  4.19 | ppl    66.250
----------------------------------------------------------------------------------------------------
| Eval 267 at step   106800 | time: 130.76s | valid loss  4.20 | valid ppl    66.846
----------------------------------------------------------------------------------------------------
| epoch 246 step   106850 |     30 batches | lr 1.79e-05 | ms/batch 439.99 | loss  4.19 | ppl    65.791
| epoch 246 step   106900 |     80 batches | lr 1.78e-05 | ms/batch 315.37 | loss  4.14 | ppl    62.602
| epoch 246 step   106950 |    130 batches | lr 1.77e-05 | ms/batch 313.82 | loss  4.18 | ppl    65.463
| epoch 246 step   107000 |    180 batches | lr 1.76e-05 | ms/batch 315.66 | loss  4.20 | ppl    66.887
| epoch 246 step   107050 |    230 batches | lr 1.75e-05 | ms/batch 314.29 | loss  4.20 | ppl    66.945
| epoch 246 step   107100 |    280 batches | lr 1.74e-05 | ms/batch 314.10 | loss  4.19 | ppl    66.007
| epoch 246 step   107150 |    330 batches | lr 1.73e-05 | ms/batch 314.76 | loss  4.10 | ppl    60.437
| epoch 246 step   107200 |    380 batches | lr 1.72e-05 | ms/batch 314.57 | loss  4.16 | ppl    64.097
----------------------------------------------------------------------------------------------------
| Eval 268 at step   107200 | time: 130.56s | valid loss  4.20 | valid ppl    66.734
----------------------------------------------------------------------------------------------------
| epoch 246 step   107250 |    430 batches | lr 1.71e-05 | ms/batch 446.59 | loss  4.18 | ppl    65.366
| epoch 247 step   107300 |     44 batches | lr 1.7e-05 | ms/batch 307.36 | loss  4.18 | ppl    65.294
| epoch 247 step   107350 |     94 batches | lr 1.69e-05 | ms/batch 314.09 | loss  4.13 | ppl    62.057
| epoch 247 step   107400 |    144 batches | lr 1.68e-05 | ms/batch 313.31 | loss  4.18 | ppl    65.254
| epoch 247 step   107450 |    194 batches | lr 1.67e-05 | ms/batch 313.40 | loss  4.19 | ppl    66.020
| epoch 247 step   107500 |    244 batches | lr 1.67e-05 | ms/batch 313.20 | loss  4.21 | ppl    67.241
| epoch 247 step   107550 |    294 batches | lr 1.66e-05 | ms/batch 313.53 | loss  4.24 | ppl    69.356
| epoch 247 step   107600 |    344 batches | lr 1.65e-05 | ms/batch 312.84 | loss  4.10 | ppl    60.364
----------------------------------------------------------------------------------------------------
| Eval 269 at step   107600 | time: 130.09s | valid loss  4.21 | valid ppl    67.062
----------------------------------------------------------------------------------------------------
| epoch 247 step   107650 |    394 batches | lr 1.64e-05 | ms/batch 412.75 | loss  4.19 | ppl    65.938
| epoch 248 step   107700 |      8 batches | lr 1.63e-05 | ms/batch 307.18 | loss  4.20 | ppl    66.673
| epoch 248 step   107750 |     58 batches | lr 1.62e-05 | ms/batch 313.83 | loss  4.16 | ppl    63.814
| epoch 248 step   107800 |    108 batches | lr 1.61e-05 | ms/batch 313.39 | loss  4.17 | ppl    64.466
| epoch 248 step   107850 |    158 batches | lr 1.6e-05 | ms/batch 312.93 | loss  4.19 | ppl    65.840
| epoch 248 step   107900 |    208 batches | lr 1.59e-05 | ms/batch 314.13 | loss  4.19 | ppl    65.717
| epoch 248 step   107950 |    258 batches | lr 1.58e-05 | ms/batch 314.17 | loss  4.22 | ppl    67.824
| epoch 248 step   108000 |    308 batches | lr 1.57e-05 | ms/batch 313.28 | loss  4.21 | ppl    67.501
----------------------------------------------------------------------------------------------------
| Eval 270 at step   108000 | time: 130.09s | valid loss  4.20 | valid ppl    66.836
----------------------------------------------------------------------------------------------------
| epoch 248 step   108050 |    358 batches | lr 1.56e-05 | ms/batch 412.21 | loss  4.09 | ppl    59.997
| epoch 248 step   108100 |    408 batches | lr 1.55e-05 | ms/batch 313.38 | loss  4.18 | ppl    65.486
| epoch 249 step   108150 |     22 batches | lr 1.55e-05 | ms/batch 308.74 | loss  4.21 | ppl    67.644
| epoch 249 step   108200 |     72 batches | lr 1.54e-05 | ms/batch 313.98 | loss  4.12 | ppl    61.314
| epoch 249 step   108250 |    122 batches | lr 1.53e-05 | ms/batch 313.31 | loss  4.16 | ppl    63.924
| epoch 249 step   108300 |    172 batches | lr 1.52e-05 | ms/batch 314.17 | loss  4.17 | ppl    64.418
| epoch 249 step   108350 |    222 batches | lr 1.51e-05 | ms/batch 313.16 | loss  4.25 | ppl    69.810
| epoch 249 step   108400 |    272 batches | lr 1.5e-05 | ms/batch 313.35 | loss  4.20 | ppl    66.364
----------------------------------------------------------------------------------------------------
| Eval 271 at step   108400 | time: 130.10s | valid loss  4.20 | valid ppl    66.999
----------------------------------------------------------------------------------------------------
| epoch 249 step   108450 |    322 batches | lr 1.49e-05 | ms/batch 413.08 | loss  4.16 | ppl    63.949
| epoch 249 step   108500 |    372 batches | lr 1.48e-05 | ms/batch 315.77 | loss  4.14 | ppl    62.761
| epoch 249 step   108550 |    422 batches | lr 1.47e-05 | ms/batch 315.55 | loss  4.14 | ppl    62.869
| epoch 250 step   108600 |     36 batches | lr 1.47e-05 | ms/batch 311.10 | loss  4.23 | ppl    68.409
| epoch 250 step   108650 |     86 batches | lr 1.46e-05 | ms/batch 314.73 | loss  4.14 | ppl    62.573
| epoch 250 step   108700 |    136 batches | lr 1.45e-05 | ms/batch 313.94 | loss  4.15 | ppl    63.325
| epoch 250 step   108750 |    186 batches | lr 1.44e-05 | ms/batch 315.71 | loss  4.21 | ppl    67.031
| epoch 250 step   108800 |    236 batches | lr 1.43e-05 | ms/batch 313.12 | loss  4.19 | ppl    66.224
----------------------------------------------------------------------------------------------------
| Eval 272 at step   108800 | time: 130.66s | valid loss  4.20 | valid ppl    66.933
----------------------------------------------------------------------------------------------------
| epoch 250 step   108850 |    286 batches | lr 1.42e-05 | ms/batch 429.47 | loss  4.25 | ppl    70.341
| epoch 250 step   108900 |    336 batches | lr 1.41e-05 | ms/batch 329.99 | loss  4.07 | ppl    58.752
| epoch 250 step   108950 |    386 batches | lr 1.4e-05 | ms/batch 329.78 | loss  4.16 | ppl    64.209
| epoch 250 step   109000 |    436 batches | lr 1.4e-05 | ms/batch 326.93 | loss  4.17 | ppl    64.766
| epoch 251 step   109050 |     50 batches | lr 1.39e-05 | ms/batch 328.99 | loss  4.15 | ppl    63.315
| epoch 251 step   109100 |    100 batches | lr 1.38e-05 | ms/batch 329.10 | loss  4.15 | ppl    63.588
| epoch 251 step   109150 |    150 batches | lr 1.37e-05 | ms/batch 314.39 | loss  4.20 | ppl    66.585
| epoch 251 step   109200 |    200 batches | lr 1.36e-05 | ms/batch 315.29 | loss  4.19 | ppl    65.866
----------------------------------------------------------------------------------------------------
| Eval 273 at step   109200 | time: 135.20s | valid loss  4.20 | valid ppl    66.741
----------------------------------------------------------------------------------------------------
| epoch 251 step   109250 |    250 batches | lr 1.35e-05 | ms/batch 412.99 | loss  4.20 | ppl    66.671
| epoch 251 step   109300 |    300 batches | lr 1.34e-05 | ms/batch 314.86 | loss  4.24 | ppl    69.286
| epoch 251 step   109350 |    350 batches | lr 1.34e-05 | ms/batch 314.65 | loss  4.09 | ppl    59.960
| epoch 251 step   109400 |    400 batches | lr 1.33e-05 | ms/batch 315.34 | loss  4.16 | ppl    63.912
| epoch 252 step   109450 |     14 batches | lr 1.32e-05 | ms/batch 308.01 | loss  4.19 | ppl    66.198
| epoch 252 step   109500 |     64 batches | lr 1.31e-05 | ms/batch 312.77 | loss  4.14 | ppl    62.715
| epoch 252 step   109550 |    114 batches | lr 1.3e-05 | ms/batch 315.45 | loss  4.17 | ppl    64.781
| epoch 252 step   109600 |    164 batches | lr 1.29e-05 | ms/batch 316.63 | loss  4.19 | ppl    66.002
----------------------------------------------------------------------------------------------------
| Eval 274 at step   109600 | time: 130.58s | valid loss  4.20 | valid ppl    66.658
----------------------------------------------------------------------------------------------------
| epoch 252 step   109650 |    214 batches | lr 1.29e-05 | ms/batch 448.34 | loss  4.17 | ppl    64.400
| epoch 252 step   109700 |    264 batches | lr 1.28e-05 | ms/batch 316.76 | loss  4.21 | ppl    67.296
| epoch 252 step   109750 |    314 batches | lr 1.27e-05 | ms/batch 315.95 | loss  4.17 | ppl    64.446
| epoch 252 step   109800 |    364 batches | lr 1.26e-05 | ms/batch 317.01 | loss  4.12 | ppl    61.718
| epoch 252 step   109850 |    414 batches | lr 1.25e-05 | ms/batch 318.74 | loss  4.17 | ppl    64.458
| epoch 253 step   109900 |     28 batches | lr 1.25e-05 | ms/batch 309.75 | loss  4.21 | ppl    67.549
| epoch 253 step   109950 |     78 batches | lr 1.24e-05 | ms/batch 316.55 | loss  4.12 | ppl    61.841
| epoch 253 step   110000 |    128 batches | lr 1.23e-05 | ms/batch 316.05 | loss  4.17 | ppl    64.566
----------------------------------------------------------------------------------------------------
| Eval 275 at step   110000 | time: 131.25s | valid loss  4.20 | valid ppl    66.735
----------------------------------------------------------------------------------------------------
| epoch 253 step   110050 |    178 batches | lr 1.22e-05 | ms/batch 411.00 | loss  4.16 | ppl    64.342
| epoch 253 step   110100 |    228 batches | lr 1.21e-05 | ms/batch 314.09 | loss  4.19 | ppl    66.028
| epoch 253 step   110150 |    278 batches | lr 1.2e-05 | ms/batch 314.00 | loss  4.21 | ppl    67.657
| epoch 253 step   110200 |    328 batches | lr 1.2e-05 | ms/batch 316.39 | loss  4.14 | ppl    62.673
| epoch 253 step   110250 |    378 batches | lr 1.19e-05 | ms/batch 315.35 | loss  4.16 | ppl    63.942
| epoch 253 step   110300 |    428 batches | lr 1.18e-05 | ms/batch 315.23 | loss  4.19 | ppl    66.297
| epoch 254 step   110350 |     42 batches | lr 1.17e-05 | ms/batch 310.96 | loss  4.18 | ppl    65.317
| epoch 254 step   110400 |     92 batches | lr 1.16e-05 | ms/batch 315.01 | loss  4.13 | ppl    62.205
----------------------------------------------------------------------------------------------------
| Eval 276 at step   110400 | time: 130.65s | valid loss  4.20 | valid ppl    66.937
----------------------------------------------------------------------------------------------------
| epoch 254 step   110450 |    142 batches | lr 1.16e-05 | ms/batch 414.20 | loss  4.17 | ppl    64.895
| epoch 254 step   110500 |    192 batches | lr 1.15e-05 | ms/batch 316.70 | loss  4.19 | ppl    65.825
| epoch 254 step   110550 |    242 batches | lr 1.14e-05 | ms/batch 316.73 | loss  4.18 | ppl    65.233
| epoch 254 step   110600 |    292 batches | lr 1.13e-05 | ms/batch 316.21 | loss  4.23 | ppl    68.755
| epoch 254 step   110650 |    342 batches | lr 1.13e-05 | ms/batch 315.04 | loss  4.08 | ppl    59.115
| epoch 254 step   110700 |    392 batches | lr 1.12e-05 | ms/batch 315.94 | loss  4.20 | ppl    66.738
| epoch 255 step   110750 |      6 batches | lr 1.11e-05 | ms/batch 309.87 | loss  4.22 | ppl    67.983
| epoch 255 step   110800 |     56 batches | lr 1.1e-05 | ms/batch 316.05 | loss  4.15 | ppl    63.357
----------------------------------------------------------------------------------------------------
| Eval 277 at step   110800 | time: 131.06s | valid loss  4.20 | valid ppl    66.801
----------------------------------------------------------------------------------------------------
| epoch 255 step   110850 |    106 batches | lr 1.1e-05 | ms/batch 413.10 | loss  4.16 | ppl    64.350
| epoch 255 step   110900 |    156 batches | lr 1.09e-05 | ms/batch 314.85 | loss  4.17 | ppl    64.738
| epoch 255 step   110950 |    206 batches | lr 1.08e-05 | ms/batch 314.05 | loss  4.16 | ppl    64.202
| epoch 255 step   111000 |    256 batches | lr 1.07e-05 | ms/batch 313.18 | loss  4.20 | ppl    66.598
| epoch 255 step   111050 |    306 batches | lr 1.06e-05 | ms/batch 314.65 | loss  4.21 | ppl    67.233
| epoch 255 step   111100 |    356 batches | lr 1.06e-05 | ms/batch 321.99 | loss  4.11 | ppl    60.837
| epoch 255 step   111150 |    406 batches | lr 1.05e-05 | ms/batch 314.89 | loss  4.15 | ppl    63.239
| epoch 256 step   111200 |     20 batches | lr 1.04e-05 | ms/batch 308.28 | loss  4.22 | ppl    68.020
----------------------------------------------------------------------------------------------------
| Eval 278 at step   111200 | time: 130.72s | valid loss  4.20 | valid ppl    66.710
----------------------------------------------------------------------------------------------------
| epoch 256 step   111250 |     70 batches | lr 1.03e-05 | ms/batch 411.83 | loss  4.12 | ppl    61.607
| epoch 256 step   111300 |    120 batches | lr 1.03e-05 | ms/batch 314.03 | loss  4.18 | ppl    65.647
| epoch 256 step   111350 |    170 batches | lr 1.02e-05 | ms/batch 313.63 | loss  4.18 | ppl    65.070
| epoch 256 step   111400 |    220 batches | lr 1.01e-05 | ms/batch 313.07 | loss  4.19 | ppl    65.994
| epoch 256 step   111450 |    270 batches | lr 1e-05 | ms/batch 313.86 | loss  4.18 | ppl    65.356
| epoch 256 step   111500 |    320 batches | lr 9.98e-06 | ms/batch 313.65 | loss  4.15 | ppl    63.427
| epoch 256 step   111550 |    370 batches | lr 9.9e-06 | ms/batch 312.86 | loss  4.13 | ppl    61.952
| epoch 256 step   111600 |    420 batches | lr 9.83e-06 | ms/batch 315.15 | loss  4.18 | ppl    65.601
----------------------------------------------------------------------------------------------------
| Eval 279 at step   111600 | time: 130.40s | valid loss  4.20 | valid ppl    66.688
----------------------------------------------------------------------------------------------------
| epoch 257 step   111650 |     34 batches | lr 9.76e-06 | ms/batch 408.14 | loss  4.20 | ppl    66.973
| epoch 257 step   111700 |     84 batches | lr 9.69e-06 | ms/batch 313.62 | loss  4.13 | ppl    62.175
| epoch 257 step   111750 |    134 batches | lr 9.61e-06 | ms/batch 314.09 | loss  4.15 | ppl    63.715
| epoch 257 step   111800 |    184 batches | lr 9.54e-06 | ms/batch 313.18 | loss  4.18 | ppl    65.269
| epoch 257 step   111850 |    234 batches | lr 9.47e-06 | ms/batch 314.84 | loss  4.20 | ppl    66.424
| epoch 257 step   111900 |    284 batches | lr 9.4e-06 | ms/batch 313.75 | loss  4.21 | ppl    67.612
| epoch 257 step   111950 |    334 batches | lr 9.33e-06 | ms/batch 315.63 | loss  4.10 | ppl    60.131
| epoch 257 step   112000 |    384 batches | lr 9.26e-06 | ms/batch 316.18 | loss  4.15 | ppl    63.531
----------------------------------------------------------------------------------------------------
| Eval 280 at step   112000 | time: 130.49s | valid loss  4.20 | valid ppl    66.723
----------------------------------------------------------------------------------------------------
| epoch 257 step   112050 |    434 batches | lr 9.19e-06 | ms/batch 416.48 | loss  4.21 | ppl    67.278
| epoch 258 step   112100 |     48 batches | lr 9.12e-06 | ms/batch 310.43 | loss  4.13 | ppl    62.324
| epoch 258 step   112150 |     98 batches | lr 9.05e-06 | ms/batch 316.97 | loss  4.12 | ppl    61.819
| epoch 258 step   112200 |    148 batches | lr 8.98e-06 | ms/batch 315.46 | loss  4.16 | ppl    64.280
| epoch 258 step   112250 |    198 batches | lr 8.91e-06 | ms/batch 313.68 | loss  4.18 | ppl    65.494
| epoch 258 step   112300 |    248 batches | lr 8.84e-06 | ms/batch 314.29 | loss  4.17 | ppl    64.860
| epoch 258 step   112350 |    298 batches | lr 8.77e-06 | ms/batch 314.42 | loss  4.22 | ppl    68.180
| epoch 258 step   112400 |    348 batches | lr 8.7e-06 | ms/batch 314.32 | loss  4.05 | ppl    57.501
----------------------------------------------------------------------------------------------------
| Eval 281 at step   112400 | time: 130.79s | valid loss  4.20 | valid ppl    66.750
----------------------------------------------------------------------------------------------------
| epoch 258 step   112450 |    398 batches | lr 8.63e-06 | ms/batch 413.03 | loss  4.17 | ppl    64.834
| epoch 259 step   112500 |     12 batches | lr 8.57e-06 | ms/batch 307.05 | loss  4.21 | ppl    67.133
| epoch 259 step   112550 |     62 batches | lr 8.5e-06 | ms/batch 314.56 | loss  4.11 | ppl    61.004
| epoch 259 step   112600 |    112 batches | lr 8.43e-06 | ms/batch 314.28 | loss  4.18 | ppl    65.397
| epoch 259 step   112650 |    162 batches | lr 8.36e-06 | ms/batch 313.29 | loss  4.17 | ppl    64.781
| epoch 259 step   112700 |    212 batches | lr 8.3e-06 | ms/batch 313.86 | loss  4.17 | ppl    65.040
| epoch 259 step   112750 |    262 batches | lr 8.23e-06 | ms/batch 314.44 | loss  4.20 | ppl    66.681
| epoch 259 step   112800 |    312 batches | lr 8.16e-06 | ms/batch 314.17 | loss  4.16 | ppl    63.782
----------------------------------------------------------------------------------------------------
| Eval 282 at step   112800 | time: 130.23s | valid loss  4.20 | valid ppl    66.758
----------------------------------------------------------------------------------------------------
| epoch 259 step   112850 |    362 batches | lr 8.1e-06 | ms/batch 412.88 | loss  4.13 | ppl    62.168
| epoch 259 step   112900 |    412 batches | lr 8.03e-06 | ms/batch 314.16 | loss  4.17 | ppl    64.433
| epoch 260 step   112950 |     26 batches | lr 7.96e-06 | ms/batch 309.07 | loss  4.21 | ppl    67.639
| epoch 260 step   113000 |     76 batches | lr 7.9e-06 | ms/batch 314.41 | loss  4.11 | ppl    60.678
| epoch 260 step   113050 |    126 batches | lr 7.83e-06 | ms/batch 313.37 | loss  4.18 | ppl    65.284
| epoch 260 step   113100 |    176 batches | lr 7.77e-06 | ms/batch 314.55 | loss  4.19 | ppl    66.147
| epoch 260 step   113150 |    226 batches | lr 7.7e-06 | ms/batch 314.28 | loss  4.22 | ppl    68.254
| epoch 260 step   113200 |    276 batches | lr 7.64e-06 | ms/batch 313.10 | loss  4.20 | ppl    66.559
----------------------------------------------------------------------------------------------------
| Eval 283 at step   113200 | time: 130.28s | valid loss  4.20 | valid ppl    66.762
----------------------------------------------------------------------------------------------------
| epoch 260 step   113250 |    326 batches | lr 7.58e-06 | ms/batch 412.00 | loss  4.13 | ppl    62.460
| epoch 260 step   113300 |    376 batches | lr 7.51e-06 | ms/batch 314.39 | loss  4.16 | ppl    64.355
| epoch 260 step   113350 |    426 batches | lr 7.45e-06 | ms/batch 314.09 | loss  4.17 | ppl    64.405
| epoch 261 step   113400 |     40 batches | lr 7.38e-06 | ms/batch 308.33 | loss  4.16 | ppl    63.917
| epoch 261 step   113450 |     90 batches | lr 7.32e-06 | ms/batch 315.62 | loss  4.13 | ppl    62.407
| epoch 261 step   113500 |    140 batches | lr 7.26e-06 | ms/batch 314.11 | loss  4.18 | ppl    65.688
| epoch 261 step   113550 |    190 batches | lr 7.2e-06 | ms/batch 313.20 | loss  4.18 | ppl    65.496
| epoch 261 step   113600 |    240 batches | lr 7.13e-06 | ms/batch 316.12 | loss  4.19 | ppl    66.328
----------------------------------------------------------------------------------------------------
| Eval 284 at step   113600 | time: 130.42s | valid loss  4.20 | valid ppl    66.648
----------------------------------------------------------------------------------------------------
| epoch 261 step   113650 |    290 batches | lr 7.07e-06 | ms/batch 446.66 | loss  4.25 | ppl    69.764
| epoch 261 step   113700 |    340 batches | lr 7.01e-06 | ms/batch 314.39 | loss  4.07 | ppl    58.546
| epoch 261 step   113750 |    390 batches | lr 6.95e-06 | ms/batch 313.66 | loss  4.17 | ppl    64.966
| epoch 262 step   113800 |      4 batches | lr 6.89e-06 | ms/batch 308.65 | loss  4.23 | ppl    68.940
| epoch 262 step   113850 |     54 batches | lr 6.83e-06 | ms/batch 314.51 | loss  4.12 | ppl    61.413
| epoch 262 step   113900 |    104 batches | lr 6.77e-06 | ms/batch 314.97 | loss  4.14 | ppl    62.585
| epoch 262 step   113950 |    154 batches | lr 6.71e-06 | ms/batch 314.93 | loss  4.17 | ppl    64.999
| epoch 262 step   114000 |    204 batches | lr 6.65e-06 | ms/batch 315.27 | loss  4.16 | ppl    64.104
----------------------------------------------------------------------------------------------------
| Eval 285 at step   114000 | time: 130.50s | valid loss  4.20 | valid ppl    66.578
----------------------------------------------------------------------------------------------------
| epoch 262 step   114050 |    254 batches | lr 6.59e-06 | ms/batch 446.28 | loss  4.21 | ppl    67.454
| epoch 262 step   114100 |    304 batches | lr 6.53e-06 | ms/batch 315.20 | loss  4.19 | ppl    66.302
| epoch 262 step   114150 |    354 batches | lr 6.47e-06 | ms/batch 313.04 | loss  4.10 | ppl    60.503
| epoch 262 step   114200 |    404 batches | lr 6.41e-06 | ms/batch 315.11 | loss  4.17 | ppl    64.987
| epoch 263 step   114250 |     18 batches | lr 6.35e-06 | ms/batch 308.79 | loss  4.23 | ppl    68.393
| epoch 263 step   114300 |     68 batches | lr 6.29e-06 | ms/batch 314.10 | loss  4.12 | ppl    61.706
| epoch 263 step   114350 |    118 batches | lr 6.23e-06 | ms/batch 314.47 | loss  4.16 | ppl    64.365
| epoch 263 step   114400 |    168 batches | lr 6.17e-06 | ms/batch 314.61 | loss  4.15 | ppl    63.570
----------------------------------------------------------------------------------------------------
| Eval 286 at step   114400 | time: 130.41s | valid loss  4.20 | valid ppl    66.545
----------------------------------------------------------------------------------------------------
| epoch 263 step   114450 |    218 batches | lr 6.12e-06 | ms/batch 451.76 | loss  4.21 | ppl    67.112
| epoch 263 step   114500 |    268 batches | lr 6.06e-06 | ms/batch 316.59 | loss  4.21 | ppl    67.215
| epoch 263 step   114550 |    318 batches | lr 6e-06 | ms/batch 312.99 | loss  4.17 | ppl    64.446
| epoch 263 step   114600 |    368 batches | lr 5.94e-06 | ms/batch 314.67 | loss  4.10 | ppl    60.277
| epoch 263 step   114650 |    418 batches | lr 5.89e-06 | ms/batch 314.20 | loss  4.16 | ppl    63.999
| epoch 264 step   114700 |     32 batches | lr 5.83e-06 | ms/batch 307.12 | loss  4.19 | ppl    65.819
| epoch 264 step   114750 |     82 batches | lr 5.77e-06 | ms/batch 313.81 | loss  4.11 | ppl    61.231
| epoch 264 step   114800 |    132 batches | lr 5.72e-06 | ms/batch 313.90 | loss  4.15 | ppl    63.531
----------------------------------------------------------------------------------------------------
| Eval 287 at step   114800 | time: 130.47s | valid loss  4.20 | valid ppl    66.650
----------------------------------------------------------------------------------------------------
| epoch 264 step   114850 |    182 batches | lr 5.66e-06 | ms/batch 413.21 | loss  4.16 | ppl    64.234
| epoch 264 step   114900 |    232 batches | lr 5.61e-06 | ms/batch 314.87 | loss  4.20 | ppl    66.679
| epoch 264 step   114950 |    282 batches | lr 5.55e-06 | ms/batch 313.07 | loss  4.20 | ppl    66.450
| epoch 264 step   115000 |    332 batches | lr 5.5e-06 | ms/batch 314.91 | loss  4.12 | ppl    61.827
| epoch 264 step   115050 |    382 batches | lr 5.44e-06 | ms/batch 314.57 | loss  4.14 | ppl    62.945
| epoch 264 step   115100 |    432 batches | lr 5.39e-06 | ms/batch 313.32 | loss  4.22 | ppl    67.821
| epoch 265 step   115150 |     46 batches | lr 5.34e-06 | ms/batch 312.03 | loss  4.17 | ppl    64.534
| epoch 265 step   115200 |     96 batches | lr 5.28e-06 | ms/batch 316.39 | loss  4.12 | ppl    61.444
----------------------------------------------------------------------------------------------------
| Eval 288 at step   115200 | time: 130.63s | valid loss  4.20 | valid ppl    66.697
----------------------------------------------------------------------------------------------------
| epoch 265 step   115250 |    146 batches | lr 5.23e-06 | ms/batch 414.69 | loss  4.18 | ppl    65.305
| epoch 265 step   115300 |    196 batches | lr 5.17e-06 | ms/batch 316.63 | loss  4.19 | ppl    65.735
| epoch 265 step   115350 |    246 batches | lr 5.12e-06 | ms/batch 316.46 | loss  4.21 | ppl    67.083
| epoch 265 step   115400 |    296 batches | lr 5.07e-06 | ms/batch 317.13 | loss  4.20 | ppl    67.013
| epoch 265 step   115450 |    346 batches | lr 5.02e-06 | ms/batch 316.69 | loss  4.09 | ppl    59.953
| epoch 265 step   115500 |    396 batches | lr 4.96e-06 | ms/batch 314.02 | loss  4.15 | ppl    63.695
| epoch 266 step   115550 |     10 batches | lr 4.91e-06 | ms/batch 308.87 | loss  4.22 | ppl    68.127
| epoch 266 step   115600 |     60 batches | lr 4.86e-06 | ms/batch 314.21 | loss  4.10 | ppl    60.581
----------------------------------------------------------------------------------------------------
| Eval 289 at step   115600 | time: 130.94s | valid loss  4.20 | valid ppl    66.583
----------------------------------------------------------------------------------------------------
| epoch 266 step   115650 |    110 batches | lr 4.81e-06 | ms/batch 413.78 | loss  4.15 | ppl    63.749
| epoch 266 step   115700 |    160 batches | lr 4.76e-06 | ms/batch 314.74 | loss  4.17 | ppl    64.582
| epoch 266 step   115750 |    210 batches | lr 4.71e-06 | ms/batch 313.61 | loss  4.16 | ppl    63.819
| epoch 266 step   115800 |    260 batches | lr 4.66e-06 | ms/batch 313.39 | loss  4.20 | ppl    66.400
| epoch 266 step   115850 |    310 batches | lr 4.61e-06 | ms/batch 314.40 | loss  4.19 | ppl    65.879
| epoch 266 step   115900 |    360 batches | lr 4.56e-06 | ms/batch 314.18 | loss  4.11 | ppl    61.164
| epoch 266 step   115950 |    410 batches | lr 4.51e-06 | ms/batch 314.11 | loss  4.14 | ppl    62.822
| epoch 267 step   116000 |     24 batches | lr 4.46e-06 | ms/batch 308.46 | loss  4.22 | ppl    68.209
----------------------------------------------------------------------------------------------------
| Eval 290 at step   116000 | time: 130.32s | valid loss  4.20 | valid ppl    66.512
----------------------------------------------------------------------------------------------------
| epoch 267 step   116050 |     74 batches | lr 4.41e-06 | ms/batch 443.81 | loss  4.12 | ppl    61.550
| epoch 267 step   116100 |    124 batches | lr 4.36e-06 | ms/batch 313.44 | loss  4.17 | ppl    64.483
| epoch 267 step   116150 |    174 batches | lr 4.31e-06 | ms/batch 313.70 | loss  4.15 | ppl    63.204
| epoch 267 step   116200 |    224 batches | lr 4.26e-06 | ms/batch 312.71 | loss  4.17 | ppl    64.817
| epoch 267 step   116250 |    274 batches | lr 4.21e-06 | ms/batch 313.88 | loss  4.22 | ppl    68.185
| epoch 267 step   116300 |    324 batches | lr 4.17e-06 | ms/batch 313.31 | loss  4.12 | ppl    61.793
| epoch 267 step   116350 |    374 batches | lr 4.12e-06 | ms/batch 313.02 | loss  4.14 | ppl    62.737
| epoch 267 step   116400 |    424 batches | lr 4.07e-06 | ms/batch 313.54 | loss  4.15 | ppl    63.712
----------------------------------------------------------------------------------------------------
| Eval 291 at step   116400 | time: 130.25s | valid loss  4.20 | valid ppl    66.510
----------------------------------------------------------------------------------------------------
| epoch 268 step   116450 |     38 batches | lr 4.02e-06 | ms/batch 452.58 | loss  4.17 | ppl    64.789
| epoch 268 step   116500 |     88 batches | lr 3.98e-06 | ms/batch 314.87 | loss  4.11 | ppl    61.052
| epoch 268 step   116550 |    138 batches | lr 3.93e-06 | ms/batch 314.19 | loss  4.17 | ppl    64.718
| epoch 268 step   116600 |    188 batches | lr 3.89e-06 | ms/batch 313.32 | loss  4.16 | ppl    63.979
| epoch 268 step   116650 |    238 batches | lr 3.84e-06 | ms/batch 313.96 | loss  4.22 | ppl    67.978
| epoch 268 step   116700 |    288 batches | lr 3.79e-06 | ms/batch 313.91 | loss  4.21 | ppl    67.549
| epoch 268 step   116750 |    338 batches | lr 3.75e-06 | ms/batch 317.84 | loss  4.08 | ppl    58.906
| epoch 268 step   116800 |    388 batches | lr 3.7e-06 | ms/batch 315.37 | loss  4.16 | ppl    63.937
----------------------------------------------------------------------------------------------------
| Eval 292 at step   116800 | time: 131.08s | valid loss  4.20 | valid ppl    66.522
----------------------------------------------------------------------------------------------------
| epoch 269 step   116850 |      2 batches | lr 3.66e-06 | ms/batch 407.86 | loss  4.19 | ppl    65.964
| epoch 269 step   116900 |     52 batches | lr 3.61e-06 | ms/batch 320.12 | loss  4.11 | ppl    61.114
| epoch 269 step   116950 |    102 batches | lr 3.57e-06 | ms/batch 325.92 | loss  4.15 | ppl    63.365
| epoch 269 step   117000 |    152 batches | lr 3.53e-06 | ms/batch 315.27 | loss  4.18 | ppl    65.322
| epoch 269 step   117050 |    202 batches | lr 3.48e-06 | ms/batch 331.35 | loss  4.21 | ppl    67.349
| epoch 269 step   117100 |    252 batches | lr 3.44e-06 | ms/batch 331.36 | loss  4.19 | ppl    66.077
| epoch 269 step   117150 |    302 batches | lr 3.39e-06 | ms/batch 330.21 | loss  4.21 | ppl    67.293
| epoch 269 step   117200 |    352 batches | lr 3.35e-06 | ms/batch 317.09 | loss  4.09 | ppl    59.831
----------------------------------------------------------------------------------------------------
| Eval 293 at step   117200 | time: 133.96s | valid loss  4.20 | valid ppl    66.599
----------------------------------------------------------------------------------------------------
| epoch 269 step   117250 |    402 batches | lr 3.31e-06 | ms/batch 413.98 | loss  4.15 | ppl    63.323
| epoch 270 step   117300 |     16 batches | lr 3.27e-06 | ms/batch 309.26 | loss  4.21 | ppl    67.222
| epoch 270 step   117350 |     66 batches | lr 3.22e-06 | ms/batch 314.70 | loss  4.10 | ppl    60.215
| epoch 270 step   117400 |    116 batches | lr 3.18e-06 | ms/batch 315.17 | loss  4.14 | ppl    63.022
| epoch 270 step   117450 |    166 batches | lr 3.14e-06 | ms/batch 315.89 | loss  4.19 | ppl    65.861
| epoch 270 step   117500 |    216 batches | lr 3.1e-06 | ms/batch 316.27 | loss  4.18 | ppl    65.432
| epoch 270 step   117550 |    266 batches | lr 3.06e-06 | ms/batch 315.80 | loss  4.19 | ppl    66.105
| epoch 270 step   117600 |    316 batches | lr 3.02e-06 | ms/batch 317.25 | loss  4.18 | ppl    65.060
----------------------------------------------------------------------------------------------------
| Eval 294 at step   117600 | time: 130.92s | valid loss  4.20 | valid ppl    66.516
----------------------------------------------------------------------------------------------------
| epoch 270 step   117650 |    366 batches | lr 2.98e-06 | ms/batch 415.64 | loss  4.13 | ppl    62.088
| epoch 270 step   117700 |    416 batches | lr 2.94e-06 | ms/batch 316.90 | loss  4.14 | ppl    62.982
| epoch 271 step   117750 |     30 batches | lr 2.9e-06 | ms/batch 310.27 | loss  4.18 | ppl    65.501
| epoch 271 step   117800 |     80 batches | lr 2.86e-06 | ms/batch 316.00 | loss  4.10 | ppl    60.629
| epoch 271 step   117850 |    130 batches | lr 2.82e-06 | ms/batch 315.45 | loss  4.18 | ppl    65.529
| epoch 271 step   117900 |    180 batches | lr 2.78e-06 | ms/batch 315.39 | loss  4.15 | ppl    63.600
| epoch 271 step   117950 |    230 batches | lr 2.74e-06 | ms/batch 314.27 | loss  4.19 | ppl    66.139
| epoch 271 step   118000 |    280 batches | lr 2.7e-06 | ms/batch 314.83 | loss  4.20 | ppl    66.642
----------------------------------------------------------------------------------------------------
| Eval 295 at step   118000 | time: 130.91s | valid loss  4.20 | valid ppl    66.504
----------------------------------------------------------------------------------------------------
| epoch 271 step   118050 |    330 batches | lr 2.66e-06 | ms/batch 446.00 | loss  4.13 | ppl    62.129
| epoch 271 step   118100 |    380 batches | lr 2.62e-06 | ms/batch 314.83 | loss  4.16 | ppl    64.102
| epoch 271 step   118150 |    430 batches | lr 2.59e-06 | ms/batch 314.24 | loss  4.18 | ppl    65.190
| epoch 272 step   118200 |     44 batches | lr 2.55e-06 | ms/batch 308.24 | loss  4.15 | ppl    63.518
| epoch 272 step   118250 |     94 batches | lr 2.51e-06 | ms/batch 313.99 | loss  4.10 | ppl    60.140
| epoch 272 step   118300 |    144 batches | lr 2.48e-06 | ms/batch 324.82 | loss  4.17 | ppl    64.408
| epoch 272 step   118350 |    194 batches | lr 2.44e-06 | ms/batch 325.00 | loss  4.18 | ppl    65.599
| epoch 272 step   118400 |    244 batches | lr 2.4e-06 | ms/batch 318.95 | loss  4.19 | ppl    66.069
----------------------------------------------------------------------------------------------------
| Eval 296 at step   118400 | time: 131.68s | valid loss  4.20 | valid ppl    66.518
----------------------------------------------------------------------------------------------------
| epoch 272 step   118450 |    294 batches | lr 2.37e-06 | ms/batch 411.54 | loss  4.23 | ppl    68.930
| epoch 272 step   118500 |    344 batches | lr 2.33e-06 | ms/batch 313.04 | loss  4.09 | ppl    59.537
| epoch 272 step   118550 |    394 batches | lr 2.29e-06 | ms/batch 313.50 | loss  4.17 | ppl    64.870
| epoch 273 step   118600 |      8 batches | lr 2.26e-06 | ms/batch 307.66 | loss  4.23 | ppl    68.680
| epoch 273 step   118650 |     58 batches | lr 2.22e-06 | ms/batch 313.49 | loss  4.10 | ppl    60.397
| epoch 273 step   118700 |    108 batches | lr 2.19e-06 | ms/batch 313.19 | loss  4.12 | ppl    61.523
| epoch 273 step   118750 |    158 batches | lr 2.15e-06 | ms/batch 314.60 | loss  4.20 | ppl    66.611
| epoch 273 step   118800 |    208 batches | lr 2.12e-06 | ms/batch 313.42 | loss  4.15 | ppl    63.578
----------------------------------------------------------------------------------------------------
| Eval 297 at step   118800 | time: 130.06s | valid loss  4.20 | valid ppl    66.492
----------------------------------------------------------------------------------------------------
| epoch 273 step   118850 |    258 batches | lr 2.09e-06 | ms/batch 447.49 | loss  4.23 | ppl    68.844
| epoch 273 step   118900 |    308 batches | lr 2.05e-06 | ms/batch 314.48 | loss  4.19 | ppl    65.938
| epoch 273 step   118950 |    358 batches | lr 2.02e-06 | ms/batch 315.86 | loss  4.10 | ppl    60.614
| epoch 273 step   119000 |    408 batches | lr 1.99e-06 | ms/batch 314.59 | loss  4.16 | ppl    64.119
| epoch 274 step   119050 |     22 batches | lr 1.95e-06 | ms/batch 308.18 | loss  4.20 | ppl    66.733
| epoch 274 step   119100 |     72 batches | lr 1.92e-06 | ms/batch 314.32 | loss  4.13 | ppl    62.008
| epoch 274 step   119150 |    122 batches | lr 1.89e-06 | ms/batch 314.81 | loss  4.15 | ppl    63.690
| epoch 274 step   119200 |    172 batches | lr 1.86e-06 | ms/batch 314.77 | loss  4.15 | ppl    63.551
----------------------------------------------------------------------------------------------------
| Eval 298 at step   119200 | time: 130.60s | valid loss  4.20 | valid ppl    66.514
----------------------------------------------------------------------------------------------------
| epoch 274 step   119250 |    222 batches | lr 1.82e-06 | ms/batch 414.84 | loss  4.17 | ppl    64.943
| epoch 274 step   119300 |    272 batches | lr 1.79e-06 | ms/batch 316.47 | loss  4.17 | ppl    64.599
| epoch 274 step   119350 |    322 batches | lr 1.76e-06 | ms/batch 315.79 | loss  4.17 | ppl    64.506
| epoch 274 step   119400 |    372 batches | lr 1.73e-06 | ms/batch 313.43 | loss  4.14 | ppl    62.776
| epoch 274 step   119450 |    422 batches | lr 1.7e-06 | ms/batch 313.45 | loss  4.16 | ppl    63.887
| epoch 275 step   119500 |     36 batches | lr 1.67e-06 | ms/batch 308.11 | loss  4.19 | ppl    66.292
| epoch 275 step   119550 |     86 batches | lr 1.64e-06 | ms/batch 313.84 | loss  4.12 | ppl    61.391
| epoch 275 step   119600 |    136 batches | lr 1.61e-06 | ms/batch 312.79 | loss  4.19 | ppl    65.879
----------------------------------------------------------------------------------------------------
| Eval 299 at step   119600 | time: 130.39s | valid loss  4.20 | valid ppl    66.515
----------------------------------------------------------------------------------------------------
| epoch 275 step   119650 |    186 batches | lr 1.58e-06 | ms/batch 412.37 | loss  4.15 | ppl    63.546
| epoch 275 step   119700 |    236 batches | lr 1.55e-06 | ms/batch 314.20 | loss  4.18 | ppl    65.277
| epoch 275 step   119750 |    286 batches | lr 1.52e-06 | ms/batch 312.91 | loss  4.22 | ppl    68.018
| epoch 275 step   119800 |    336 batches | lr 1.49e-06 | ms/batch 313.32 | loss  4.07 | ppl    58.795
| epoch 275 step   119850 |    386 batches | lr 1.46e-06 | ms/batch 313.21 | loss  4.17 | ppl    64.430
| epoch 275 step   119900 |    436 batches | lr 1.44e-06 | ms/batch 309.81 | loss  4.15 | ppl    63.720
| epoch 276 step   119950 |     50 batches | lr 1.41e-06 | ms/batch 314.95 | loss  4.15 | ppl    63.479
| epoch 276 step   120000 |    100 batches | lr 1.38e-06 | ms/batch 328.95 | loss  4.12 | ppl    61.819
----------------------------------------------------------------------------------------------------
| Eval 300 at step   120000 | time: 131.01s | valid loss  4.20 | valid ppl    66.545
----------------------------------------------------------------------------------------------------
| epoch 276 step   120050 |    150 batches | lr 1.35e-06 | ms/batch 412.37 | loss  4.16 | ppl    63.959
| epoch 276 step   120100 |    200 batches | lr 1.33e-06 | ms/batch 314.21 | loss  4.17 | ppl    65.017
| epoch 276 step   120150 |    250 batches | lr 1.3e-06 | ms/batch 313.26 | loss  4.21 | ppl    67.044
| epoch 276 step   120200 |    300 batches | lr 1.27e-06 | ms/batch 314.69 | loss  4.21 | ppl    67.441
| epoch 276 step   120250 |    350 batches | lr 1.25e-06 | ms/batch 315.87 | loss  4.07 | ppl    58.784
| epoch 276 step   120300 |    400 batches | lr 1.22e-06 | ms/batch 314.73 | loss  4.16 | ppl    64.214
| epoch 277 step   120350 |     14 batches | lr 1.19e-06 | ms/batch 309.13 | loss  4.21 | ppl    67.146
| epoch 277 step   120400 |     64 batches | lr 1.17e-06 | ms/batch 315.80 | loss  4.11 | ppl    60.921
----------------------------------------------------------------------------------------------------
| Eval 301 at step   120400 | time: 130.50s | valid loss  4.20 | valid ppl    66.517
----------------------------------------------------------------------------------------------------
| epoch 277 step   120450 |    114 batches | lr 1.14e-06 | ms/batch 413.72 | loss  4.16 | ppl    63.889
| epoch 277 step   120500 |    164 batches | lr 1.12e-06 | ms/batch 315.36 | loss  4.15 | ppl    63.737
| epoch 277 step   120550 |    214 batches | lr 1.09e-06 | ms/batch 313.50 | loss  4.18 | ppl    65.136
| epoch 277 step   120600 |    264 batches | lr 1.07e-06 | ms/batch 314.77 | loss  4.19 | ppl    65.781
| epoch 277 step   120650 |    314 batches | lr 1.04e-06 | ms/batch 316.20 | loss  4.17 | ppl    64.486
| epoch 277 step   120700 |    364 batches | lr 1.02e-06 | ms/batch 315.09 | loss  4.14 | ppl    62.810
| epoch 277 step   120750 |    414 batches | lr 9.97e-07 | ms/batch 316.54 | loss  4.17 | ppl    64.493
| epoch 278 step   120800 |     28 batches | lr 9.74e-07 | ms/batch 310.14 | loss  4.18 | ppl    65.478
----------------------------------------------------------------------------------------------------
| Eval 302 at step   120800 | time: 130.79s | valid loss  4.20 | valid ppl    66.519
----------------------------------------------------------------------------------------------------
| epoch 278 step   120850 |     78 batches | lr 9.51e-07 | ms/batch 415.09 | loss  4.12 | ppl    61.747
| epoch 278 step   120900 |    128 batches | lr 9.28e-07 | ms/batch 316.16 | loss  4.14 | ppl    62.565
| epoch 278 step   120950 |    178 batches | lr 9.06e-07 | ms/batch 316.13 | loss  4.18 | ppl    65.565
| epoch 278 step   121000 |    228 batches | lr 8.84e-07 | ms/batch 317.06 | loss  4.17 | ppl    64.908
| epoch 278 step   121050 |    278 batches | lr 8.62e-07 | ms/batch 316.59 | loss  4.20 | ppl    66.960
| epoch 278 step   121100 |    328 batches | lr 8.4e-07 | ms/batch 314.90 | loss  4.11 | ppl    61.116
| epoch 278 step   121150 |    378 batches | lr 8.19e-07 | ms/batch 314.18 | loss  4.15 | ppl    63.553
| epoch 278 step   121200 |    428 batches | lr 7.97e-07 | ms/batch 314.16 | loss  4.20 | ppl    66.624
----------------------------------------------------------------------------------------------------
| Eval 303 at step   121200 | time: 131.15s | valid loss  4.20 | valid ppl    66.535
----------------------------------------------------------------------------------------------------
| epoch 279 step   121250 |     42 batches | lr 7.77e-07 | ms/batch 404.37 | loss  4.16 | ppl    64.307
| epoch 279 step   121300 |     92 batches | lr 7.56e-07 | ms/batch 313.70 | loss  4.12 | ppl    61.836
| epoch 279 step   121350 |    142 batches | lr 7.36e-07 | ms/batch 313.65 | loss  4.16 | ppl    63.919
| epoch 279 step   121400 |    192 batches | lr 7.16e-07 | ms/batch 313.98 | loss  4.16 | ppl    64.352
| epoch 279 step   121450 |    242 batches | lr 6.96e-07 | ms/batch 313.50 | loss  4.18 | ppl    65.320
| epoch 279 step   121500 |    292 batches | lr 6.77e-07 | ms/batch 315.97 | loss  4.20 | ppl    66.356
| epoch 279 step   121550 |    342 batches | lr 6.57e-07 | ms/batch 315.76 | loss  4.08 | ppl    59.405
| epoch 279 step   121600 |    392 batches | lr 6.39e-07 | ms/batch 315.84 | loss  4.15 | ppl    63.258
----------------------------------------------------------------------------------------------------
| Eval 304 at step   121600 | time: 130.40s | valid loss  4.20 | valid ppl    66.511
----------------------------------------------------------------------------------------------------
| epoch 280 step   121650 |      6 batches | lr 6.2e-07 | ms/batch 408.13 | loss  4.22 | ppl    68.002
| epoch 280 step   121700 |     56 batches | lr 6.02e-07 | ms/batch 317.31 | loss  4.13 | ppl    61.902
| epoch 280 step   121750 |    106 batches | lr 5.83e-07 | ms/batch 316.65 | loss  4.15 | ppl    63.508
| epoch 280 step   121800 |    156 batches | lr 5.66e-07 | ms/batch 315.88 | loss  4.17 | ppl    64.622
| epoch 280 step   121850 |    206 batches | lr 5.48e-07 | ms/batch 315.84 | loss  4.20 | ppl    66.515
| epoch 280 step   121900 |    256 batches | lr 5.31e-07 | ms/batch 314.50 | loss  4.19 | ppl    66.258
| epoch 280 step   121950 |    306 batches | lr 5.14e-07 | ms/batch 313.80 | loss  4.18 | ppl    65.045
| epoch 280 step   122000 |    356 batches | lr 4.97e-07 | ms/batch 314.50 | loss  4.09 | ppl    59.777
----------------------------------------------------------------------------------------------------
| Eval 305 at step   122000 | time: 130.80s | valid loss  4.20 | valid ppl    66.523
----------------------------------------------------------------------------------------------------
| epoch 280 step   122050 |    406 batches | lr 4.81e-07 | ms/batch 411.74 | loss  4.17 | ppl    64.850
| epoch 281 step   122100 |     20 batches | lr 4.65e-07 | ms/batch 308.84 | loss  4.19 | ppl    66.010
| epoch 281 step   122150 |     70 batches | lr 4.49e-07 | ms/batch 314.30 | loss  4.13 | ppl    61.885
| epoch 281 step   122200 |    120 batches | lr 4.33e-07 | ms/batch 313.35 | loss  4.15 | ppl    63.633
| epoch 281 step   122250 |    170 batches | lr 4.18e-07 | ms/batch 315.41 | loss  4.15 | ppl    63.318
| epoch 281 step   122300 |    220 batches | lr 4.03e-07 | ms/batch 314.08 | loss  4.20 | ppl    66.598
| epoch 281 step   122350 |    270 batches | lr 3.88e-07 | ms/batch 312.41 | loss  4.17 | ppl    64.801
| epoch 281 step   122400 |    320 batches | lr 3.73e-07 | ms/batch 313.73 | loss  4.14 | ppl    62.690
----------------------------------------------------------------------------------------------------
| Eval 306 at step   122400 | time: 130.18s | valid loss  4.20 | valid ppl    66.519
----------------------------------------------------------------------------------------------------
| epoch 281 step   122450 |    370 batches | lr 3.59e-07 | ms/batch 412.63 | loss  4.12 | ppl    61.807
| epoch 281 step   122500 |    420 batches | lr 3.45e-07 | ms/batch 313.88 | loss  4.12 | ppl    61.856
| epoch 282 step   122550 |     34 batches | lr 3.32e-07 | ms/batch 307.42 | loss  4.20 | ppl    66.942
| epoch 282 step   122600 |     84 batches | lr 3.18e-07 | ms/batch 313.16 | loss  4.12 | ppl    61.704
| epoch 282 step   122650 |    134 batches | lr 3.05e-07 | ms/batch 313.98 | loss  4.19 | ppl    66.069
| epoch 282 step   122700 |    184 batches | lr 2.92e-07 | ms/batch 314.86 | loss  4.17 | ppl    64.698
| epoch 282 step   122750 |    234 batches | lr 2.8e-07 | ms/batch 315.24 | loss  4.21 | ppl    67.499
| epoch 282 step   122800 |    284 batches | lr 2.67e-07 | ms/batch 317.27 | loss  4.21 | ppl    67.173
----------------------------------------------------------------------------------------------------
| Eval 307 at step   122800 | time: 130.47s | valid loss  4.20 | valid ppl    66.509
----------------------------------------------------------------------------------------------------
| epoch 282 step   122850 |    334 batches | lr 2.55e-07 | ms/batch 414.87 | loss  4.11 | ppl    60.652
| epoch 282 step   122900 |    384 batches | lr 2.44e-07 | ms/batch 315.44 | loss  4.17 | ppl    64.574
| epoch 282 step   122950 |    434 batches | lr 2.32e-07 | ms/batch 315.27 | loss  4.18 | ppl    65.512
| epoch 283 step   123000 |     48 batches | lr 2.21e-07 | ms/batch 308.60 | loss  4.15 | ppl    63.749
| epoch 283 step   123050 |     98 batches | lr 2.1e-07 | ms/batch 314.54 | loss  4.12 | ppl    61.648
| epoch 283 step   123100 |    148 batches | lr 1.99e-07 | ms/batch 314.20 | loss  4.16 | ppl    64.041
| epoch 283 step   123150 |    198 batches | lr 1.89e-07 | ms/batch 314.29 | loss  4.17 | ppl    65.009
| epoch 283 step   123200 |    248 batches | lr 1.79e-07 | ms/batch 315.43 | loss  4.18 | ppl    65.358
----------------------------------------------------------------------------------------------------
| Eval 308 at step   123200 | time: 130.60s | valid loss  4.20 | valid ppl    66.506
----------------------------------------------------------------------------------------------------
| epoch 283 step   123250 |    298 batches | lr 1.69e-07 | ms/batch 412.33 | loss  4.20 | ppl    66.806
| epoch 283 step   123300 |    348 batches | lr 1.6e-07 | ms/batch 313.79 | loss  4.08 | ppl    59.189
| epoch 283 step   123350 |    398 batches | lr 1.5e-07 | ms/batch 315.79 | loss  4.15 | ppl    63.449
| epoch 284 step   123400 |     12 batches | lr 1.41e-07 | ms/batch 309.38 | loss  4.21 | ppl    67.089
| epoch 284 step   123450 |     62 batches | lr 1.33e-07 | ms/batch 314.30 | loss  4.12 | ppl    61.713
| epoch 284 step   123500 |    112 batches | lr 1.24e-07 | ms/batch 315.41 | loss  4.16 | ppl    63.969
| epoch 284 step   123550 |    162 batches | lr 1.16e-07 | ms/batch 315.17 | loss  4.15 | ppl    63.372
| epoch 284 step   123600 |    212 batches | lr 1.08e-07 | ms/batch 314.40 | loss  4.18 | ppl    65.139
----------------------------------------------------------------------------------------------------
| Eval 309 at step   123600 | time: 130.54s | valid loss  4.20 | valid ppl    66.509
----------------------------------------------------------------------------------------------------
| epoch 284 step   123650 |    262 batches | lr 1.01e-07 | ms/batch 411.77 | loss  4.18 | ppl    65.504
| epoch 284 step   123700 |    312 batches | lr 9.34e-08 | ms/batch 312.88 | loss  4.19 | ppl    65.915
| epoch 284 step   123750 |    362 batches | lr 8.64e-08 | ms/batch 313.78 | loss  4.12 | ppl    61.454
| epoch 284 step   123800 |    412 batches | lr 7.96e-08 | ms/batch 314.66 | loss  4.15 | ppl    63.394
| epoch 285 step   123850 |     26 batches | lr 7.31e-08 | ms/batch 307.93 | loss  4.22 | ppl    67.699
| epoch 285 step   123900 |     76 batches | lr 6.69e-08 | ms/batch 313.01 | loss  4.11 | ppl    61.042
| epoch 285 step   123950 |    126 batches | lr 6.09e-08 | ms/batch 313.55 | loss  4.15 | ppl    63.182
| epoch 285 step   124000 |    176 batches | lr 5.53e-08 | ms/batch 313.15 | loss  4.18 | ppl    65.506
----------------------------------------------------------------------------------------------------
| Eval 310 at step   124000 | time: 130.05s | valid loss  4.20 | valid ppl    66.510
----------------------------------------------------------------------------------------------------
| epoch 285 step   124050 |    226 batches | lr 4.99e-08 | ms/batch 412.16 | loss  4.18 | ppl    65.601
| epoch 285 step   124100 |    276 batches | lr 4.48e-08 | ms/batch 315.21 | loss  4.21 | ppl    67.099
| epoch 285 step   124150 |    326 batches | lr 3.99e-08 | ms/batch 313.78 | loss  4.13 | ppl    61.904
| epoch 285 step   124200 |    376 batches | lr 3.54e-08 | ms/batch 314.58 | loss  4.16 | ppl    63.822
| epoch 285 step   124250 |    426 batches | lr 3.11e-08 | ms/batch 314.24 | loss  4.16 | ppl    63.787
| epoch 286 step   124300 |     40 batches | lr 2.71e-08 | ms/batch 310.10 | loss  4.15 | ppl    63.588
| epoch 286 step   124350 |     90 batches | lr 2.34e-08 | ms/batch 315.64 | loss  4.12 | ppl    61.494
| epoch 286 step   124400 |    140 batches | lr 1.99e-08 | ms/batch 312.43 | loss  4.16 | ppl    63.799
----------------------------------------------------------------------------------------------------
| Eval 311 at step   124400 | time: 130.37s | valid loss  4.20 | valid ppl    66.510
----------------------------------------------------------------------------------------------------
| epoch 286 step   124450 |    190 batches | lr 1.67e-08 | ms/batch 411.40 | loss  4.21 | ppl    67.073
| epoch 286 step   124500 |    240 batches | lr 1.38e-08 | ms/batch 314.81 | loss  4.21 | ppl    67.107
| epoch 286 step   124550 |    290 batches | lr 1.12e-08 | ms/batch 314.26 | loss  4.20 | ppl    66.772
| epoch 286 step   124600 |    340 batches | lr 8.84e-09 | ms/batch 315.32 | loss  4.08 | ppl    59.171
| epoch 286 step   124650 |    390 batches | lr 6.77e-09 | ms/batch 315.65 | loss  4.17 | ppl    64.683
| epoch 287 step   124700 |      4 batches | lr 4.97e-09 | ms/batch 310.78 | loss  4.19 | ppl    66.139
| epoch 287 step   124750 |     54 batches | lr 3.45e-09 | ms/batch 316.34 | loss  4.13 | ppl    62.200
| epoch 287 step   124800 |    104 batches | lr 2.21e-09 | ms/batch 316.91 | loss  4.13 | ppl    62.011
----------------------------------------------------------------------------------------------------
| Eval 312 at step   124800 | time: 130.83s | valid loss  4.20 | valid ppl    66.510
----------------------------------------------------------------------------------------------------
| epoch 287 step   124850 |    154 batches | lr 1.24e-09 | ms/batch 415.06 | loss  4.17 | ppl    64.619
| epoch 287 step   124900 |    204 batches | lr 5.53e-10 | ms/batch 316.35 | loss  4.18 | ppl    65.198
| epoch 287 step   124950 |    254 batches | lr 1.38e-10 | ms/batch 315.00 | loss  4.22 | ppl    67.999
| epoch 287 step   125000 |    304 batches | lr 0 | ms/batch 317.13 | loss  4.23 | ppl    69.029
----------------------------------------------------------------------------------------------------
End of training
====================================================================================================
| End of training | test loss  4.15 | test ppl    63.531
====================================================================================================
