Producing dataset...
====================================================================================================
    - work_dir : TFM/20201012-163013
    - data : data/penn/
    - n_layer : 16
    - n_head : 10
    - d_head : 38
    - d_model : 380
    - d_inner : 900
    - not_tied : False
    - clamp_len : -1
    - dropoute : 0.2
    - dropouti : 0.6
    - dropouta : 0.2
    - dropoutf : 0.2
    - dropouth : 0.0
    - dropouto : 0.5
    - init : normal
    - emb_init : normal
    - init_range : 0.1
    - init_std : 0.02
    - optimizer : adam
    - lr : 0.0003
    - lr_min : 0.0001
    - emb_mult : 2
    - scheduler : cosine
    - warmup_step : 3000
    - clip : 0.25
    - alpha : 0.2
    - beta : 0.1
    - wdecay : 1.2e-06
    - std_epochs : 125
    - ema_epochs : 50
    - decay_epochs : 125
    - mu : -1
    - epoch_ema : False
    - ema_lr_mult : 0.5
    - batch_size : 10
    - bptt : 70
    - ext_len : 70
    - mem_len : 0
    - seed : 2
    - cuda : True
    - log_interval : 200
    - save : TFM/20201012-163013/model.pt
    - resume : 
    - debug : False
    - when : []
    - tied : True
    - epochs : 175
    - max_decay_step : 166000
    - total_params : 24040400
    - nonemb_params : 20240400
    - emb_params : 3800000
====================================================================================================
| epoch   1 |   200/ 1327 batches | lr 2.01e-05 | ms/batch 195.01 | loss  8.54 | ppl  5139.43 | bpt   12.327 
| epoch   1 |   400/ 1327 batches | lr 4.01e-05 | ms/batch 196.25 | loss  6.85 | ppl   941.93 | bpt    9.879 
| epoch   1 |   600/ 1327 batches | lr 6.01e-05 | ms/batch 204.15 | loss  6.65 | ppl   770.90 | bpt    9.590 
| epoch   1 |   800/ 1327 batches | lr 8.01e-05 | ms/batch 212.80 | loss  6.53 | ppl   685.53 | bpt    9.421 
| epoch   1 |  1000/ 1327 batches | lr 0.0001001 | ms/batch 207.92 | loss  6.46 | ppl   641.13 | bpt    9.324 
| epoch   1 |  1200/ 1327 batches | lr 0.0001201 | ms/batch 209.25 | loss  6.28 | ppl   535.91 | bpt    9.066 
-----------------------------------------------------------------------------------------
| end of epoch   1 | time: 328.76s | valid loss  5.91 | valid ppl   370.37 | valid bpt    8.533
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch   2 |   200/ 1327 batches | lr 0.000158 | ms/batch 209.56 | loss  6.11 | ppl   450.51 | bpt    8.815 
| epoch   2 |   400/ 1327 batches | lr 0.000178 | ms/batch 212.13 | loss  6.04 | ppl   421.60 | bpt    8.720 
| epoch   2 |   600/ 1327 batches | lr 0.000198 | ms/batch 213.52 | loss  5.96 | ppl   388.48 | bpt    8.602 
| epoch   2 |   800/ 1327 batches | lr 0.000218 | ms/batch 212.50 | loss  5.86 | ppl   351.92 | bpt    8.459 
| epoch   2 |  1000/ 1327 batches | lr 0.000238 | ms/batch 212.27 | loss  5.88 | ppl   356.40 | bpt    8.477 
| epoch   2 |  1200/ 1327 batches | lr 0.000258 | ms/batch 212.15 | loss  5.77 | ppl   319.59 | bpt    8.320 
-----------------------------------------------------------------------------------------
| end of epoch   2 | time: 336.46s | valid loss  5.46 | valid ppl   234.45 | valid bpt    7.873
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch   3 |   200/ 1327 batches | lr 0.000295 | ms/batch 212.77 | loss  5.73 | ppl   308.36 | bpt    8.268 
| epoch   3 |   400/ 1327 batches | lr 0.0002999 | ms/batch 211.02 | loss  5.72 | ppl   304.08 | bpt    8.248 
| epoch   3 |   600/ 1327 batches | lr 0.0002999 | ms/batch 212.50 | loss  5.67 | ppl   291.38 | bpt    8.187 
| epoch   3 |   800/ 1327 batches | lr 0.0002999 | ms/batch 211.41 | loss  5.61 | ppl   272.74 | bpt    8.091 
| epoch   3 |  1000/ 1327 batches | lr 0.0002999 | ms/batch 211.85 | loss  5.63 | ppl   279.88 | bpt    8.129 
| epoch   3 |  1200/ 1327 batches | lr 0.0002999 | ms/batch 212.22 | loss  5.54 | ppl   254.11 | bpt    7.989 
-----------------------------------------------------------------------------------------
| end of epoch   3 | time: 337.85s | valid loss  5.23 | valid ppl   187.39 | valid bpt    7.550
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch   4 |   200/ 1327 batches | lr 0.0002999 | ms/batch 212.84 | loss  5.52 | ppl   250.67 | bpt    7.970 
| epoch   4 |   400/ 1327 batches | lr 0.0002999 | ms/batch 210.43 | loss  5.52 | ppl   250.43 | bpt    7.968 
| epoch   4 |   600/ 1327 batches | lr 0.0002998 | ms/batch 206.07 | loss  5.49 | ppl   243.31 | bpt    7.927 
| epoch   4 |   800/ 1327 batches | lr 0.0002998 | ms/batch 213.96 | loss  5.44 | ppl   229.41 | bpt    7.842 
| epoch   4 |  1000/ 1327 batches | lr 0.0002998 | ms/batch 211.80 | loss  5.48 | ppl   239.25 | bpt    7.902 
| epoch   4 |  1200/ 1327 batches | lr 0.0002998 | ms/batch 212.75 | loss  5.40 | ppl   222.50 | bpt    7.798 
-----------------------------------------------------------------------------------------
| end of epoch   4 | time: 336.77s | valid loss  5.09 | valid ppl   162.19 | valid bpt    7.342
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch   5 |   200/ 1327 batches | lr 0.0002998 | ms/batch 212.06 | loss  5.40 | ppl   220.43 | bpt    7.784 
| epoch   5 |   400/ 1327 batches | lr 0.0002997 | ms/batch 213.42 | loss  5.39 | ppl   219.93 | bpt    7.781 
| epoch   5 |   600/ 1327 batches | lr 0.0002997 | ms/batch 210.15 | loss  5.37 | ppl   214.95 | bpt    7.748 
| epoch   5 |   800/ 1327 batches | lr 0.0002997 | ms/batch 213.00 | loss  5.34 | ppl   208.25 | bpt    7.702 
| epoch   5 |  1000/ 1327 batches | lr 0.0002997 | ms/batch 212.34 | loss  5.37 | ppl   215.46 | bpt    7.751 
| epoch   5 |  1200/ 1327 batches | lr 0.0002996 | ms/batch 210.58 | loss  5.30 | ppl   199.80 | bpt    7.642 
-----------------------------------------------------------------------------------------
| end of epoch   5 | time: 335.88s | valid loss  4.99 | valid ppl   147.20 | valid bpt    7.202
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch   6 |   200/ 1327 batches | lr 0.0002996 | ms/batch 207.80 | loss  5.29 | ppl   199.02 | bpt    7.637 
| epoch   6 |   400/ 1327 batches | lr 0.0002996 | ms/batch 208.74 | loss  5.30 | ppl   199.89 | bpt    7.643 
| epoch   6 |   600/ 1327 batches | lr 0.0002995 | ms/batch 208.91 | loss  5.29 | ppl   198.20 | bpt    7.631 
| epoch   6 |   800/ 1327 batches | lr 0.0002995 | ms/batch 211.59 | loss  5.24 | ppl   188.08 | bpt    7.555 
| epoch   6 |  1000/ 1327 batches | lr 0.0002995 | ms/batch 205.80 | loss  5.29 | ppl   198.18 | bpt    7.631 
| epoch   6 |  1200/ 1327 batches | lr 0.0002994 | ms/batch 206.33 | loss  5.22 | ppl   185.70 | bpt    7.537 
-----------------------------------------------------------------------------------------
| end of epoch   6 | time: 333.45s | valid loss  4.92 | valid ppl   136.49 | valid bpt    7.093
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch   7 |   200/ 1327 batches | lr 0.0002994 | ms/batch 212.23 | loss  5.22 | ppl   185.02 | bpt    7.532 
| epoch   7 |   400/ 1327 batches | lr 0.0002993 | ms/batch 212.91 | loss  5.23 | ppl   185.95 | bpt    7.539 
| epoch   7 |   600/ 1327 batches | lr 0.0002993 | ms/batch 207.44 | loss  5.22 | ppl   184.60 | bpt    7.528 
| epoch   7 |   800/ 1327 batches | lr 0.0002992 | ms/batch 211.65 | loss  5.19 | ppl   178.88 | bpt    7.483 
| epoch   7 |  1000/ 1327 batches | lr 0.0002992 | ms/batch 212.02 | loss  5.21 | ppl   183.51 | bpt    7.520 
| epoch   7 |  1200/ 1327 batches | lr 0.0002991 | ms/batch 213.34 | loss  5.16 | ppl   174.47 | bpt    7.447 
-----------------------------------------------------------------------------------------
| end of epoch   7 | time: 334.74s | valid loss  4.84 | valid ppl   126.09 | valid bpt    6.978
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch   8 |   200/ 1327 batches | lr 0.000299 | ms/batch 210.98 | loss  5.15 | ppl   172.02 | bpt    7.426 
| epoch   8 |   400/ 1327 batches | lr 0.000299 | ms/batch 210.41 | loss  5.14 | ppl   170.75 | bpt    7.416 
| epoch   8 |   600/ 1327 batches | lr 0.0002989 | ms/batch 209.88 | loss  5.17 | ppl   175.05 | bpt    7.452 
| epoch   8 |   800/ 1327 batches | lr 0.0002989 | ms/batch 209.07 | loss  5.12 | ppl   167.71 | bpt    7.390 
| epoch   8 |  1000/ 1327 batches | lr 0.0002988 | ms/batch 214.12 | loss  5.17 | ppl   176.31 | bpt    7.462 
| epoch   8 |  1200/ 1327 batches | lr 0.0002988 | ms/batch 210.48 | loss  5.10 | ppl   164.24 | bpt    7.360 
-----------------------------------------------------------------------------------------
| end of epoch   8 | time: 335.87s | valid loss  4.79 | valid ppl   120.81 | valid bpt    6.917
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch   9 |   200/ 1327 batches | lr 0.0002987 | ms/batch 210.79 | loss  5.10 | ppl   164.23 | bpt    7.360 
| epoch   9 |   400/ 1327 batches | lr 0.0002986 | ms/batch 212.33 | loss  5.10 | ppl   164.63 | bpt    7.363 
| epoch   9 |   600/ 1327 batches | lr 0.0002985 | ms/batch 213.21 | loss  5.11 | ppl   165.75 | bpt    7.373 
| epoch   9 |   800/ 1327 batches | lr 0.0002985 | ms/batch 211.19 | loss  5.07 | ppl   159.35 | bpt    7.316 
| epoch   9 |  1000/ 1327 batches | lr 0.0002984 | ms/batch 210.94 | loss  5.11 | ppl   166.26 | bpt    7.377 
| epoch   9 |  1200/ 1327 batches | lr 0.0002983 | ms/batch 212.56 | loss  5.05 | ppl   156.25 | bpt    7.288 
-----------------------------------------------------------------------------------------
| end of epoch   9 | time: 336.15s | valid loss  4.75 | valid ppl   115.75 | valid bpt    6.855
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  10 |   200/ 1327 batches | lr 0.0002982 | ms/batch 214.75 | loss  5.07 | ppl   158.91 | bpt    7.312 
| epoch  10 |   400/ 1327 batches | lr 0.0002981 | ms/batch 210.86 | loss  5.04 | ppl   153.75 | bpt    7.264 
| epoch  10 |   600/ 1327 batches | lr 0.0002981 | ms/batch 208.46 | loss  5.07 | ppl   158.54 | bpt    7.309 
| epoch  10 |   800/ 1327 batches | lr 0.000298 | ms/batch 212.66 | loss  5.04 | ppl   154.75 | bpt    7.274 
| epoch  10 |  1000/ 1327 batches | lr 0.0002979 | ms/batch 211.27 | loss  5.08 | ppl   161.39 | bpt    7.334 
| epoch  10 |  1200/ 1327 batches | lr 0.0002978 | ms/batch 206.41 | loss  5.02 | ppl   150.77 | bpt    7.236 
-----------------------------------------------------------------------------------------
| end of epoch  10 | time: 335.39s | valid loss  4.71 | valid ppl   111.45 | valid bpt    6.800
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  11 |   200/ 1327 batches | lr 0.0002977 | ms/batch 205.71 | loss  5.01 | ppl   150.43 | bpt    7.233 
| epoch  11 |   400/ 1327 batches | lr 0.0002976 | ms/batch 210.64 | loss  5.01 | ppl   149.87 | bpt    7.228 
| epoch  11 |   600/ 1327 batches | lr 0.0002975 | ms/batch 209.51 | loss  5.03 | ppl   152.81 | bpt    7.256 
| epoch  11 |   800/ 1327 batches | lr 0.0002974 | ms/batch 208.98 | loss  5.00 | ppl   148.35 | bpt    7.213 
| epoch  11 |  1000/ 1327 batches | lr 0.0002974 | ms/batch 209.32 | loss  5.05 | ppl   156.36 | bpt    7.289 
| epoch  11 |  1200/ 1327 batches | lr 0.0002973 | ms/batch 210.40 | loss  4.98 | ppl   145.31 | bpt    7.183 
-----------------------------------------------------------------------------------------
| end of epoch  11 | time: 333.80s | valid loss  4.69 | valid ppl   108.97 | valid bpt    6.768
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  12 |   200/ 1327 batches | lr 0.0002971 | ms/batch 209.31 | loss  4.99 | ppl   146.27 | bpt    7.192 
| epoch  12 |   400/ 1327 batches | lr 0.000297 | ms/batch 209.69 | loss  4.98 | ppl   145.10 | bpt    7.181 
| epoch  12 |   600/ 1327 batches | lr 0.0002969 | ms/batch 208.30 | loss  5.00 | ppl   147.79 | bpt    7.207 
| epoch  12 |   800/ 1327 batches | lr 0.0002968 | ms/batch 210.96 | loss  4.96 | ppl   143.05 | bpt    7.160 
| epoch  12 |  1000/ 1327 batches | lr 0.0002967 | ms/batch 210.07 | loss  5.01 | ppl   150.56 | bpt    7.234 
| epoch  12 |  1200/ 1327 batches | lr 0.0002966 | ms/batch 213.67 | loss  4.95 | ppl   141.36 | bpt    7.143 
-----------------------------------------------------------------------------------------
| end of epoch  12 | time: 335.61s | valid loss  4.66 | valid ppl   105.28 | valid bpt    6.718
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  13 |   200/ 1327 batches | lr 0.0002964 | ms/batch 210.41 | loss  4.94 | ppl   139.83 | bpt    7.128 
| epoch  13 |   400/ 1327 batches | lr 0.0002963 | ms/batch 208.59 | loss  4.96 | ppl   142.13 | bpt    7.151 
| epoch  13 |   600/ 1327 batches | lr 0.0002962 | ms/batch 210.85 | loss  4.96 | ppl   142.60 | bpt    7.156 
| epoch  13 |   800/ 1327 batches | lr 0.0002961 | ms/batch 205.27 | loss  4.92 | ppl   137.41 | bpt    7.102 
| epoch  13 |  1000/ 1327 batches | lr 0.000296 | ms/batch 209.71 | loss  4.98 | ppl   144.81 | bpt    7.178 
| epoch  13 |  1200/ 1327 batches | lr 0.0002959 | ms/batch 210.49 | loss  4.94 | ppl   139.70 | bpt    7.126 
-----------------------------------------------------------------------------------------
| end of epoch  13 | time: 335.86s | valid loss  4.63 | valid ppl   102.69 | valid bpt    6.682
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  14 |   200/ 1327 batches | lr 0.0002957 | ms/batch 212.48 | loss  4.92 | ppl   136.61 | bpt    7.094 
| epoch  14 |   400/ 1327 batches | lr 0.0002956 | ms/batch 211.66 | loss  4.92 | ppl   136.74 | bpt    7.095 
| epoch  14 |   600/ 1327 batches | lr 0.0002955 | ms/batch 210.41 | loss  4.92 | ppl   137.59 | bpt    7.104 
| epoch  14 |   800/ 1327 batches | lr 0.0002954 | ms/batch 213.42 | loss  4.93 | ppl   137.87 | bpt    7.107 
| epoch  14 |  1000/ 1327 batches | lr 0.0002953 | ms/batch 212.38 | loss  4.95 | ppl   140.86 | bpt    7.138 
| epoch  14 |  1200/ 1327 batches | lr 0.0002952 | ms/batch 212.04 | loss  4.89 | ppl   133.30 | bpt    7.059 
-----------------------------------------------------------------------------------------
| end of epoch  14 | time: 337.43s | valid loss  4.60 | valid ppl    99.90 | valid bpt    6.642
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  15 |   200/ 1327 batches | lr 0.0002949 | ms/batch 210.60 | loss  4.88 | ppl   131.08 | bpt    7.034 
| epoch  15 |   400/ 1327 batches | lr 0.0002948 | ms/batch 208.20 | loss  4.90 | ppl   133.77 | bpt    7.064 
| epoch  15 |   600/ 1327 batches | lr 0.0002947 | ms/batch 210.76 | loss  4.91 | ppl   135.64 | bpt    7.084 
| epoch  15 |   800/ 1327 batches | lr 0.0002946 | ms/batch 209.45 | loss  4.88 | ppl   131.97 | bpt    7.044 
| epoch  15 |  1000/ 1327 batches | lr 0.0002944 | ms/batch 207.37 | loss  4.92 | ppl   137.62 | bpt    7.105 
| epoch  15 |  1200/ 1327 batches | lr 0.0002943 | ms/batch 207.41 | loss  4.87 | ppl   130.25 | bpt    7.025 
-----------------------------------------------------------------------------------------
| end of epoch  15 | time: 331.93s | valid loss  4.58 | valid ppl    97.94 | valid bpt    6.614
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  16 |   200/ 1327 batches | lr 0.0002941 | ms/batch 210.03 | loss  4.86 | ppl   129.44 | bpt    7.016 
| epoch  16 |   400/ 1327 batches | lr 0.0002939 | ms/batch 210.26 | loss  4.88 | ppl   131.32 | bpt    7.037 
| epoch  16 |   600/ 1327 batches | lr 0.0002938 | ms/batch 209.67 | loss  4.88 | ppl   131.72 | bpt    7.041 
| epoch  16 |   800/ 1327 batches | lr 0.0002937 | ms/batch 209.00 | loss  4.86 | ppl   129.57 | bpt    7.018 
| epoch  16 |  1000/ 1327 batches | lr 0.0002935 | ms/batch 208.02 | loss  4.91 | ppl   136.22 | bpt    7.090 
| epoch  16 |  1200/ 1327 batches | lr 0.0002934 | ms/batch 209.11 | loss  4.84 | ppl   126.24 | bpt    6.980 
-----------------------------------------------------------------------------------------
| end of epoch  16 | time: 333.23s | valid loss  4.56 | valid ppl    95.67 | valid bpt    6.580
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  17 |   200/ 1327 batches | lr 0.0002932 | ms/batch 206.84 | loss  4.83 | ppl   125.41 | bpt    6.970 
| epoch  17 |   400/ 1327 batches | lr 0.000293 | ms/batch 209.02 | loss  4.85 | ppl   127.32 | bpt    6.992 
| epoch  17 |   600/ 1327 batches | lr 0.0002929 | ms/batch 208.20 | loss  4.86 | ppl   128.75 | bpt    7.008 
| epoch  17 |   800/ 1327 batches | lr 0.0002927 | ms/batch 207.65 | loss  4.84 | ppl   126.04 | bpt    6.978 
| epoch  17 |  1000/ 1327 batches | lr 0.0002926 | ms/batch 210.55 | loss  4.89 | ppl   132.46 | bpt    7.049 
| epoch  17 |  1200/ 1327 batches | lr 0.0002924 | ms/batch 211.02 | loss  4.83 | ppl   124.92 | bpt    6.965 
-----------------------------------------------------------------------------------------
| end of epoch  17 | time: 334.57s | valid loss  4.54 | valid ppl    93.80 | valid bpt    6.551
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  18 |   200/ 1327 batches | lr 0.0002922 | ms/batch 209.83 | loss  4.81 | ppl   123.09 | bpt    6.944 
| epoch  18 |   400/ 1327 batches | lr 0.000292 | ms/batch 211.18 | loss  4.82 | ppl   124.48 | bpt    6.960 
| epoch  18 |   600/ 1327 batches | lr 0.0002919 | ms/batch 211.55 | loss  4.84 | ppl   126.95 | bpt    6.988 
| epoch  18 |   800/ 1327 batches | lr 0.0002917 | ms/batch 209.05 | loss  4.81 | ppl   123.33 | bpt    6.946 
| epoch  18 |  1000/ 1327 batches | lr 0.0002916 | ms/batch 207.78 | loss  4.86 | ppl   129.17 | bpt    7.013 
| epoch  18 |  1200/ 1327 batches | lr 0.0002914 | ms/batch 212.79 | loss  4.80 | ppl   121.19 | bpt    6.921 
-----------------------------------------------------------------------------------------
| end of epoch  18 | time: 334.84s | valid loss  4.52 | valid ppl    92.19 | valid bpt    6.526
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  19 |   200/ 1327 batches | lr 0.0002911 | ms/batch 212.32 | loss  4.78 | ppl   119.65 | bpt    6.903 
| epoch  19 |   400/ 1327 batches | lr 0.0002909 | ms/batch 210.46 | loss  4.81 | ppl   122.35 | bpt    6.935 
| epoch  19 |   600/ 1327 batches | lr 0.0002908 | ms/batch 205.96 | loss  4.82 | ppl   124.05 | bpt    6.955 
| epoch  19 |   800/ 1327 batches | lr 0.0002906 | ms/batch 211.32 | loss  4.79 | ppl   119.85 | bpt    6.905 
| epoch  19 |  1000/ 1327 batches | lr 0.0002905 | ms/batch 212.23 | loss  4.85 | ppl   128.07 | bpt    7.001 
| epoch  19 |  1200/ 1327 batches | lr 0.0002903 | ms/batch 210.23 | loss  4.79 | ppl   120.44 | bpt    6.912 
-----------------------------------------------------------------------------------------
| end of epoch  19 | time: 337.25s | valid loss  4.51 | valid ppl    91.08 | valid bpt    6.509
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  20 |   200/ 1327 batches | lr 0.00029 | ms/batch 212.25 | loss  4.77 | ppl   118.37 | bpt    6.887 
| epoch  20 |   400/ 1327 batches | lr 0.0002898 | ms/batch 208.78 | loss  4.79 | ppl   120.51 | bpt    6.913 
| epoch  20 |   600/ 1327 batches | lr 0.0002896 | ms/batch 211.05 | loss  4.80 | ppl   120.91 | bpt    6.918 
| epoch  20 |   800/ 1327 batches | lr 0.0002895 | ms/batch 212.33 | loss  4.78 | ppl   119.60 | bpt    6.902 
| epoch  20 |  1000/ 1327 batches | lr 0.0002893 | ms/batch 210.55 | loss  4.82 | ppl   124.22 | bpt    6.957 
| epoch  20 |  1200/ 1327 batches | lr 0.0002891 | ms/batch 210.32 | loss  4.77 | ppl   117.37 | bpt    6.875 
-----------------------------------------------------------------------------------------
| end of epoch  20 | time: 335.55s | valid loss  4.49 | valid ppl    89.13 | valid bpt    6.478
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  21 |   200/ 1327 batches | lr 0.0002888 | ms/batch 214.35 | loss  4.77 | ppl   117.71 | bpt    6.879 
| epoch  21 |   400/ 1327 batches | lr 0.0002886 | ms/batch 210.80 | loss  4.76 | ppl   116.52 | bpt    6.864 
| epoch  21 |   600/ 1327 batches | lr 0.0002884 | ms/batch 210.45 | loss  4.80 | ppl   121.53 | bpt    6.925 
| epoch  21 |   800/ 1327 batches | lr 0.0002883 | ms/batch 210.43 | loss  4.76 | ppl   117.13 | bpt    6.872 
| epoch  21 |  1000/ 1327 batches | lr 0.0002881 | ms/batch 210.51 | loss  4.81 | ppl   122.26 | bpt    6.934 
| epoch  21 |  1200/ 1327 batches | lr 0.0002879 | ms/batch 211.64 | loss  4.75 | ppl   116.02 | bpt    6.858 
-----------------------------------------------------------------------------------------
| end of epoch  21 | time: 334.40s | valid loss  4.49 | valid ppl    88.69 | valid bpt    6.471
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  22 |   200/ 1327 batches | lr 0.0002876 | ms/batch 208.23 | loss  4.74 | ppl   114.33 | bpt    6.837 
| epoch  22 |   400/ 1327 batches | lr 0.0002874 | ms/batch 208.48 | loss  4.75 | ppl   115.92 | bpt    6.857 
| epoch  22 |   600/ 1327 batches | lr 0.0002872 | ms/batch 209.45 | loss  4.79 | ppl   119.92 | bpt    6.906 
| epoch  22 |   800/ 1327 batches | lr 0.000287 | ms/batch 209.61 | loss  4.74 | ppl   114.36 | bpt    6.837 
| epoch  22 |  1000/ 1327 batches | lr 0.0002868 | ms/batch 213.91 | loss  4.80 | ppl   121.60 | bpt    6.926 
| epoch  22 |  1200/ 1327 batches | lr 0.0002866 | ms/batch 211.13 | loss  4.73 | ppl   113.47 | bpt    6.826 
-----------------------------------------------------------------------------------------
| end of epoch  22 | time: 335.60s | valid loss  4.47 | valid ppl    87.48 | valid bpt    6.451
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  23 |   200/ 1327 batches | lr 0.0002863 | ms/batch 210.76 | loss  4.72 | ppl   112.43 | bpt    6.813 
| epoch  23 |   400/ 1327 batches | lr 0.0002861 | ms/batch 210.56 | loss  4.73 | ppl   113.24 | bpt    6.823 
| epoch  23 |   600/ 1327 batches | lr 0.0002859 | ms/batch 214.22 | loss  4.76 | ppl   116.50 | bpt    6.864 
| epoch  23 |   800/ 1327 batches | lr 0.0002857 | ms/batch 206.68 | loss  4.74 | ppl   114.08 | bpt    6.834 
| epoch  23 |  1000/ 1327 batches | lr 0.0002855 | ms/batch 208.52 | loss  4.79 | ppl   120.18 | bpt    6.909 
| epoch  23 |  1200/ 1327 batches | lr 0.0002853 | ms/batch 209.14 | loss  4.72 | ppl   111.63 | bpt    6.803 
-----------------------------------------------------------------------------------------
| end of epoch  23 | time: 335.19s | valid loss  4.46 | valid ppl    86.38 | valid bpt    6.433
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  24 |   200/ 1327 batches | lr 0.0002849 | ms/batch 210.19 | loss  4.71 | ppl   110.93 | bpt    6.794 
| epoch  24 |   400/ 1327 batches | lr 0.0002847 | ms/batch 211.38 | loss  4.71 | ppl   111.32 | bpt    6.799 
| epoch  24 |   600/ 1327 batches | lr 0.0002845 | ms/batch 211.41 | loss  4.76 | ppl   116.25 | bpt    6.861 
| epoch  24 |   800/ 1327 batches | lr 0.0002843 | ms/batch 206.75 | loss  4.72 | ppl   111.90 | bpt    6.806 
| epoch  24 |  1000/ 1327 batches | lr 0.0002841 | ms/batch 210.91 | loss  4.78 | ppl   118.77 | bpt    6.892 
| epoch  24 |  1200/ 1327 batches | lr 0.0002839 | ms/batch 210.64 | loss  4.69 | ppl   108.87 | bpt    6.766 
-----------------------------------------------------------------------------------------
| end of epoch  24 | time: 335.80s | valid loss  4.45 | valid ppl    85.87 | valid bpt    6.424
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  25 |   200/ 1327 batches | lr 0.0002835 | ms/batch 213.58 | loss  4.71 | ppl   110.69 | bpt    6.790 
| epoch  25 |   400/ 1327 batches | lr 0.0002833 | ms/batch 206.39 | loss  4.70 | ppl   110.31 | bpt    6.785 
| epoch  25 |   600/ 1327 batches | lr 0.000283 | ms/batch 208.85 | loss  4.73 | ppl   113.67 | bpt    6.829 
| epoch  25 |   800/ 1327 batches | lr 0.0002828 | ms/batch 208.64 | loss  4.70 | ppl   109.71 | bpt    6.778 
| epoch  25 |  1000/ 1327 batches | lr 0.0002826 | ms/batch 210.68 | loss  4.76 | ppl   116.68 | bpt    6.866 
| epoch  25 |  1200/ 1327 batches | lr 0.0002824 | ms/batch 212.32 | loss  4.70 | ppl   110.12 | bpt    6.783 
-----------------------------------------------------------------------------------------
| end of epoch  25 | time: 335.04s | valid loss  4.43 | valid ppl    84.25 | valid bpt    6.397
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  26 |   200/ 1327 batches | lr 0.000282 | ms/batch 212.30 | loss  4.68 | ppl   107.30 | bpt    6.745 
| epoch  26 |   400/ 1327 batches | lr 0.0002818 | ms/batch 208.89 | loss  4.69 | ppl   109.02 | bpt    6.768 
| epoch  26 |   600/ 1327 batches | lr 0.0002815 | ms/batch 211.89 | loss  4.73 | ppl   113.58 | bpt    6.828 
| epoch  26 |   800/ 1327 batches | lr 0.0002813 | ms/batch 208.22 | loss  4.69 | ppl   108.66 | bpt    6.764 
| epoch  26 |  1000/ 1327 batches | lr 0.0002811 | ms/batch 212.16 | loss  4.74 | ppl   113.94 | bpt    6.832 
| epoch  26 |  1200/ 1327 batches | lr 0.0002809 | ms/batch 211.83 | loss  4.70 | ppl   109.71 | bpt    6.778 
-----------------------------------------------------------------------------------------
| end of epoch  26 | time: 334.80s | valid loss  4.43 | valid ppl    84.28 | valid bpt    6.397
-----------------------------------------------------------------------------------------
| epoch  27 |   200/ 1327 batches | lr 0.0002804 | ms/batch 208.86 | loss  4.67 | ppl   107.06 | bpt    6.742 
| epoch  27 |   400/ 1327 batches | lr 0.0002802 | ms/batch 210.73 | loss  4.66 | ppl   106.03 | bpt    6.728 
| epoch  27 |   600/ 1327 batches | lr 0.00028 | ms/batch 210.81 | loss  4.70 | ppl   109.80 | bpt    6.779 
| epoch  27 |   800/ 1327 batches | lr 0.0002798 | ms/batch 209.58 | loss  4.67 | ppl   106.86 | bpt    6.740 
| epoch  27 |  1000/ 1327 batches | lr 0.0002795 | ms/batch 210.38 | loss  4.72 | ppl   112.40 | bpt    6.812 
| epoch  27 |  1200/ 1327 batches | lr 0.0002793 | ms/batch 207.91 | loss  4.68 | ppl   107.27 | bpt    6.745 
-----------------------------------------------------------------------------------------
| end of epoch  27 | time: 335.34s | valid loss  4.42 | valid ppl    83.43 | valid bpt    6.383
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  28 |   200/ 1327 batches | lr 0.0002788 | ms/batch 210.67 | loss  4.66 | ppl   105.14 | bpt    6.716 
| epoch  28 |   400/ 1327 batches | lr 0.0002786 | ms/batch 211.20 | loss  4.66 | ppl   105.39 | bpt    6.720 
| epoch  28 |   600/ 1327 batches | lr 0.0002784 | ms/batch 211.24 | loss  4.69 | ppl   108.69 | bpt    6.764 
| epoch  28 |   800/ 1327 batches | lr 0.0002781 | ms/batch 210.73 | loss  4.68 | ppl   107.88 | bpt    6.753 
| epoch  28 |  1000/ 1327 batches | lr 0.0002779 | ms/batch 210.14 | loss  4.72 | ppl   112.32 | bpt    6.812 
| epoch  28 |  1200/ 1327 batches | lr 0.0002776 | ms/batch 211.31 | loss  4.65 | ppl   104.45 | bpt    6.707 
-----------------------------------------------------------------------------------------
| end of epoch  28 | time: 334.51s | valid loss  4.42 | valid ppl    83.26 | valid bpt    6.380
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  29 |   200/ 1327 batches | lr 0.0002772 | ms/batch 212.33 | loss  4.65 | ppl   104.60 | bpt    6.709 
| epoch  29 |   400/ 1327 batches | lr 0.000277 | ms/batch 211.44 | loss  4.65 | ppl   104.66 | bpt    6.710 
| epoch  29 |   600/ 1327 batches | lr 0.0002767 | ms/batch 209.64 | loss  4.68 | ppl   108.25 | bpt    6.758 
| epoch  29 |   800/ 1327 batches | lr 0.0002765 | ms/batch 211.70 | loss  4.66 | ppl   105.55 | bpt    6.722 
| epoch  29 |  1000/ 1327 batches | lr 0.0002762 | ms/batch 210.43 | loss  4.70 | ppl   109.57 | bpt    6.776 
| epoch  29 |  1200/ 1327 batches | lr 0.000276 | ms/batch 211.01 | loss  4.65 | ppl   104.48 | bpt    6.707 
-----------------------------------------------------------------------------------------
| end of epoch  29 | time: 335.06s | valid loss  4.41 | valid ppl    82.56 | valid bpt    6.367
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  30 |   200/ 1327 batches | lr 0.0002755 | ms/batch 209.83 | loss  4.63 | ppl   102.06 | bpt    6.673 
| epoch  30 |   400/ 1327 batches | lr 0.0002752 | ms/batch 209.17 | loss  4.62 | ppl   101.89 | bpt    6.671 
| epoch  30 |   600/ 1327 batches | lr 0.000275 | ms/batch 208.23 | loss  4.67 | ppl   106.71 | bpt    6.738 
| epoch  30 |   800/ 1327 batches | lr 0.0002747 | ms/batch 209.28 | loss  4.63 | ppl   102.42 | bpt    6.678 
| epoch  30 |  1000/ 1327 batches | lr 0.0002745 | ms/batch 211.83 | loss  4.69 | ppl   109.07 | bpt    6.769 
| epoch  30 |  1200/ 1327 batches | lr 0.0002742 | ms/batch 211.08 | loss  4.62 | ppl   101.76 | bpt    6.669 
-----------------------------------------------------------------------------------------
| end of epoch  30 | time: 335.56s | valid loss  4.40 | valid ppl    81.15 | valid bpt    6.342
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  31 |   200/ 1327 batches | lr 0.0002737 | ms/batch 211.36 | loss  4.62 | ppl   101.57 | bpt    6.666 
| epoch  31 |   400/ 1327 batches | lr 0.0002735 | ms/batch 209.41 | loss  4.63 | ppl   102.18 | bpt    6.675 
| epoch  31 |   600/ 1327 batches | lr 0.0002732 | ms/batch 213.28 | loss  4.66 | ppl   105.22 | bpt    6.717 
| epoch  31 |   800/ 1327 batches | lr 0.0002729 | ms/batch 211.92 | loss  4.63 | ppl   102.17 | bpt    6.675 
| epoch  31 |  1000/ 1327 batches | lr 0.0002727 | ms/batch 212.03 | loss  4.68 | ppl   107.59 | bpt    6.749 
| epoch  31 |  1200/ 1327 batches | lr 0.0002724 | ms/batch 211.88 | loss  4.61 | ppl   100.95 | bpt    6.657 
-----------------------------------------------------------------------------------------
| end of epoch  31 | time: 336.89s | valid loss  4.39 | valid ppl    80.58 | valid bpt    6.332
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  32 |   200/ 1327 batches | lr 0.0002719 | ms/batch 214.30 | loss  4.61 | ppl   100.16 | bpt    6.646 
| epoch  32 |   400/ 1327 batches | lr 0.0002716 | ms/batch 211.75 | loss  4.59 | ppl    98.98 | bpt    6.629 
| epoch  32 |   600/ 1327 batches | lr 0.0002714 | ms/batch 208.50 | loss  4.66 | ppl   105.29 | bpt    6.718 
| epoch  32 |   800/ 1327 batches | lr 0.0002711 | ms/batch 210.37 | loss  4.62 | ppl   101.92 | bpt    6.671 
| epoch  32 |  1000/ 1327 batches | lr 0.0002708 | ms/batch 214.86 | loss  4.67 | ppl   106.67 | bpt    6.737 
| epoch  32 |  1200/ 1327 batches | lr 0.0002706 | ms/batch 211.50 | loss  4.61 | ppl   100.19 | bpt    6.647 
-----------------------------------------------------------------------------------------
| end of epoch  32 | time: 336.58s | valid loss  4.38 | valid ppl    80.07 | valid bpt    6.323
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  33 |   200/ 1327 batches | lr 0.0002701 | ms/batch 211.63 | loss  4.59 | ppl    98.43 | bpt    6.621 
| epoch  33 |   400/ 1327 batches | lr 0.0002698 | ms/batch 212.94 | loss  4.60 | ppl    99.26 | bpt    6.633 
| epoch  33 |   600/ 1327 batches | lr 0.0002695 | ms/batch 209.47 | loss  4.65 | ppl   104.07 | bpt    6.701 
| epoch  33 |   800/ 1327 batches | lr 0.0002692 | ms/batch 208.52 | loss  4.61 | ppl   100.61 | bpt    6.653 
| epoch  33 |  1000/ 1327 batches | lr 0.0002689 | ms/batch 213.98 | loss  4.65 | ppl   104.39 | bpt    6.706 
| epoch  33 |  1200/ 1327 batches | lr 0.0002687 | ms/batch 213.62 | loss  4.59 | ppl    98.93 | bpt    6.628 
-----------------------------------------------------------------------------------------
| end of epoch  33 | time: 337.48s | valid loss  4.37 | valid ppl    78.92 | valid bpt    6.302
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  34 |   200/ 1327 batches | lr 0.0002681 | ms/batch 211.22 | loss  4.58 | ppl    97.56 | bpt    6.608 
| epoch  34 |   400/ 1327 batches | lr 0.0002679 | ms/batch 209.46 | loss  4.57 | ppl    96.24 | bpt    6.589 
| epoch  34 |   600/ 1327 batches | lr 0.0002676 | ms/batch 213.12 | loss  4.61 | ppl   100.62 | bpt    6.653 
| epoch  34 |   800/ 1327 batches | lr 0.0002673 | ms/batch 211.78 | loss  4.60 | ppl    99.26 | bpt    6.633 
| epoch  34 |  1000/ 1327 batches | lr 0.000267 | ms/batch 213.28 | loss  4.64 | ppl   103.66 | bpt    6.696 
| epoch  34 |  1200/ 1327 batches | lr 0.0002667 | ms/batch 211.44 | loss  4.59 | ppl    98.85 | bpt    6.627 
-----------------------------------------------------------------------------------------
| end of epoch  34 | time: 335.85s | valid loss  4.37 | valid ppl    79.35 | valid bpt    6.310
-----------------------------------------------------------------------------------------
| epoch  35 |   200/ 1327 batches | lr 0.0002662 | ms/batch 212.17 | loss  4.57 | ppl    96.86 | bpt    6.598 
| epoch  35 |   400/ 1327 batches | lr 0.0002659 | ms/batch 208.43 | loss  4.56 | ppl    95.91 | bpt    6.584 
| epoch  35 |   600/ 1327 batches | lr 0.0002656 | ms/batch 210.17 | loss  4.60 | ppl    99.79 | bpt    6.641 
| epoch  35 |   800/ 1327 batches | lr 0.0002653 | ms/batch 210.92 | loss  4.59 | ppl    98.87 | bpt    6.627 
| epoch  35 |  1000/ 1327 batches | lr 0.000265 | ms/batch 211.74 | loss  4.64 | ppl   103.92 | bpt    6.699 
| epoch  35 |  1200/ 1327 batches | lr 0.0002647 | ms/batch 212.80 | loss  4.56 | ppl    95.64 | bpt    6.580 
-----------------------------------------------------------------------------------------
| end of epoch  35 | time: 335.86s | valid loss  4.36 | valid ppl    78.46 | valid bpt    6.294
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  36 |   200/ 1327 batches | lr 0.0002642 | ms/batch 208.38 | loss  4.56 | ppl    95.19 | bpt    6.573 
| epoch  36 |   400/ 1327 batches | lr 0.0002639 | ms/batch 211.58 | loss  4.56 | ppl    95.36 | bpt    6.575 
| epoch  36 |   600/ 1327 batches | lr 0.0002636 | ms/batch 211.85 | loss  4.60 | ppl    99.35 | bpt    6.634 
| epoch  36 |   800/ 1327 batches | lr 0.0002633 | ms/batch 213.28 | loss  4.57 | ppl    96.68 | bpt    6.595 
| epoch  36 |  1000/ 1327 batches | lr 0.000263 | ms/batch 211.42 | loss  4.62 | ppl   101.75 | bpt    6.669 
| epoch  36 |  1200/ 1327 batches | lr 0.0002627 | ms/batch 206.35 | loss  4.57 | ppl    96.25 | bpt    6.589 
-----------------------------------------------------------------------------------------
| end of epoch  36 | time: 335.78s | valid loss  4.35 | valid ppl    77.67 | valid bpt    6.279
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  37 |   200/ 1327 batches | lr 0.0002621 | ms/batch 209.16 | loss  4.55 | ppl    94.67 | bpt    6.565 
| epoch  37 |   400/ 1327 batches | lr 0.0002618 | ms/batch 209.65 | loss  4.55 | ppl    94.54 | bpt    6.563 
| epoch  37 |   600/ 1327 batches | lr 0.0002615 | ms/batch 209.95 | loss  4.58 | ppl    97.81 | bpt    6.612 
| epoch  37 |   800/ 1327 batches | lr 0.0002612 | ms/batch 212.39 | loss  4.57 | ppl    96.49 | bpt    6.592 
| epoch  37 |  1000/ 1327 batches | lr 0.0002609 | ms/batch 212.54 | loss  4.62 | ppl   101.07 | bpt    6.659 
| epoch  37 |  1200/ 1327 batches | lr 0.0002606 | ms/batch 212.19 | loss  4.55 | ppl    94.55 | bpt    6.563 
-----------------------------------------------------------------------------------------
| end of epoch  37 | time: 337.62s | valid loss  4.34 | valid ppl    77.02 | valid bpt    6.267
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  38 |   200/ 1327 batches | lr 0.00026 | ms/batch 210.45 | loss  4.54 | ppl    94.02 | bpt    6.555 
| epoch  38 |   400/ 1327 batches | lr 0.0002597 | ms/batch 209.17 | loss  4.53 | ppl    92.91 | bpt    6.538 
| epoch  38 |   600/ 1327 batches | lr 0.0002594 | ms/batch 212.28 | loss  4.58 | ppl    97.43 | bpt    6.606 
| epoch  38 |   800/ 1327 batches | lr 0.0002591 | ms/batch 212.32 | loss  4.56 | ppl    95.94 | bpt    6.584 
| epoch  38 |  1000/ 1327 batches | lr 0.0002588 | ms/batch 211.65 | loss  4.60 | ppl    99.08 | bpt    6.630 
| epoch  38 |  1200/ 1327 batches | lr 0.0002585 | ms/batch 206.39 | loss  4.54 | ppl    93.90 | bpt    6.553 
-----------------------------------------------------------------------------------------
| end of epoch  38 | time: 335.32s | valid loss  4.35 | valid ppl    77.32 | valid bpt    6.273
-----------------------------------------------------------------------------------------
| epoch  39 |   200/ 1327 batches | lr 0.0002579 | ms/batch 210.99 | loss  4.52 | ppl    91.79 | bpt    6.520 
| epoch  39 |   400/ 1327 batches | lr 0.0002576 | ms/batch 211.24 | loss  4.53 | ppl    92.55 | bpt    6.532 
| epoch  39 |   600/ 1327 batches | lr 0.0002573 | ms/batch 209.83 | loss  4.57 | ppl    96.63 | bpt    6.594 
| epoch  39 |   800/ 1327 batches | lr 0.000257 | ms/batch 209.64 | loss  4.53 | ppl    92.63 | bpt    6.533 
| epoch  39 |  1000/ 1327 batches | lr 0.0002566 | ms/batch 210.04 | loss  4.58 | ppl    97.81 | bpt    6.612 
| epoch  39 |  1200/ 1327 batches | lr 0.0002563 | ms/batch 215.91 | loss  4.54 | ppl    93.45 | bpt    6.546 
-----------------------------------------------------------------------------------------
| end of epoch  39 | time: 337.37s | valid loss  4.33 | valid ppl    75.91 | valid bpt    6.246
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  40 |   200/ 1327 batches | lr 0.0002557 | ms/batch 210.18 | loss  4.51 | ppl    90.66 | bpt    6.502 
| epoch  40 |   400/ 1327 batches | lr 0.0002554 | ms/batch 210.53 | loss  4.51 | ppl    90.58 | bpt    6.501 
| epoch  40 |   600/ 1327 batches | lr 0.0002551 | ms/batch 209.07 | loss  4.57 | ppl    96.90 | bpt    6.598 
| epoch  40 |   800/ 1327 batches | lr 0.0002548 | ms/batch 210.25 | loss  4.54 | ppl    94.12 | bpt    6.556 
| epoch  40 |  1000/ 1327 batches | lr 0.0002544 | ms/batch 213.82 | loss  4.59 | ppl    98.36 | bpt    6.620 
| epoch  40 |  1200/ 1327 batches | lr 0.0002541 | ms/batch 210.07 | loss  4.53 | ppl    93.03 | bpt    6.540 
-----------------------------------------------------------------------------------------
| end of epoch  40 | time: 336.20s | valid loss  4.33 | valid ppl    75.85 | valid bpt    6.245
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  41 |   200/ 1327 batches | lr 0.0002535 | ms/batch 211.42 | loss  4.50 | ppl    90.17 | bpt    6.495 
| epoch  41 |   400/ 1327 batches | lr 0.0002532 | ms/batch 212.30 | loss  4.51 | ppl    90.80 | bpt    6.505 
| epoch  41 |   600/ 1327 batches | lr 0.0002528 | ms/batch 210.52 | loss  4.54 | ppl    94.15 | bpt    6.557 
| epoch  41 |   800/ 1327 batches | lr 0.0002525 | ms/batch 209.66 | loss  4.53 | ppl    92.94 | bpt    6.538 
| epoch  41 |  1000/ 1327 batches | lr 0.0002522 | ms/batch 211.17 | loss  4.58 | ppl    97.18 | bpt    6.603 
| epoch  41 |  1200/ 1327 batches | lr 0.0002519 | ms/batch 212.26 | loss  4.50 | ppl    90.14 | bpt    6.494 
-----------------------------------------------------------------------------------------
| end of epoch  41 | time: 336.61s | valid loss  4.33 | valid ppl    75.87 | valid bpt    6.246
-----------------------------------------------------------------------------------------
| epoch  42 |   200/ 1327 batches | lr 0.0002512 | ms/batch 212.91 | loss  4.49 | ppl    88.95 | bpt    6.475 
| epoch  42 |   400/ 1327 batches | lr 0.0002509 | ms/batch 210.74 | loss  4.49 | ppl    89.42 | bpt    6.483 
| epoch  42 |   600/ 1327 batches | lr 0.0002506 | ms/batch 215.20 | loss  4.53 | ppl    92.70 | bpt    6.535 
| epoch  42 |   800/ 1327 batches | lr 0.0002503 | ms/batch 212.74 | loss  4.52 | ppl    92.02 | bpt    6.524 
| epoch  42 |  1000/ 1327 batches | lr 0.0002499 | ms/batch 212.64 | loss  4.56 | ppl    95.63 | bpt    6.579 
| epoch  42 |  1200/ 1327 batches | lr 0.0002496 | ms/batch 214.20 | loss  4.50 | ppl    89.85 | bpt    6.489 
-----------------------------------------------------------------------------------------
| end of epoch  42 | time: 336.29s | valid loss  4.31 | valid ppl    74.57 | valid bpt    6.220
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  43 |   200/ 1327 batches | lr 0.000249 | ms/batch 210.70 | loss  4.47 | ppl    87.65 | bpt    6.454 
| epoch  43 |   400/ 1327 batches | lr 0.0002486 | ms/batch 211.19 | loss  4.47 | ppl    87.68 | bpt    6.454 
| epoch  43 |   600/ 1327 batches | lr 0.0002483 | ms/batch 210.53 | loss  4.54 | ppl    93.91 | bpt    6.553 
| epoch  43 |   800/ 1327 batches | lr 0.000248 | ms/batch 210.26 | loss  4.49 | ppl    89.55 | bpt    6.485 
| epoch  43 |  1000/ 1327 batches | lr 0.0002476 | ms/batch 207.22 | loss  4.54 | ppl    94.08 | bpt    6.556 
| epoch  43 |  1200/ 1327 batches | lr 0.0002473 | ms/batch 209.14 | loss  4.51 | ppl    90.64 | bpt    6.502 
-----------------------------------------------------------------------------------------
| end of epoch  43 | time: 336.39s | valid loss  4.31 | valid ppl    74.60 | valid bpt    6.221
-----------------------------------------------------------------------------------------
| epoch  44 |   200/ 1327 batches | lr 0.0002466 | ms/batch 215.56 | loss  4.48 | ppl    88.35 | bpt    6.465 
| epoch  44 |   400/ 1327 batches | lr 0.0002463 | ms/batch 210.89 | loss  4.47 | ppl    87.50 | bpt    6.451 
| epoch  44 |   600/ 1327 batches | lr 0.000246 | ms/batch 213.09 | loss  4.53 | ppl    92.80 | bpt    6.536 
| epoch  44 |   800/ 1327 batches | lr 0.0002456 | ms/batch 212.63 | loss  4.50 | ppl    90.22 | bpt    6.495 
| epoch  44 |  1000/ 1327 batches | lr 0.0002453 | ms/batch 210.53 | loss  4.55 | ppl    94.45 | bpt    6.561 
| epoch  44 |  1200/ 1327 batches | lr 0.0002449 | ms/batch 211.05 | loss  4.47 | ppl    87.09 | bpt    6.444 
-----------------------------------------------------------------------------------------
| end of epoch  44 | time: 336.98s | valid loss  4.30 | valid ppl    73.97 | valid bpt    6.209
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  45 |   200/ 1327 batches | lr 0.0002443 | ms/batch 211.93 | loss  4.47 | ppl    87.20 | bpt    6.446 
| epoch  45 |   400/ 1327 batches | lr 0.0002439 | ms/batch 212.77 | loss  4.46 | ppl    86.23 | bpt    6.430 
| epoch  45 |   600/ 1327 batches | lr 0.0002436 | ms/batch 211.67 | loss  4.51 | ppl    90.88 | bpt    6.506 
| epoch  45 |   800/ 1327 batches | lr 0.0002433 | ms/batch 212.15 | loss  4.49 | ppl    89.21 | bpt    6.479 
| epoch  45 |  1000/ 1327 batches | lr 0.0002429 | ms/batch 214.51 | loss  4.54 | ppl    93.24 | bpt    6.543 
| epoch  45 |  1200/ 1327 batches | lr 0.0002426 | ms/batch 213.88 | loss  4.47 | ppl    87.60 | bpt    6.453 
-----------------------------------------------------------------------------------------
| end of epoch  45 | time: 336.49s | valid loss  4.29 | valid ppl    73.11 | valid bpt    6.192
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  46 |   200/ 1327 batches | lr 0.0002419 | ms/batch 213.08 | loss  4.46 | ppl    86.80 | bpt    6.440 
| epoch  46 |   400/ 1327 batches | lr 0.0002416 | ms/batch 212.37 | loss  4.45 | ppl    85.32 | bpt    6.415 
| epoch  46 |   600/ 1327 batches | lr 0.0002412 | ms/batch 213.65 | loss  4.51 | ppl    90.81 | bpt    6.505 
| epoch  46 |   800/ 1327 batches | lr 0.0002409 | ms/batch 212.51 | loss  4.49 | ppl    89.08 | bpt    6.477 
| epoch  46 |  1000/ 1327 batches | lr 0.0002405 | ms/batch 211.60 | loss  4.52 | ppl    92.01 | bpt    6.524 
| epoch  46 |  1200/ 1327 batches | lr 0.0002402 | ms/batch 210.17 | loss  4.46 | ppl    86.81 | bpt    6.440 
-----------------------------------------------------------------------------------------
| end of epoch  46 | time: 336.01s | valid loss  4.29 | valid ppl    73.23 | valid bpt    6.194
-----------------------------------------------------------------------------------------
| epoch  47 |   200/ 1327 batches | lr 0.0002395 | ms/batch 209.82 | loss  4.44 | ppl    85.07 | bpt    6.411 
| epoch  47 |   400/ 1327 batches | lr 0.0002392 | ms/batch 208.69 | loss  4.44 | ppl    84.52 | bpt    6.401 
| epoch  47 |   600/ 1327 batches | lr 0.0002388 | ms/batch 209.43 | loss  4.50 | ppl    89.98 | bpt    6.492 
| epoch  47 |   800/ 1327 batches | lr 0.0002385 | ms/batch 209.59 | loss  4.46 | ppl    86.88 | bpt    6.441 
| epoch  47 |  1000/ 1327 batches | lr 0.0002381 | ms/batch 212.82 | loss  4.52 | ppl    91.90 | bpt    6.522 
| epoch  47 |  1200/ 1327 batches | lr 0.0002377 | ms/batch 211.81 | loss  4.44 | ppl    84.91 | bpt    6.408 
-----------------------------------------------------------------------------------------
| end of epoch  47 | time: 334.84s | valid loss  4.27 | valid ppl    71.85 | valid bpt    6.167
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  48 |   200/ 1327 batches | lr 0.0002371 | ms/batch 216.18 | loss  4.44 | ppl    85.03 | bpt    6.410 
| epoch  48 |   400/ 1327 batches | lr 0.0002367 | ms/batch 213.75 | loss  4.43 | ppl    84.19 | bpt    6.396 
| epoch  48 |   600/ 1327 batches | lr 0.0002364 | ms/batch 211.92 | loss  4.50 | ppl    90.09 | bpt    6.493 
| epoch  48 |   800/ 1327 batches | lr 0.000236 | ms/batch 210.91 | loss  4.46 | ppl    86.45 | bpt    6.434 
| epoch  48 |  1000/ 1327 batches | lr 0.0002356 | ms/batch 210.93 | loss  4.52 | ppl    91.66 | bpt    6.518 
| epoch  48 |  1200/ 1327 batches | lr 0.0002353 | ms/batch 212.10 | loss  4.45 | ppl    85.95 | bpt    6.425 
-----------------------------------------------------------------------------------------
| end of epoch  48 | time: 336.22s | valid loss  4.28 | valid ppl    71.99 | valid bpt    6.170
-----------------------------------------------------------------------------------------
| epoch  49 |   200/ 1327 batches | lr 0.0002346 | ms/batch 210.46 | loss  4.43 | ppl    84.00 | bpt    6.392 
| epoch  49 |   400/ 1327 batches | lr 0.0002343 | ms/batch 210.31 | loss  4.42 | ppl    83.17 | bpt    6.378 
| epoch  49 |   600/ 1327 batches | lr 0.0002339 | ms/batch 210.67 | loss  4.47 | ppl    87.55 | bpt    6.452 
| epoch  49 |   800/ 1327 batches | lr 0.0002335 | ms/batch 210.15 | loss  4.46 | ppl    86.61 | bpt    6.437 
| epoch  49 |  1000/ 1327 batches | lr 0.0002332 | ms/batch 211.98 | loss  4.49 | ppl    89.21 | bpt    6.479 
| epoch  49 |  1200/ 1327 batches | lr 0.0002328 | ms/batch 212.10 | loss  4.44 | ppl    84.50 | bpt    6.401 
-----------------------------------------------------------------------------------------
| end of epoch  49 | time: 335.90s | valid loss  4.27 | valid ppl    71.65 | valid bpt    6.163
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  50 |   200/ 1327 batches | lr 0.0002321 | ms/batch 213.08 | loss  4.41 | ppl    82.47 | bpt    6.366 
| epoch  50 |   400/ 1327 batches | lr 0.0002318 | ms/batch 212.42 | loss  4.42 | ppl    83.18 | bpt    6.378 
| epoch  50 |   600/ 1327 batches | lr 0.0002314 | ms/batch 213.55 | loss  4.48 | ppl    88.12 | bpt    6.461 
| epoch  50 |   800/ 1327 batches | lr 0.000231 | ms/batch 215.57 | loss  4.44 | ppl    84.98 | bpt    6.409 
| epoch  50 |  1000/ 1327 batches | lr 0.0002307 | ms/batch 211.94 | loss  4.50 | ppl    90.12 | bpt    6.494 
| epoch  50 |  1200/ 1327 batches | lr 0.0002303 | ms/batch 212.66 | loss  4.43 | ppl    83.93 | bpt    6.391 
-----------------------------------------------------------------------------------------
| end of epoch  50 | time: 337.88s | valid loss  4.27 | valid ppl    71.69 | valid bpt    6.164
-----------------------------------------------------------------------------------------
| epoch  51 |   200/ 1327 batches | lr 0.0002296 | ms/batch 211.15 | loss  4.40 | ppl    81.65 | bpt    6.351 
| epoch  51 |   400/ 1327 batches | lr 0.0002293 | ms/batch 210.25 | loss  4.40 | ppl    81.70 | bpt    6.352 
| epoch  51 |   600/ 1327 batches | lr 0.0002289 | ms/batch 212.34 | loss  4.47 | ppl    87.75 | bpt    6.455 
| epoch  51 |   800/ 1327 batches | lr 0.0002285 | ms/batch 214.20 | loss  4.43 | ppl    83.71 | bpt    6.387 
| epoch  51 |  1000/ 1327 batches | lr 0.0002281 | ms/batch 208.26 | loss  4.48 | ppl    88.59 | bpt    6.469 
| epoch  51 |  1200/ 1327 batches | lr 0.0002278 | ms/batch 211.71 | loss  4.43 | ppl    83.64 | bpt    6.386 
-----------------------------------------------------------------------------------------
| end of epoch  51 | time: 338.10s | valid loss  4.26 | valid ppl    70.99 | valid bpt    6.150
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  52 |   200/ 1327 batches | lr 0.0002271 | ms/batch 210.55 | loss  4.39 | ppl    80.82 | bpt    6.337 
| epoch  52 |   400/ 1327 batches | lr 0.0002267 | ms/batch 210.28 | loss  4.40 | ppl    81.51 | bpt    6.349 
| epoch  52 |   600/ 1327 batches | lr 0.0002263 | ms/batch 213.91 | loss  4.45 | ppl    85.77 | bpt    6.422 
| epoch  52 |   800/ 1327 batches | lr 0.000226 | ms/batch 212.83 | loss  4.44 | ppl    84.74 | bpt    6.405 
| epoch  52 |  1000/ 1327 batches | lr 0.0002256 | ms/batch 213.48 | loss  4.48 | ppl    88.63 | bpt    6.470 
| epoch  52 |  1200/ 1327 batches | lr 0.0002252 | ms/batch 210.02 | loss  4.41 | ppl    82.07 | bpt    6.359 
-----------------------------------------------------------------------------------------
| end of epoch  52 | time: 337.10s | valid loss  4.27 | valid ppl    71.19 | valid bpt    6.154
-----------------------------------------------------------------------------------------
| epoch  53 |   200/ 1327 batches | lr 0.0002245 | ms/batch 211.55 | loss  4.40 | ppl    81.19 | bpt    6.343 
| epoch  53 |   400/ 1327 batches | lr 0.0002241 | ms/batch 210.61 | loss  4.39 | ppl    80.90 | bpt    6.338 
| epoch  53 |   600/ 1327 batches | lr 0.0002238 | ms/batch 211.28 | loss  4.44 | ppl    84.56 | bpt    6.402 
| epoch  53 |   800/ 1327 batches | lr 0.0002234 | ms/batch 212.59 | loss  4.43 | ppl    83.74 | bpt    6.388 
| epoch  53 |  1000/ 1327 batches | lr 0.000223 | ms/batch 213.14 | loss  4.47 | ppl    87.42 | bpt    6.450 
| epoch  53 |  1200/ 1327 batches | lr 0.0002226 | ms/batch 213.17 | loss  4.41 | ppl    81.91 | bpt    6.356 
-----------------------------------------------------------------------------------------
| end of epoch  53 | time: 336.81s | valid loss  4.25 | valid ppl    70.41 | valid bpt    6.138
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  54 |   200/ 1327 batches | lr 0.0002219 | ms/batch 211.78 | loss  4.38 | ppl    79.98 | bpt    6.322 
| epoch  54 |   400/ 1327 batches | lr 0.0002216 | ms/batch 212.82 | loss  4.37 | ppl    78.81 | bpt    6.300 
| epoch  54 |   600/ 1327 batches | lr 0.0002212 | ms/batch 210.63 | loss  4.44 | ppl    84.59 | bpt    6.402 
| epoch  54 |   800/ 1327 batches | lr 0.0002208 | ms/batch 208.17 | loss  4.40 | ppl    81.60 | bpt    6.350 
| epoch  54 |  1000/ 1327 batches | lr 0.0002204 | ms/batch 213.48 | loss  4.46 | ppl    86.58 | bpt    6.436 
| epoch  54 |  1200/ 1327 batches | lr 0.0002201 | ms/batch 212.71 | loss  4.41 | ppl    82.33 | bpt    6.363 
-----------------------------------------------------------------------------------------
| end of epoch  54 | time: 337.27s | valid loss  4.25 | valid ppl    69.96 | valid bpt    6.128
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  55 |   200/ 1327 batches | lr 0.0002194 | ms/batch 213.68 | loss  4.38 | ppl    79.73 | bpt    6.317 
| epoch  55 |   400/ 1327 batches | lr 0.000219 | ms/batch 213.78 | loss  4.37 | ppl    78.78 | bpt    6.300 
| epoch  55 |   600/ 1327 batches | lr 0.0002186 | ms/batch 213.29 | loss  4.44 | ppl    84.72 | bpt    6.405 
| epoch  55 |   800/ 1327 batches | lr 0.0002182 | ms/batch 206.94 | loss  4.40 | ppl    81.79 | bpt    6.354 
| epoch  55 |  1000/ 1327 batches | lr 0.0002178 | ms/batch 210.70 | loss  4.46 | ppl    86.60 | bpt    6.436 
| epoch  55 |  1200/ 1327 batches | lr 0.0002175 | ms/batch 210.28 | loss  4.37 | ppl    78.95 | bpt    6.303 
-----------------------------------------------------------------------------------------
| end of epoch  55 | time: 336.11s | valid loss  4.25 | valid ppl    70.16 | valid bpt    6.133
-----------------------------------------------------------------------------------------
| epoch  56 |   200/ 1327 batches | lr 0.0002168 | ms/batch 210.14 | loss  4.37 | ppl    78.98 | bpt    6.303 
| epoch  56 |   400/ 1327 batches | lr 0.0002164 | ms/batch 211.20 | loss  4.37 | ppl    79.21 | bpt    6.308 
| epoch  56 |   600/ 1327 batches | lr 0.000216 | ms/batch 211.46 | loss  4.43 | ppl    84.16 | bpt    6.395 
| epoch  56 |   800/ 1327 batches | lr 0.0002156 | ms/batch 209.84 | loss  4.40 | ppl    81.12 | bpt    6.342 
| epoch  56 |  1000/ 1327 batches | lr 0.0002152 | ms/batch 212.82 | loss  4.43 | ppl    84.11 | bpt    6.394 
| epoch  56 |  1200/ 1327 batches | lr 0.0002149 | ms/batch 211.65 | loss  4.38 | ppl    80.07 | bpt    6.323 
-----------------------------------------------------------------------------------------
| end of epoch  56 | time: 336.56s | valid loss  4.25 | valid ppl    69.98 | valid bpt    6.129
-----------------------------------------------------------------------------------------
| epoch  57 |   200/ 1327 batches | lr 0.0002141 | ms/batch 205.80 | loss  4.35 | ppl    77.79 | bpt    6.282 
| epoch  57 |   400/ 1327 batches | lr 0.0002138 | ms/batch 211.71 | loss  4.35 | ppl    77.27 | bpt    6.272 
| epoch  57 |   600/ 1327 batches | lr 0.0002134 | ms/batch 212.47 | loss  4.41 | ppl    81.86 | bpt    6.355 
| epoch  57 |   800/ 1327 batches | lr 0.000213 | ms/batch 212.11 | loss  4.38 | ppl    80.10 | bpt    6.324 
| epoch  57 |  1000/ 1327 batches | lr 0.0002126 | ms/batch 211.76 | loss  4.42 | ppl    82.75 | bpt    6.371 
| epoch  57 |  1200/ 1327 batches | lr 0.0002122 | ms/batch 209.19 | loss  4.37 | ppl    78.74 | bpt    6.299 
-----------------------------------------------------------------------------------------
| end of epoch  57 | time: 335.56s | valid loss  4.25 | valid ppl    69.82 | valid bpt    6.125
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  58 |   200/ 1327 batches | lr 0.0002115 | ms/batch 211.76 | loss  4.35 | ppl    77.39 | bpt    6.274 
| epoch  58 |   400/ 1327 batches | lr 0.0002111 | ms/batch 210.12 | loss  4.34 | ppl    77.01 | bpt    6.267 
| epoch  58 |   600/ 1327 batches | lr 0.0002108 | ms/batch 212.07 | loss  4.40 | ppl    81.24 | bpt    6.344 
| epoch  58 |   800/ 1327 batches | lr 0.0002104 | ms/batch 207.20 | loss  4.38 | ppl    79.81 | bpt    6.318 
| epoch  58 |  1000/ 1327 batches | lr 0.00021 | ms/batch 211.76 | loss  4.42 | ppl    83.51 | bpt    6.384 
| epoch  58 |  1200/ 1327 batches | lr 0.0002096 | ms/batch 211.89 | loss  4.38 | ppl    79.85 | bpt    6.319 
-----------------------------------------------------------------------------------------
| end of epoch  58 | time: 337.27s | valid loss  4.25 | valid ppl    70.04 | valid bpt    6.130
-----------------------------------------------------------------------------------------
| epoch  59 |   200/ 1327 batches | lr 0.0002089 | ms/batch 207.02 | loss  4.34 | ppl    76.87 | bpt    6.264 
| epoch  59 |   400/ 1327 batches | lr 0.0002085 | ms/batch 209.87 | loss  4.33 | ppl    76.08 | bpt    6.249 
| epoch  59 |   600/ 1327 batches | lr 0.0002081 | ms/batch 209.64 | loss  4.40 | ppl    81.13 | bpt    6.342 
| epoch  59 |   800/ 1327 batches | lr 0.0002077 | ms/batch 211.94 | loss  4.37 | ppl    78.67 | bpt    6.298 
| epoch  59 |  1000/ 1327 batches | lr 0.0002073 | ms/batch 215.37 | loss  4.42 | ppl    82.78 | bpt    6.371 
| epoch  59 |  1200/ 1327 batches | lr 0.000207 | ms/batch 208.45 | loss  4.36 | ppl    78.37 | bpt    6.292 
-----------------------------------------------------------------------------------------
| end of epoch  59 | time: 335.51s | valid loss  4.23 | valid ppl    68.86 | valid bpt    6.106
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  60 |   200/ 1327 batches | lr 0.0002062 | ms/batch 211.51 | loss  4.34 | ppl    76.46 | bpt    6.257 
| epoch  60 |   400/ 1327 batches | lr 0.0002058 | ms/batch 210.71 | loss  4.35 | ppl    77.13 | bpt    6.269 
| epoch  60 |   600/ 1327 batches | lr 0.0002055 | ms/batch 211.55 | loss  4.40 | ppl    81.31 | bpt    6.345 
| epoch  60 |   800/ 1327 batches | lr 0.0002051 | ms/batch 213.27 | loss  4.37 | ppl    79.03 | bpt    6.304 
| epoch  60 |  1000/ 1327 batches | lr 0.0002047 | ms/batch 210.83 | loss  4.39 | ppl    80.50 | bpt    6.331 
| epoch  60 |  1200/ 1327 batches | lr 0.0002043 | ms/batch 206.64 | loss  4.34 | ppl    76.91 | bpt    6.265 
-----------------------------------------------------------------------------------------
| end of epoch  60 | time: 336.32s | valid loss  4.23 | valid ppl    69.01 | valid bpt    6.109
-----------------------------------------------------------------------------------------
| epoch  61 |   200/ 1327 batches | lr 0.0002036 | ms/batch 209.73 | loss  4.32 | ppl    75.16 | bpt    6.232 
| epoch  61 |   400/ 1327 batches | lr 0.0002032 | ms/batch 211.93 | loss  4.31 | ppl    74.48 | bpt    6.219 
| epoch  61 |   600/ 1327 batches | lr 0.0002028 | ms/batch 210.84 | loss  4.37 | ppl    78.95 | bpt    6.303 
| epoch  61 |   800/ 1327 batches | lr 0.0002024 | ms/batch 214.40 | loss  4.34 | ppl    76.85 | bpt    6.264 
| epoch  61 |  1000/ 1327 batches | lr 0.000202 | ms/batch 210.64 | loss  4.39 | ppl    80.58 | bpt    6.332 
| epoch  61 |  1200/ 1327 batches | lr 0.0002017 | ms/batch 215.02 | loss  4.34 | ppl    76.55 | bpt    6.258 
-----------------------------------------------------------------------------------------
| end of epoch  61 | time: 336.01s | valid loss  4.22 | valid ppl    68.28 | valid bpt    6.093
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  62 |   200/ 1327 batches | lr 0.0002009 | ms/batch 212.49 | loss  4.32 | ppl    74.86 | bpt    6.226 
| epoch  62 |   400/ 1327 batches | lr 0.0002006 | ms/batch 209.99 | loss  4.33 | ppl    75.58 | bpt    6.240 
| epoch  62 |   600/ 1327 batches | lr 0.0002002 | ms/batch 211.83 | loss  4.38 | ppl    79.64 | bpt    6.315 
| epoch  62 |   800/ 1327 batches | lr 0.0001998 | ms/batch 212.44 | loss  4.33 | ppl    75.63 | bpt    6.241 
| epoch  62 |  1000/ 1327 batches | lr 0.0001994 | ms/batch 213.45 | loss  4.39 | ppl    80.72 | bpt    6.335 
| epoch  62 |  1200/ 1327 batches | lr 0.000199 | ms/batch 212.31 | loss  4.33 | ppl    76.31 | bpt    6.254 
-----------------------------------------------------------------------------------------
| end of epoch  62 | time: 338.03s | valid loss  4.22 | valid ppl    67.99 | valid bpt    6.087
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  63 |   200/ 1327 batches | lr 0.0001983 | ms/batch 211.24 | loss  4.31 | ppl    74.28 | bpt    6.215 
| epoch  63 |   400/ 1327 batches | lr 0.0001979 | ms/batch 208.92 | loss  4.32 | ppl    74.97 | bpt    6.228 
| epoch  63 |   600/ 1327 batches | lr 0.0001975 | ms/batch 205.64 | loss  4.37 | ppl    78.75 | bpt    6.299 
| epoch  63 |   800/ 1327 batches | lr 0.0001971 | ms/batch 207.07 | loss  4.33 | ppl    76.31 | bpt    6.254 
| epoch  63 |  1000/ 1327 batches | lr 0.0001968 | ms/batch 211.85 | loss  4.38 | ppl    79.73 | bpt    6.317 
| epoch  63 |  1200/ 1327 batches | lr 0.0001964 | ms/batch 209.66 | loss  4.33 | ppl    76.16 | bpt    6.251 
-----------------------------------------------------------------------------------------
| end of epoch  63 | time: 335.99s | valid loss  4.23 | valid ppl    68.80 | valid bpt    6.104
-----------------------------------------------------------------------------------------
| epoch  64 |   200/ 1327 batches | lr 0.0001956 | ms/batch 210.65 | loss  4.30 | ppl    73.86 | bpt    6.207 
| epoch  64 |   400/ 1327 batches | lr 0.0001952 | ms/batch 210.63 | loss  4.29 | ppl    72.94 | bpt    6.189 
| epoch  64 |   600/ 1327 batches | lr 0.0001949 | ms/batch 211.41 | loss  4.36 | ppl    77.89 | bpt    6.283 
| epoch  64 |   800/ 1327 batches | lr 0.0001945 | ms/batch 215.62 | loss  4.33 | ppl    75.75 | bpt    6.243 
| epoch  64 |  1000/ 1327 batches | lr 0.0001941 | ms/batch 210.89 | loss  4.39 | ppl    80.28 | bpt    6.327 
| epoch  64 |  1200/ 1327 batches | lr 0.0001937 | ms/batch 212.84 | loss  4.32 | ppl    75.25 | bpt    6.234 
-----------------------------------------------------------------------------------------
| end of epoch  64 | time: 337.71s | valid loss  4.22 | valid ppl    67.82 | valid bpt    6.084
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  65 |   200/ 1327 batches | lr 0.000193 | ms/batch 214.32 | loss  4.30 | ppl    73.88 | bpt    6.207 
| epoch  65 |   400/ 1327 batches | lr 0.0001926 | ms/batch 209.68 | loss  4.29 | ppl    73.00 | bpt    6.190 
| epoch  65 |   600/ 1327 batches | lr 0.0001922 | ms/batch 210.48 | loss  4.35 | ppl    77.20 | bpt    6.271 
| epoch  65 |   800/ 1327 batches | lr 0.0001918 | ms/batch 211.53 | loss  4.31 | ppl    74.78 | bpt    6.225 
| epoch  65 |  1000/ 1327 batches | lr 0.0001914 | ms/batch 210.06 | loss  4.35 | ppl    77.86 | bpt    6.283 
| epoch  65 |  1200/ 1327 batches | lr 0.0001911 | ms/batch 211.93 | loss  4.29 | ppl    73.08 | bpt    6.191 
-----------------------------------------------------------------------------------------
| end of epoch  65 | time: 336.20s | valid loss  4.22 | valid ppl    67.76 | valid bpt    6.082
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  66 |   200/ 1327 batches | lr 0.0001903 | ms/batch 211.11 | loss  4.28 | ppl    71.98 | bpt    6.170 
| epoch  66 |   400/ 1327 batches | lr 0.00019 | ms/batch 211.02 | loss  4.30 | ppl    73.43 | bpt    6.198 
| epoch  66 |   600/ 1327 batches | lr 0.0001896 | ms/batch 209.98 | loss  4.34 | ppl    76.45 | bpt    6.256 
| epoch  66 |   800/ 1327 batches | lr 0.0001892 | ms/batch 211.50 | loss  4.31 | ppl    74.55 | bpt    6.220 
| epoch  66 |  1000/ 1327 batches | lr 0.0001888 | ms/batch 212.45 | loss  4.35 | ppl    77.50 | bpt    6.276 
| epoch  66 |  1200/ 1327 batches | lr 0.0001884 | ms/batch 212.81 | loss  4.30 | ppl    73.60 | bpt    6.202 
-----------------------------------------------------------------------------------------
| end of epoch  66 | time: 336.94s | valid loss  4.20 | valid ppl    67.01 | valid bpt    6.066
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  67 |   200/ 1327 batches | lr 0.0001877 | ms/batch 213.12 | loss  4.27 | ppl    71.51 | bpt    6.160 
| epoch  67 |   400/ 1327 batches | lr 0.0001873 | ms/batch 211.32 | loss  4.27 | ppl    71.27 | bpt    6.155 
| epoch  67 |   600/ 1327 batches | lr 0.0001869 | ms/batch 210.17 | loss  4.33 | ppl    75.90 | bpt    6.246 
| epoch  67 |   800/ 1327 batches | lr 0.0001866 | ms/batch 208.49 | loss  4.29 | ppl    72.67 | bpt    6.183 
| epoch  67 |  1000/ 1327 batches | lr 0.0001862 | ms/batch 211.28 | loss  4.35 | ppl    77.49 | bpt    6.276 
| epoch  67 |  1200/ 1327 batches | lr 0.0001858 | ms/batch 212.61 | loss  4.30 | ppl    73.34 | bpt    6.197 
-----------------------------------------------------------------------------------------
| end of epoch  67 | time: 336.79s | valid loss  4.20 | valid ppl    66.96 | valid bpt    6.065
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  68 |   200/ 1327 batches | lr 0.0001851 | ms/batch 213.29 | loss  4.26 | ppl    70.75 | bpt    6.145 
| epoch  68 |   400/ 1327 batches | lr 0.0001847 | ms/batch 212.96 | loss  4.28 | ppl    71.95 | bpt    6.169 
| epoch  68 |   600/ 1327 batches | lr 0.0001843 | ms/batch 210.90 | loss  4.32 | ppl    74.99 | bpt    6.229 
| epoch  68 |   800/ 1327 batches | lr 0.0001839 | ms/batch 212.25 | loss  4.29 | ppl    72.94 | bpt    6.189 
| epoch  68 |  1000/ 1327 batches | lr 0.0001836 | ms/batch 212.88 | loss  4.34 | ppl    76.98 | bpt    6.266 
| epoch  68 |  1200/ 1327 batches | lr 0.0001832 | ms/batch 206.16 | loss  4.28 | ppl    72.16 | bpt    6.173 
-----------------------------------------------------------------------------------------
| end of epoch  68 | time: 337.20s | valid loss  4.20 | valid ppl    66.91 | valid bpt    6.064
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  69 |   200/ 1327 batches | lr 0.0001825 | ms/batch 209.71 | loss  4.26 | ppl    70.85 | bpt    6.147 
| epoch  69 |   400/ 1327 batches | lr 0.0001821 | ms/batch 209.06 | loss  4.26 | ppl    70.70 | bpt    6.144 
| epoch  69 |   600/ 1327 batches | lr 0.0001817 | ms/batch 210.84 | loss  4.30 | ppl    73.85 | bpt    6.207 
| epoch  69 |   800/ 1327 batches | lr 0.0001813 | ms/batch 206.06 | loss  4.27 | ppl    71.71 | bpt    6.164 
| epoch  69 |  1000/ 1327 batches | lr 0.000181 | ms/batch 210.70 | loss  4.33 | ppl    75.98 | bpt    6.248 
| epoch  69 |  1200/ 1327 batches | lr 0.0001806 | ms/batch 213.53 | loss  4.27 | ppl    71.49 | bpt    6.160 
-----------------------------------------------------------------------------------------
| end of epoch  69 | time: 334.98s | valid loss  4.20 | valid ppl    66.88 | valid bpt    6.064
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  70 |   200/ 1327 batches | lr 0.0001799 | ms/batch 214.79 | loss  4.25 | ppl    70.44 | bpt    6.138 
| epoch  70 |   400/ 1327 batches | lr 0.0001795 | ms/batch 213.29 | loss  4.25 | ppl    70.29 | bpt    6.135 
| epoch  70 |   600/ 1327 batches | lr 0.0001791 | ms/batch 213.03 | loss  4.30 | ppl    73.78 | bpt    6.205 
| epoch  70 |   800/ 1327 batches | lr 0.0001788 | ms/batch 205.01 | loss  4.26 | ppl    71.07 | bpt    6.151 
| epoch  70 |  1000/ 1327 batches | lr 0.0001784 | ms/batch 210.59 | loss  4.33 | ppl    75.64 | bpt    6.241 
| epoch  70 |  1200/ 1327 batches | lr 0.000178 | ms/batch 209.97 | loss  4.28 | ppl    72.17 | bpt    6.173 
-----------------------------------------------------------------------------------------
| end of epoch  70 | time: 337.60s | valid loss  4.20 | valid ppl    66.56 | valid bpt    6.056
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  71 |   200/ 1327 batches | lr 0.0001773 | ms/batch 209.84 | loss  4.24 | ppl    69.70 | bpt    6.123 
| epoch  71 |   400/ 1327 batches | lr 0.0001769 | ms/batch 211.95 | loss  4.24 | ppl    69.52 | bpt    6.119 
| epoch  71 |   600/ 1327 batches | lr 0.0001765 | ms/batch 211.59 | loss  4.30 | ppl    73.89 | bpt    6.207 
| epoch  71 |   800/ 1327 batches | lr 0.0001762 | ms/batch 212.90 | loss  4.27 | ppl    71.40 | bpt    6.158 
| epoch  71 |  1000/ 1327 batches | lr 0.0001758 | ms/batch 210.69 | loss  4.32 | ppl    74.98 | bpt    6.229 
| epoch  71 |  1200/ 1327 batches | lr 0.0001754 | ms/batch 212.99 | loss  4.26 | ppl    70.93 | bpt    6.148 
-----------------------------------------------------------------------------------------
| end of epoch  71 | time: 336.22s | valid loss  4.19 | valid ppl    66.24 | valid bpt    6.050
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  72 |   200/ 1327 batches | lr 0.0001747 | ms/batch 211.62 | loss  4.23 | ppl    68.95 | bpt    6.108 
| epoch  72 |   400/ 1327 batches | lr 0.0001744 | ms/batch 210.77 | loss  4.24 | ppl    69.18 | bpt    6.112 
| epoch  72 |   600/ 1327 batches | lr 0.000174 | ms/batch 212.19 | loss  4.30 | ppl    73.55 | bpt    6.201 
| epoch  72 |   800/ 1327 batches | lr 0.0001736 | ms/batch 211.65 | loss  4.25 | ppl    70.37 | bpt    6.137 
| epoch  72 |  1000/ 1327 batches | lr 0.0001732 | ms/batch 212.02 | loss  4.30 | ppl    73.75 | bpt    6.205 
| epoch  72 |  1200/ 1327 batches | lr 0.0001729 | ms/batch 212.58 | loss  4.25 | ppl    70.40 | bpt    6.138 
-----------------------------------------------------------------------------------------
| end of epoch  72 | time: 337.42s | valid loss  4.19 | valid ppl    65.92 | valid bpt    6.043
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  73 |   200/ 1327 batches | lr 0.0001722 | ms/batch 211.94 | loss  4.23 | ppl    68.50 | bpt    6.098 
| epoch  73 |   400/ 1327 batches | lr 0.0001718 | ms/batch 211.56 | loss  4.22 | ppl    67.97 | bpt    6.087 
| epoch  73 |   600/ 1327 batches | lr 0.0001714 | ms/batch 213.85 | loss  4.29 | ppl    72.72 | bpt    6.184 
| epoch  73 |   800/ 1327 batches | lr 0.0001711 | ms/batch 211.36 | loss  4.25 | ppl    70.12 | bpt    6.132 
| epoch  73 |  1000/ 1327 batches | lr 0.0001707 | ms/batch 211.82 | loss  4.30 | ppl    73.57 | bpt    6.201 
| epoch  73 |  1200/ 1327 batches | lr 0.0001703 | ms/batch 212.30 | loss  4.24 | ppl    69.44 | bpt    6.118 
-----------------------------------------------------------------------------------------
| end of epoch  73 | time: 337.32s | valid loss  4.18 | valid ppl    65.41 | valid bpt    6.031
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  74 |   200/ 1327 batches | lr 0.0001696 | ms/batch 211.42 | loss  4.21 | ppl    67.03 | bpt    6.067 
| epoch  74 |   400/ 1327 batches | lr 0.0001693 | ms/batch 212.18 | loss  4.22 | ppl    67.87 | bpt    6.085 
| epoch  74 |   600/ 1327 batches | lr 0.0001689 | ms/batch 212.01 | loss  4.28 | ppl    72.23 | bpt    6.174 
| epoch  74 |   800/ 1327 batches | lr 0.0001685 | ms/batch 212.24 | loss  4.24 | ppl    69.35 | bpt    6.116 
| epoch  74 |  1000/ 1327 batches | lr 0.0001682 | ms/batch 210.80 | loss  4.29 | ppl    72.75 | bpt    6.185 
| epoch  74 |  1200/ 1327 batches | lr 0.0001678 | ms/batch 212.39 | loss  4.22 | ppl    68.35 | bpt    6.095 
-----------------------------------------------------------------------------------------
| end of epoch  74 | time: 336.85s | valid loss  4.18 | valid ppl    65.46 | valid bpt    6.032
-----------------------------------------------------------------------------------------
| epoch  75 |   200/ 1327 batches | lr 0.0001671 | ms/batch 212.49 | loss  4.21 | ppl    67.25 | bpt    6.072 
| epoch  75 |   400/ 1327 batches | lr 0.0001668 | ms/batch 210.49 | loss  4.21 | ppl    67.08 | bpt    6.068 
| epoch  75 |   600/ 1327 batches | lr 0.0001664 | ms/batch 211.90 | loss  4.26 | ppl    70.90 | bpt    6.148 
| epoch  75 |   800/ 1327 batches | lr 0.000166 | ms/batch 214.25 | loss  4.24 | ppl    69.40 | bpt    6.117 
| epoch  75 |  1000/ 1327 batches | lr 0.0001657 | ms/batch 212.07 | loss  4.29 | ppl    72.78 | bpt    6.186 
| epoch  75 |  1200/ 1327 batches | lr 0.0001653 | ms/batch 211.79 | loss  4.23 | ppl    68.42 | bpt    6.096 
-----------------------------------------------------------------------------------------
| end of epoch  75 | time: 338.56s | valid loss  4.18 | valid ppl    65.54 | valid bpt    6.034
-----------------------------------------------------------------------------------------
| epoch  76 |   200/ 1327 batches | lr 0.0001646 | ms/batch 213.22 | loss  4.20 | ppl    66.57 | bpt    6.057 
| epoch  76 |   400/ 1327 batches | lr 0.0001643 | ms/batch 211.66 | loss  4.22 | ppl    67.74 | bpt    6.082 
| epoch  76 |   600/ 1327 batches | lr 0.0001639 | ms/batch 210.88 | loss  4.26 | ppl    70.85 | bpt    6.147 
| epoch  76 |   800/ 1327 batches | lr 0.0001636 | ms/batch 211.76 | loss  4.25 | ppl    69.92 | bpt    6.128 
| epoch  76 |  1000/ 1327 batches | lr 0.0001632 | ms/batch 211.36 | loss  4.28 | ppl    71.99 | bpt    6.170 
| epoch  76 |  1200/ 1327 batches | lr 0.0001628 | ms/batch 211.81 | loss  4.23 | ppl    68.61 | bpt    6.100 
-----------------------------------------------------------------------------------------
| end of epoch  76 | time: 337.24s | valid loss  4.18 | valid ppl    65.15 | valid bpt    6.026
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  77 |   200/ 1327 batches | lr 0.0001622 | ms/batch 212.75 | loss  4.20 | ppl    66.46 | bpt    6.054 
| epoch  77 |   400/ 1327 batches | lr 0.0001618 | ms/batch 207.34 | loss  4.20 | ppl    66.44 | bpt    6.054 
| epoch  77 |   600/ 1327 batches | lr 0.0001615 | ms/batch 212.94 | loss  4.26 | ppl    70.72 | bpt    6.144 
| epoch  77 |   800/ 1327 batches | lr 0.0001611 | ms/batch 213.29 | loss  4.24 | ppl    69.18 | bpt    6.112 
| epoch  77 |  1000/ 1327 batches | lr 0.0001607 | ms/batch 206.79 | loss  4.26 | ppl    71.05 | bpt    6.151 
| epoch  77 |  1200/ 1327 batches | lr 0.0001604 | ms/batch 211.17 | loss  4.21 | ppl    67.08 | bpt    6.068 
-----------------------------------------------------------------------------------------
| end of epoch  77 | time: 333.70s | valid loss  4.17 | valid ppl    64.99 | valid bpt    6.022
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  78 |   200/ 1327 batches | lr 0.0001597 | ms/batch 211.39 | loss  4.18 | ppl    65.62 | bpt    6.036 
| epoch  78 |   400/ 1327 batches | lr 0.0001594 | ms/batch 212.55 | loss  4.18 | ppl    65.53 | bpt    6.034 
| epoch  78 |   600/ 1327 batches | lr 0.000159 | ms/batch 212.41 | loss  4.24 | ppl    69.68 | bpt    6.123 
| epoch  78 |   800/ 1327 batches | lr 0.0001587 | ms/batch 212.66 | loss  4.24 | ppl    69.07 | bpt    6.110 
| epoch  78 |  1000/ 1327 batches | lr 0.0001583 | ms/batch 213.24 | loss  4.26 | ppl    70.98 | bpt    6.149 
| epoch  78 |  1200/ 1327 batches | lr 0.000158 | ms/batch 212.47 | loss  4.20 | ppl    66.73 | bpt    6.060 
-----------------------------------------------------------------------------------------
| end of epoch  78 | time: 336.80s | valid loss  4.17 | valid ppl    64.76 | valid bpt    6.017
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  79 |   200/ 1327 batches | lr 0.0001573 | ms/batch 214.93 | loss  4.19 | ppl    65.84 | bpt    6.041 
| epoch  79 |   400/ 1327 batches | lr 0.000157 | ms/batch 212.35 | loss  4.17 | ppl    64.90 | bpt    6.020 
| epoch  79 |   600/ 1327 batches | lr 0.0001566 | ms/batch 212.03 | loss  4.24 | ppl    69.19 | bpt    6.112 
| epoch  79 |   800/ 1327 batches | lr 0.0001563 | ms/batch 207.60 | loss  4.21 | ppl    67.38 | bpt    6.074 
| epoch  79 |  1000/ 1327 batches | lr 0.000156 | ms/batch 208.28 | loss  4.26 | ppl    70.49 | bpt    6.139 
| epoch  79 |  1200/ 1327 batches | lr 0.0001556 | ms/batch 210.77 | loss  4.20 | ppl    66.87 | bpt    6.063 
-----------------------------------------------------------------------------------------
| end of epoch  79 | time: 335.82s | valid loss  4.17 | valid ppl    64.74 | valid bpt    6.016
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  80 |   200/ 1327 batches | lr 0.000155 | ms/batch 212.50 | loss  4.17 | ppl    64.88 | bpt    6.020 
| epoch  80 |   400/ 1327 batches | lr 0.0001546 | ms/batch 211.47 | loss  4.17 | ppl    64.60 | bpt    6.013 
| epoch  80 |   600/ 1327 batches | lr 0.0001543 | ms/batch 211.86 | loss  4.24 | ppl    69.13 | bpt    6.111 
| epoch  80 |   800/ 1327 batches | lr 0.0001539 | ms/batch 209.81 | loss  4.20 | ppl    66.60 | bpt    6.057 
| epoch  80 |  1000/ 1327 batches | lr 0.0001536 | ms/batch 209.66 | loss  4.25 | ppl    70.43 | bpt    6.138 
| epoch  80 |  1200/ 1327 batches | lr 0.0001533 | ms/batch 211.56 | loss  4.21 | ppl    67.17 | bpt    6.070 
-----------------------------------------------------------------------------------------
| end of epoch  80 | time: 336.57s | valid loss  4.17 | valid ppl    64.89 | valid bpt    6.020
-----------------------------------------------------------------------------------------
| epoch  81 |   200/ 1327 batches | lr 0.0001526 | ms/batch 214.05 | loss  4.17 | ppl    65.00 | bpt    6.022 
| epoch  81 |   400/ 1327 batches | lr 0.0001523 | ms/batch 208.86 | loss  4.15 | ppl    63.51 | bpt    5.989 
| epoch  81 |   600/ 1327 batches | lr 0.0001519 | ms/batch 213.28 | loss  4.24 | ppl    69.49 | bpt    6.119 
| epoch  81 |   800/ 1327 batches | lr 0.0001516 | ms/batch 206.67 | loss  4.21 | ppl    67.09 | bpt    6.068 
| epoch  81 |  1000/ 1327 batches | lr 0.0001513 | ms/batch 209.46 | loss  4.25 | ppl    70.12 | bpt    6.132 
| epoch  81 |  1200/ 1327 batches | lr 0.0001509 | ms/batch 211.16 | loss  4.18 | ppl    65.51 | bpt    6.034 
-----------------------------------------------------------------------------------------
| end of epoch  81 | time: 334.90s | valid loss  4.16 | valid ppl    64.15 | valid bpt    6.003
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  82 |   200/ 1327 batches | lr 0.0001503 | ms/batch 213.46 | loss  4.16 | ppl    64.08 | bpt    6.002 
| epoch  82 |   400/ 1327 batches | lr 0.00015 | ms/batch 213.26 | loss  4.15 | ppl    63.35 | bpt    5.985 
| epoch  82 |   600/ 1327 batches | lr 0.0001496 | ms/batch 211.02 | loss  4.21 | ppl    67.59 | bpt    6.079 
| epoch  82 |   800/ 1327 batches | lr 0.0001493 | ms/batch 212.93 | loss  4.19 | ppl    66.10 | bpt    6.047 
| epoch  82 |  1000/ 1327 batches | lr 0.000149 | ms/batch 212.51 | loss  4.24 | ppl    69.75 | bpt    6.124 
| epoch  82 |  1200/ 1327 batches | lr 0.0001486 | ms/batch 211.24 | loss  4.17 | ppl    64.56 | bpt    6.012 
-----------------------------------------------------------------------------------------
| end of epoch  82 | time: 337.95s | valid loss  4.17 | valid ppl    64.62 | valid bpt    6.014
-----------------------------------------------------------------------------------------
| epoch  83 |   200/ 1327 batches | lr 0.000148 | ms/batch 213.73 | loss  4.16 | ppl    64.36 | bpt    6.008 
| epoch  83 |   400/ 1327 batches | lr 0.0001477 | ms/batch 214.05 | loss  4.16 | ppl    64.28 | bpt    6.006 
| epoch  83 |   600/ 1327 batches | lr 0.0001474 | ms/batch 213.47 | loss  4.21 | ppl    67.27 | bpt    6.072 
| epoch  83 |   800/ 1327 batches | lr 0.000147 | ms/batch 213.43 | loss  4.20 | ppl    66.71 | bpt    6.060 
| epoch  83 |  1000/ 1327 batches | lr 0.0001467 | ms/batch 208.22 | loss  4.23 | ppl    68.46 | bpt    6.097 
| epoch  83 |  1200/ 1327 batches | lr 0.0001464 | ms/batch 206.58 | loss  4.17 | ppl    64.74 | bpt    6.017 
-----------------------------------------------------------------------------------------
| end of epoch  83 | time: 334.72s | valid loss  4.16 | valid ppl    64.28 | valid bpt    6.006
-----------------------------------------------------------------------------------------
| epoch  84 |   200/ 1327 batches | lr 0.0001458 | ms/batch 210.93 | loss  4.15 | ppl    63.20 | bpt    5.982 
| epoch  84 |   400/ 1327 batches | lr 0.0001455 | ms/batch 210.15 | loss  4.14 | ppl    62.69 | bpt    5.970 
| epoch  84 |   600/ 1327 batches | lr 0.0001452 | ms/batch 211.11 | loss  4.20 | ppl    66.77 | bpt    6.061 
| epoch  84 |   800/ 1327 batches | lr 0.0001448 | ms/batch 210.93 | loss  4.17 | ppl    64.70 | bpt    6.016 
| epoch  84 |  1000/ 1327 batches | lr 0.0001445 | ms/batch 209.61 | loss  4.21 | ppl    67.15 | bpt    6.069 
| epoch  84 |  1200/ 1327 batches | lr 0.0001442 | ms/batch 210.61 | loss  4.16 | ppl    63.77 | bpt    5.995 
-----------------------------------------------------------------------------------------
| end of epoch  84 | time: 335.30s | valid loss  4.16 | valid ppl    63.87 | valid bpt    5.997
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  85 |   200/ 1327 batches | lr 0.0001436 | ms/batch 213.88 | loss  4.15 | ppl    63.45 | bpt    5.988 
| epoch  85 |   400/ 1327 batches | lr 0.0001433 | ms/batch 211.68 | loss  4.13 | ppl    62.36 | bpt    5.963 
| epoch  85 |   600/ 1327 batches | lr 0.000143 | ms/batch 213.15 | loss  4.21 | ppl    67.31 | bpt    6.073 
| epoch  85 |   800/ 1327 batches | lr 0.0001426 | ms/batch 211.86 | loss  4.17 | ppl    64.78 | bpt    6.017 
| epoch  85 |  1000/ 1327 batches | lr 0.0001423 | ms/batch 212.49 | loss  4.22 | ppl    67.80 | bpt    6.083 
| epoch  85 |  1200/ 1327 batches | lr 0.000142 | ms/batch 208.96 | loss  4.15 | ppl    63.72 | bpt    5.994 
-----------------------------------------------------------------------------------------
| end of epoch  85 | time: 337.00s | valid loss  4.16 | valid ppl    63.96 | valid bpt    5.999
-----------------------------------------------------------------------------------------
| epoch  86 |   200/ 1327 batches | lr 0.0001414 | ms/batch 212.90 | loss  4.14 | ppl    63.02 | bpt    5.978 
| epoch  86 |   400/ 1327 batches | lr 0.0001411 | ms/batch 209.84 | loss  4.12 | ppl    61.34 | bpt    5.939 
| epoch  86 |   600/ 1327 batches | lr 0.0001408 | ms/batch 211.32 | loss  4.20 | ppl    66.59 | bpt    6.057 
| epoch  86 |   800/ 1327 batches | lr 0.0001405 | ms/batch 211.92 | loss  4.16 | ppl    64.20 | bpt    6.004 
| epoch  86 |  1000/ 1327 batches | lr 0.0001402 | ms/batch 209.76 | loss  4.20 | ppl    66.90 | bpt    6.064 
| epoch  86 |  1200/ 1327 batches | lr 0.0001399 | ms/batch 209.83 | loss  4.16 | ppl    63.85 | bpt    5.997 
-----------------------------------------------------------------------------------------
| end of epoch  86 | time: 336.59s | valid loss  4.16 | valid ppl    63.96 | valid bpt    5.999
-----------------------------------------------------------------------------------------
| epoch  87 |   200/ 1327 batches | lr 0.0001393 | ms/batch 210.24 | loss  4.12 | ppl    61.77 | bpt    5.949 
| epoch  87 |   400/ 1327 batches | lr 0.000139 | ms/batch 209.22 | loss  4.13 | ppl    62.38 | bpt    5.963 
| epoch  87 |   600/ 1327 batches | lr 0.0001387 | ms/batch 214.41 | loss  4.18 | ppl    65.63 | bpt    6.036 
| epoch  87 |   800/ 1327 batches | lr 0.0001384 | ms/batch 208.45 | loss  4.15 | ppl    63.22 | bpt    5.982 
| epoch  87 |  1000/ 1327 batches | lr 0.0001381 | ms/batch 205.68 | loss  4.21 | ppl    67.22 | bpt    6.071 
| epoch  87 |  1200/ 1327 batches | lr 0.0001378 | ms/batch 212.30 | loss  4.15 | ppl    63.27 | bpt    5.984 
-----------------------------------------------------------------------------------------
| end of epoch  87 | time: 335.45s | valid loss  4.16 | valid ppl    63.77 | valid bpt    5.995
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  88 |   200/ 1327 batches | lr 0.0001372 | ms/batch 211.05 | loss  4.11 | ppl    60.77 | bpt    5.925 
| epoch  88 |   400/ 1327 batches | lr 0.0001369 | ms/batch 213.91 | loss  4.10 | ppl    60.62 | bpt    5.922 
| epoch  88 |   600/ 1327 batches | lr 0.0001366 | ms/batch 211.33 | loss  4.18 | ppl    65.13 | bpt    6.025 
| epoch  88 |   800/ 1327 batches | lr 0.0001363 | ms/batch 212.45 | loss  4.17 | ppl    64.44 | bpt    6.010 
| epoch  88 |  1000/ 1327 batches | lr 0.000136 | ms/batch 211.87 | loss  4.21 | ppl    67.28 | bpt    6.072 
| epoch  88 |  1200/ 1327 batches | lr 0.0001357 | ms/batch 212.82 | loss  4.14 | ppl    63.07 | bpt    5.979 
-----------------------------------------------------------------------------------------
| end of epoch  88 | time: 336.66s | valid loss  4.15 | valid ppl    63.44 | valid bpt    5.987
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  89 |   200/ 1327 batches | lr 0.0001352 | ms/batch 215.01 | loss  4.12 | ppl    61.65 | bpt    5.946 
| epoch  89 |   400/ 1327 batches | lr 0.0001349 | ms/batch 213.52 | loss  4.10 | ppl    60.50 | bpt    5.919 
| epoch  89 |   600/ 1327 batches | lr 0.0001346 | ms/batch 211.22 | loss  4.19 | ppl    65.77 | bpt    6.039 
| epoch  89 |   800/ 1327 batches | lr 0.0001343 | ms/batch 213.36 | loss  4.15 | ppl    63.52 | bpt    5.989 
| epoch  89 |  1000/ 1327 batches | lr 0.000134 | ms/batch 213.22 | loss  4.19 | ppl    66.01 | bpt    6.045 
| epoch  89 |  1200/ 1327 batches | lr 0.0001337 | ms/batch 213.73 | loss  4.13 | ppl    62.42 | bpt    5.964 
-----------------------------------------------------------------------------------------
| end of epoch  89 | time: 337.51s | valid loss  4.15 | valid ppl    63.55 | valid bpt    5.990
-----------------------------------------------------------------------------------------
| epoch  90 |   200/ 1327 batches | lr 0.0001332 | ms/batch 211.30 | loss  4.11 | ppl    60.94 | bpt    5.929 
| epoch  90 |   400/ 1327 batches | lr 0.0001329 | ms/batch 213.38 | loss  4.11 | ppl    60.94 | bpt    5.929 
| epoch  90 |   600/ 1327 batches | lr 0.0001326 | ms/batch 211.96 | loss  4.17 | ppl    64.75 | bpt    6.017 
| epoch  90 |   800/ 1327 batches | lr 0.0001323 | ms/batch 213.31 | loss  4.15 | ppl    63.26 | bpt    5.983 
| epoch  90 |  1000/ 1327 batches | lr 0.0001321 | ms/batch 212.17 | loss  4.20 | ppl    66.59 | bpt    6.057 
| epoch  90 |  1200/ 1327 batches | lr 0.0001318 | ms/batch 210.88 | loss  4.12 | ppl    61.79 | bpt    5.949 
-----------------------------------------------------------------------------------------
| end of epoch  90 | time: 336.95s | valid loss  4.16 | valid ppl    63.76 | valid bpt    5.995
-----------------------------------------------------------------------------------------
| epoch  91 |   200/ 1327 batches | lr 0.0001313 | ms/batch 212.99 | loss  4.11 | ppl    60.91 | bpt    5.929 
| epoch  91 |   400/ 1327 batches | lr 0.000131 | ms/batch 211.01 | loss  4.09 | ppl    59.55 | bpt    5.896 
| epoch  91 |   600/ 1327 batches | lr 0.0001307 | ms/batch 212.27 | loss  4.19 | ppl    65.92 | bpt    6.043 
| epoch  91 |   800/ 1327 batches | lr 0.0001304 | ms/batch 212.63 | loss  4.14 | ppl    62.79 | bpt    5.973 
| epoch  91 |  1000/ 1327 batches | lr 0.0001301 | ms/batch 208.66 | loss  4.18 | ppl    65.23 | bpt    6.027 
| epoch  91 |  1200/ 1327 batches | lr 0.0001299 | ms/batch 210.97 | loss  4.12 | ppl    61.50 | bpt    5.943 
-----------------------------------------------------------------------------------------
| end of epoch  91 | time: 335.90s | valid loss  4.15 | valid ppl    63.25 | valid bpt    5.983
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  92 |   200/ 1327 batches | lr 0.0001294 | ms/batch 207.43 | loss  4.09 | ppl    59.58 | bpt    5.897 
| epoch  92 |   400/ 1327 batches | lr 0.0001291 | ms/batch 210.71 | loss  4.10 | ppl    60.25 | bpt    5.913 
| epoch  92 |   600/ 1327 batches | lr 0.0001288 | ms/batch 212.72 | loss  4.15 | ppl    63.67 | bpt    5.992 
| epoch  92 |   800/ 1327 batches | lr 0.0001285 | ms/batch 210.57 | loss  4.14 | ppl    62.65 | bpt    5.969 
| epoch  92 |  1000/ 1327 batches | lr 0.0001283 | ms/batch 212.64 | loss  4.18 | ppl    65.32 | bpt    6.029 
| epoch  92 |  1200/ 1327 batches | lr 0.000128 | ms/batch 210.35 | loss  4.12 | ppl    61.52 | bpt    5.943 
-----------------------------------------------------------------------------------------
| end of epoch  92 | time: 335.73s | valid loss  4.14 | valid ppl    62.82 | valid bpt    5.973
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch  93 |   200/ 1327 batches | lr 0.0001275 | ms/batch 210.44 | loss  4.08 | ppl    59.25 | bpt    5.889 
| epoch  93 |   400/ 1327 batches | lr 0.0001272 | ms/batch 211.04 | loss  4.08 | ppl    59.23 | bpt    5.888 
| epoch  93 |   600/ 1327 batches | lr 0.000127 | ms/batch 214.57 | loss  4.16 | ppl    63.96 | bpt    5.999 
| epoch  93 |   800/ 1327 batches | lr 0.0001267 | ms/batch 212.91 | loss  4.12 | ppl    61.78 | bpt    5.949 
| epoch  93 |  1000/ 1327 batches | lr 0.0001265 | ms/batch 210.37 | loss  4.16 | ppl    63.92 | bpt    5.998 
| epoch  93 |  1200/ 1327 batches | lr 0.0001262 | ms/batch 207.13 | loss  4.11 | ppl    60.80 | bpt    5.926 
-----------------------------------------------------------------------------------------
| end of epoch  93 | time: 335.58s | valid loss  4.15 | valid ppl    63.51 | valid bpt    5.989
-----------------------------------------------------------------------------------------
| epoch  94 |   200/ 1327 batches | lr 0.0001257 | ms/batch 210.74 | loss  4.09 | ppl    59.45 | bpt    5.894 
| epoch  94 |   400/ 1327 batches | lr 0.0001255 | ms/batch 214.33 | loss  4.07 | ppl    58.75 | bpt    5.877 
| epoch  94 |   600/ 1327 batches | lr 0.0001252 | ms/batch 213.54 | loss  4.15 | ppl    63.23 | bpt    5.982 
| epoch  94 |   800/ 1327 batches | lr 0.0001249 | ms/batch 212.29 | loss  4.13 | ppl    62.35 | bpt    5.962 
| epoch  94 |  1000/ 1327 batches | lr 0.0001247 | ms/batch 212.06 | loss  4.16 | ppl    64.24 | bpt    6.005 
| epoch  94 |  1200/ 1327 batches | lr 0.0001244 | ms/batch 211.45 | loss  4.10 | ppl    60.53 | bpt    5.919 
-----------------------------------------------------------------------------------------
| end of epoch  94 | time: 336.32s | valid loss  4.15 | valid ppl    63.22 | valid bpt    5.982
-----------------------------------------------------------------------------------------
| epoch  95 |   200/ 1327 batches | lr 0.000124 | ms/batch 210.35 | loss  4.08 | ppl    59.17 | bpt    5.887 
| epoch  95 |   400/ 1327 batches | lr 0.0001237 | ms/batch 211.98 | loss  4.07 | ppl    58.27 | bpt    5.865 
| epoch  95 |   600/ 1327 batches | lr 0.0001235 | ms/batch 212.75 | loss  4.13 | ppl    62.17 | bpt    5.958 
| epoch  95 |   800/ 1327 batches | lr 0.0001232 | ms/batch 212.35 | loss  4.12 | ppl    61.32 | bpt    5.938 
| epoch  95 |  1000/ 1327 batches | lr 0.000123 | ms/batch 210.88 | loss  4.16 | ppl    63.91 | bpt    5.998 
| epoch  95 |  1200/ 1327 batches | lr 0.0001227 | ms/batch 209.59 | loss  4.09 | ppl    59.77 | bpt    5.901 
-----------------------------------------------------------------------------------------
| end of epoch  95 | time: 336.16s | valid loss  4.14 | valid ppl    63.00 | valid bpt    5.977
-----------------------------------------------------------------------------------------
| epoch  96 |   200/ 1327 batches | lr 0.0001223 | ms/batch 211.17 | loss  4.08 | ppl    58.90 | bpt    5.880 
| epoch  96 |   400/ 1327 batches | lr 0.000122 | ms/batch 213.36 | loss  4.05 | ppl    57.53 | bpt    5.846 
| epoch  96 |   600/ 1327 batches | lr 0.0001218 | ms/batch 209.98 | loss  4.14 | ppl    62.81 | bpt    5.973 
| epoch  96 |   800/ 1327 batches | lr 0.0001216 | ms/batch 208.80 | loss  4.11 | ppl    60.72 | bpt    5.924 
| epoch  96 |  1000/ 1327 batches | lr 0.0001213 | ms/batch 207.81 | loss  4.14 | ppl    62.75 | bpt    5.972 
| epoch  96 |  1200/ 1327 batches | lr 0.0001211 | ms/batch 211.86 | loss  4.10 | ppl    60.59 | bpt    5.921 
-----------------------------------------------------------------------------------------
| end of epoch  96 | time: 335.36s | valid loss  4.14 | valid ppl    62.84 | valid bpt    5.974
-----------------------------------------------------------------------------------------
| epoch  97 |   200/ 1327 batches | lr 0.0001206 | ms/batch 210.69 | loss  4.06 | ppl    58.09 | bpt    5.860 
| epoch  97 |   400/ 1327 batches | lr 0.0001204 | ms/batch 209.89 | loss  4.06 | ppl    58.20 | bpt    5.863 
| epoch  97 |   600/ 1327 batches | lr 0.0001202 | ms/batch 211.03 | loss  4.13 | ppl    61.97 | bpt    5.954 
| epoch  97 |   800/ 1327 batches | lr 0.0001199 | ms/batch 210.33 | loss  4.10 | ppl    60.33 | bpt    5.915 
| epoch  97 |  1000/ 1327 batches | lr 0.0001197 | ms/batch 209.12 | loss  4.15 | ppl    63.20 | bpt    5.982 
| epoch  97 |  1200/ 1327 batches | lr 0.0001195 | ms/batch 209.75 | loss  4.09 | ppl    59.45 | bpt    5.894 
-----------------------------------------------------------------------------------------
| end of epoch  97 | time: 337.69s | valid loss  4.14 | valid ppl    63.09 | valid bpt    5.979
-----------------------------------------------------------------------------------------
| epoch  98 |   200/ 1327 batches | lr 0.000119 | ms/batch 213.06 | loss  4.05 | ppl    57.56 | bpt    5.847 
| epoch  98 |   400/ 1327 batches | lr 0.0001188 | ms/batch 210.57 | loss  4.05 | ppl    57.42 | bpt    5.843 
| epoch  98 |   600/ 1327 batches | lr 0.0001186 | ms/batch 211.09 | loss  4.13 | ppl    61.92 | bpt    5.952 
| epoch  98 |   800/ 1327 batches | lr 0.0001184 | ms/batch 210.68 | loss  4.09 | ppl    59.94 | bpt    5.905 
| epoch  98 |  1000/ 1327 batches | lr 0.0001182 | ms/batch 209.90 | loss  4.13 | ppl    62.42 | bpt    5.964 
| epoch  98 |  1200/ 1327 batches | lr 0.0001179 | ms/batch 209.83 | loss  4.07 | ppl    58.73 | bpt    5.876 
-----------------------------------------------------------------------------------------
| end of epoch  98 | time: 337.08s | valid loss  4.14 | valid ppl    63.06 | valid bpt    5.979
-----------------------------------------------------------------------------------------
| epoch  99 |   200/ 1327 batches | lr 0.0001175 | ms/batch 211.36 | loss  4.05 | ppl    57.47 | bpt    5.845 
| epoch  99 |   400/ 1327 batches | lr 0.0001173 | ms/batch 210.38 | loss  4.05 | ppl    57.40 | bpt    5.843 
| epoch  99 |   600/ 1327 batches | lr 0.0001171 | ms/batch 211.64 | loss  4.11 | ppl    61.04 | bpt    5.932 
| epoch  99 |   800/ 1327 batches | lr 0.0001169 | ms/batch 213.01 | loss  4.07 | ppl    58.73 | bpt    5.876 
| epoch  99 |  1000/ 1327 batches | lr 0.0001167 | ms/batch 209.77 | loss  4.13 | ppl    61.93 | bpt    5.953 
| epoch  99 |  1200/ 1327 batches | lr 0.0001164 | ms/batch 209.84 | loss  4.08 | ppl    59.27 | bpt    5.889 
-----------------------------------------------------------------------------------------
| end of epoch  99 | time: 335.88s | valid loss  4.14 | valid ppl    62.96 | valid bpt    5.976
-----------------------------------------------------------------------------------------
| epoch 100 |   200/ 1327 batches | lr 0.000116 | ms/batch 213.45 | loss  4.06 | ppl    57.80 | bpt    5.853 
| epoch 100 |   400/ 1327 batches | lr 0.0001158 | ms/batch 213.04 | loss  4.04 | ppl    56.59 | bpt    5.823 
| epoch 100 |   600/ 1327 batches | lr 0.0001156 | ms/batch 212.34 | loss  4.11 | ppl    61.00 | bpt    5.931 
| epoch 100 |   800/ 1327 batches | lr 0.0001154 | ms/batch 211.09 | loss  4.10 | ppl    60.41 | bpt    5.917 
| epoch 100 |  1000/ 1327 batches | lr 0.0001152 | ms/batch 209.25 | loss  4.14 | ppl    62.55 | bpt    5.967 
| epoch 100 |  1200/ 1327 batches | lr 0.000115 | ms/batch 213.29 | loss  4.07 | ppl    58.33 | bpt    5.866 
-----------------------------------------------------------------------------------------
| end of epoch 100 | time: 335.84s | valid loss  4.14 | valid ppl    62.91 | valid bpt    5.975
-----------------------------------------------------------------------------------------
| epoch 101 |   200/ 1327 batches | lr 0.0001146 | ms/batch 211.90 | loss  4.05 | ppl    57.55 | bpt    5.847 
| epoch 101 |   400/ 1327 batches | lr 0.0001144 | ms/batch 211.49 | loss  4.04 | ppl    57.05 | bpt    5.834 
| epoch 101 |   600/ 1327 batches | lr 0.0001142 | ms/batch 210.25 | loss  4.12 | ppl    61.27 | bpt    5.937 
| epoch 101 |   800/ 1327 batches | lr 0.0001141 | ms/batch 208.26 | loss  4.07 | ppl    58.63 | bpt    5.874 
| epoch 101 |  1000/ 1327 batches | lr 0.0001139 | ms/batch 211.86 | loss  4.12 | ppl    61.37 | bpt    5.939 
| epoch 101 |  1200/ 1327 batches | lr 0.0001137 | ms/batch 211.66 | loss  4.08 | ppl    58.99 | bpt    5.882 
-----------------------------------------------------------------------------------------
| end of epoch 101 | time: 337.31s | valid loss  4.14 | valid ppl    62.59 | valid bpt    5.968
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch 102 |   200/ 1327 batches | lr 0.0001133 | ms/batch 214.39 | loss  4.04 | ppl    56.91 | bpt    5.830 
| epoch 102 |   400/ 1327 batches | lr 0.0001131 | ms/batch 209.34 | loss  4.03 | ppl    56.38 | bpt    5.817 
| epoch 102 |   600/ 1327 batches | lr 0.0001129 | ms/batch 208.52 | loss  4.10 | ppl    60.53 | bpt    5.920 
| epoch 102 |   800/ 1327 batches | lr 0.0001127 | ms/batch 212.28 | loss  4.07 | ppl    58.52 | bpt    5.871 
| epoch 102 |  1000/ 1327 batches | lr 0.0001125 | ms/batch 210.64 | loss  4.12 | ppl    61.51 | bpt    5.943 
| epoch 102 |  1200/ 1327 batches | lr 0.0001123 | ms/batch 212.45 | loss  4.04 | ppl    56.80 | bpt    5.828 
-----------------------------------------------------------------------------------------
| end of epoch 102 | time: 335.23s | valid loss  4.13 | valid ppl    62.25 | valid bpt    5.960
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch 103 |   200/ 1327 batches | lr 0.000112 | ms/batch 211.53 | loss  4.03 | ppl    56.14 | bpt    5.811 
| epoch 103 |   400/ 1327 batches | lr 0.0001118 | ms/batch 209.52 | loss  4.02 | ppl    55.84 | bpt    5.803 
| epoch 103 |   600/ 1327 batches | lr 0.0001116 | ms/batch 213.37 | loss  4.09 | ppl    59.76 | bpt    5.901 
| epoch 103 |   800/ 1327 batches | lr 0.0001115 | ms/batch 213.53 | loss  4.07 | ppl    58.60 | bpt    5.873 
| epoch 103 |  1000/ 1327 batches | lr 0.0001113 | ms/batch 212.14 | loss  4.12 | ppl    61.80 | bpt    5.950 
| epoch 103 |  1200/ 1327 batches | lr 0.0001111 | ms/batch 210.81 | loss  4.05 | ppl    57.39 | bpt    5.843 
-----------------------------------------------------------------------------------------
| end of epoch 103 | time: 336.12s | valid loss  4.13 | valid ppl    62.33 | valid bpt    5.962
-----------------------------------------------------------------------------------------
| epoch 104 |   200/ 1327 batches | lr 0.0001108 | ms/batch 211.07 | loss  4.03 | ppl    56.39 | bpt    5.817 
| epoch 104 |   400/ 1327 batches | lr 0.0001106 | ms/batch 211.47 | loss  4.02 | ppl    55.48 | bpt    5.794 
| epoch 104 |   600/ 1327 batches | lr 0.0001104 | ms/batch 214.54 | loss  4.09 | ppl    59.96 | bpt    5.906 
| epoch 104 |   800/ 1327 batches | lr 0.0001103 | ms/batch 213.88 | loss  4.06 | ppl    57.75 | bpt    5.852 
| epoch 104 |  1000/ 1327 batches | lr 0.0001101 | ms/batch 213.95 | loss  4.12 | ppl    61.35 | bpt    5.939 
| epoch 104 |  1200/ 1327 batches | lr 0.0001099 | ms/batch 207.24 | loss  4.07 | ppl    58.35 | bpt    5.867 
-----------------------------------------------------------------------------------------
| end of epoch 104 | time: 334.86s | valid loss  4.14 | valid ppl    62.82 | valid bpt    5.973
-----------------------------------------------------------------------------------------
| epoch 105 |   200/ 1327 batches | lr 0.0001096 | ms/batch 208.86 | loss  4.02 | ppl    55.79 | bpt    5.802 
| epoch 105 |   400/ 1327 batches | lr 0.0001095 | ms/batch 212.98 | loss  4.03 | ppl    56.05 | bpt    5.809 
| epoch 105 |   600/ 1327 batches | lr 0.0001093 | ms/batch 215.12 | loss  4.09 | ppl    59.69 | bpt    5.899 
| epoch 105 |   800/ 1327 batches | lr 0.0001091 | ms/batch 212.38 | loss  4.06 | ppl    57.69 | bpt    5.850 
| epoch 105 |  1000/ 1327 batches | lr 0.000109 | ms/batch 213.96 | loss  4.12 | ppl    61.48 | bpt    5.942 
| epoch 105 |  1200/ 1327 batches | lr 0.0001088 | ms/batch 211.63 | loss  4.03 | ppl    56.43 | bpt    5.818 
-----------------------------------------------------------------------------------------
| end of epoch 105 | time: 337.05s | valid loss  4.13 | valid ppl    62.41 | valid bpt    5.964
-----------------------------------------------------------------------------------------
| epoch 106 |   200/ 1327 batches | lr 0.0001085 | ms/batch 212.61 | loss  4.02 | ppl    55.51 | bpt    5.795 
| epoch 106 |   400/ 1327 batches | lr 0.0001084 | ms/batch 210.32 | loss  4.01 | ppl    55.35 | bpt    5.790 
| epoch 106 |   600/ 1327 batches | lr 0.0001082 | ms/batch 210.58 | loss  4.09 | ppl    59.71 | bpt    5.900 
| epoch 106 |   800/ 1327 batches | lr 0.0001081 | ms/batch 212.01 | loss  4.05 | ppl    57.50 | bpt    5.845 
| epoch 106 |  1000/ 1327 batches | lr 0.0001079 | ms/batch 209.99 | loss  4.09 | ppl    59.65 | bpt    5.899 
| epoch 106 |  1200/ 1327 batches | lr 0.0001078 | ms/batch 214.13 | loss  4.04 | ppl    57.11 | bpt    5.836 
-----------------------------------------------------------------------------------------
| end of epoch 106 | time: 337.39s | valid loss  4.13 | valid ppl    62.32 | valid bpt    5.962
-----------------------------------------------------------------------------------------
| epoch 107 |   200/ 1327 batches | lr 0.0001075 | ms/batch 214.44 | loss  4.02 | ppl    55.60 | bpt    5.797 
| epoch 107 |   400/ 1327 batches | lr 0.0001073 | ms/batch 213.90 | loss  4.01 | ppl    55.38 | bpt    5.791 
| epoch 107 |   600/ 1327 batches | lr 0.0001072 | ms/batch 212.29 | loss  4.09 | ppl    59.44 | bpt    5.893 
| epoch 107 |   800/ 1327 batches | lr 0.0001071 | ms/batch 211.48 | loss  4.06 | ppl    57.95 | bpt    5.857 
| epoch 107 |  1000/ 1327 batches | lr 0.0001069 | ms/batch 210.88 | loss  4.11 | ppl    61.21 | bpt    5.936 
| epoch 107 |  1200/ 1327 batches | lr 0.0001068 | ms/batch 212.74 | loss  4.03 | ppl    56.42 | bpt    5.818 
-----------------------------------------------------------------------------------------
| end of epoch 107 | time: 336.15s | valid loss  4.13 | valid ppl    62.09 | valid bpt    5.956
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch 108 |   200/ 1327 batches | lr 0.0001065 | ms/batch 212.15 | loss  4.02 | ppl    55.47 | bpt    5.794 
| epoch 108 |   400/ 1327 batches | lr 0.0001064 | ms/batch 208.95 | loss  4.02 | ppl    55.90 | bpt    5.805 
| epoch 108 |   600/ 1327 batches | lr 0.0001063 | ms/batch 211.51 | loss  4.07 | ppl    58.51 | bpt    5.871 
| epoch 108 |   800/ 1327 batches | lr 0.0001061 | ms/batch 210.34 | loss  4.04 | ppl    57.04 | bpt    5.834 
| epoch 108 |  1000/ 1327 batches | lr 0.000106 | ms/batch 211.83 | loss  4.10 | ppl    60.12 | bpt    5.910 
| epoch 108 |  1200/ 1327 batches | lr 0.0001059 | ms/batch 210.62 | loss  4.04 | ppl    56.80 | bpt    5.828 
-----------------------------------------------------------------------------------------
| end of epoch 108 | time: 335.87s | valid loss  4.13 | valid ppl    62.34 | valid bpt    5.962
-----------------------------------------------------------------------------------------
| epoch 109 |   200/ 1327 batches | lr 0.0001056 | ms/batch 213.90 | loss  4.01 | ppl    55.37 | bpt    5.791 
| epoch 109 |   400/ 1327 batches | lr 0.0001055 | ms/batch 211.26 | loss  3.98 | ppl    53.71 | bpt    5.747 
| epoch 109 |   600/ 1327 batches | lr 0.0001054 | ms/batch 210.52 | loss  4.08 | ppl    58.95 | bpt    5.881 
| epoch 109 |   800/ 1327 batches | lr 0.0001052 | ms/batch 212.55 | loss  4.04 | ppl    56.68 | bpt    5.825 
| epoch 109 |  1000/ 1327 batches | lr 0.0001051 | ms/batch 211.30 | loss  4.10 | ppl    60.46 | bpt    5.918 
| epoch 109 |  1200/ 1327 batches | lr 0.000105 | ms/batch 213.29 | loss  4.04 | ppl    56.71 | bpt    5.826 
-----------------------------------------------------------------------------------------
| end of epoch 109 | time: 337.34s | valid loss  4.13 | valid ppl    61.97 | valid bpt    5.953
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch 110 |   200/ 1327 batches | lr 0.0001048 | ms/batch 210.45 | loss  4.00 | ppl    54.57 | bpt    5.770 
| epoch 110 |   400/ 1327 batches | lr 0.0001047 | ms/batch 208.43 | loss  4.01 | ppl    55.25 | bpt    5.788 
| epoch 110 |   600/ 1327 batches | lr 0.0001045 | ms/batch 212.05 | loss  4.07 | ppl    58.84 | bpt    5.879 
| epoch 110 |   800/ 1327 batches | lr 0.0001044 | ms/batch 211.25 | loss  4.02 | ppl    55.71 | bpt    5.800 
| epoch 110 |  1000/ 1327 batches | lr 0.0001043 | ms/batch 210.49 | loss  4.09 | ppl    59.86 | bpt    5.904 
| epoch 110 |  1200/ 1327 batches | lr 0.0001042 | ms/batch 211.91 | loss  4.03 | ppl    56.15 | bpt    5.811 
-----------------------------------------------------------------------------------------
| end of epoch 110 | time: 336.55s | valid loss  4.13 | valid ppl    62.32 | valid bpt    5.962
-----------------------------------------------------------------------------------------
| epoch 111 |   200/ 1327 batches | lr 0.000104 | ms/batch 210.56 | loss  4.00 | ppl    54.62 | bpt    5.771 
| epoch 111 |   400/ 1327 batches | lr 0.0001039 | ms/batch 214.88 | loss  3.99 | ppl    53.83 | bpt    5.750 
| epoch 111 |   600/ 1327 batches | lr 0.0001038 | ms/batch 209.73 | loss  4.08 | ppl    59.10 | bpt    5.885 
| epoch 111 |   800/ 1327 batches | lr 0.0001037 | ms/batch 212.18 | loss  4.04 | ppl    57.06 | bpt    5.834 
| epoch 111 |  1000/ 1327 batches | lr 0.0001036 | ms/batch 212.23 | loss  4.08 | ppl    59.00 | bpt    5.883 
| epoch 111 |  1200/ 1327 batches | lr 0.0001035 | ms/batch 214.63 | loss  4.01 | ppl    55.16 | bpt    5.785 
-----------------------------------------------------------------------------------------
| end of epoch 111 | time: 335.64s | valid loss  4.13 | valid ppl    61.95 | valid bpt    5.953
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch 112 |   200/ 1327 batches | lr 0.0001033 | ms/batch 211.09 | loss  4.00 | ppl    54.40 | bpt    5.765 
| epoch 112 |   400/ 1327 batches | lr 0.0001032 | ms/batch 213.13 | loss  4.00 | ppl    54.54 | bpt    5.769 
| epoch 112 |   600/ 1327 batches | lr 0.0001031 | ms/batch 212.73 | loss  4.05 | ppl    57.50 | bpt    5.845 
| epoch 112 |   800/ 1327 batches | lr 0.000103 | ms/batch 211.68 | loss  4.04 | ppl    56.73 | bpt    5.826 
| epoch 112 |  1000/ 1327 batches | lr 0.0001029 | ms/batch 210.45 | loss  4.06 | ppl    58.11 | bpt    5.861 
| epoch 112 |  1200/ 1327 batches | lr 0.0001028 | ms/batch 207.90 | loss  4.02 | ppl    55.63 | bpt    5.798 
-----------------------------------------------------------------------------------------
| end of epoch 112 | time: 337.17s | valid loss  4.13 | valid ppl    61.93 | valid bpt    5.953
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch 113 |   200/ 1327 batches | lr 0.0001027 | ms/batch 211.48 | loss  4.01 | ppl    54.89 | bpt    5.778 
| epoch 113 |   400/ 1327 batches | lr 0.0001026 | ms/batch 210.25 | loss  3.98 | ppl    53.65 | bpt    5.746 
| epoch 113 |   600/ 1327 batches | lr 0.0001025 | ms/batch 207.94 | loss  4.05 | ppl    57.19 | bpt    5.838 
| epoch 113 |   800/ 1327 batches | lr 0.0001024 | ms/batch 208.86 | loss  4.03 | ppl    56.08 | bpt    5.809 
| epoch 113 |  1000/ 1327 batches | lr 0.0001023 | ms/batch 211.54 | loss  4.07 | ppl    58.80 | bpt    5.878 
| epoch 113 |  1200/ 1327 batches | lr 0.0001022 | ms/batch 211.19 | loss  4.02 | ppl    55.84 | bpt    5.803 
-----------------------------------------------------------------------------------------
| end of epoch 113 | time: 336.25s | valid loss  4.13 | valid ppl    61.96 | valid bpt    5.953
-----------------------------------------------------------------------------------------
| epoch 114 |   200/ 1327 batches | lr 0.0001021 | ms/batch 210.47 | loss  3.99 | ppl    54.20 | bpt    5.760 
| epoch 114 |   400/ 1327 batches | lr 0.000102 | ms/batch 209.54 | loss  3.99 | ppl    53.85 | bpt    5.751 
| epoch 114 |   600/ 1327 batches | lr 0.0001019 | ms/batch 210.60 | loss  4.07 | ppl    58.42 | bpt    5.868 
| epoch 114 |   800/ 1327 batches | lr 0.0001019 | ms/batch 211.34 | loss  4.03 | ppl    56.11 | bpt    5.810 
| epoch 114 |  1000/ 1327 batches | lr 0.0001018 | ms/batch 211.92 | loss  4.07 | ppl    58.59 | bpt    5.872 
| epoch 114 |  1200/ 1327 batches | lr 0.0001017 | ms/batch 213.55 | loss  4.01 | ppl    54.99 | bpt    5.781 
-----------------------------------------------------------------------------------------
| end of epoch 114 | time: 335.23s | valid loss  4.12 | valid ppl    61.81 | valid bpt    5.950
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch 115 |   200/ 1327 batches | lr 0.0001016 | ms/batch 209.72 | loss  3.97 | ppl    53.20 | bpt    5.733 
| epoch 115 |   400/ 1327 batches | lr 0.0001015 | ms/batch 209.34 | loss  3.98 | ppl    53.57 | bpt    5.743 
| epoch 115 |   600/ 1327 batches | lr 0.0001014 | ms/batch 210.61 | loss  4.07 | ppl    58.49 | bpt    5.870 
| epoch 115 |   800/ 1327 batches | lr 0.0001014 | ms/batch 212.34 | loss  4.03 | ppl    56.30 | bpt    5.815 
| epoch 115 |  1000/ 1327 batches | lr 0.0001013 | ms/batch 210.58 | loss  4.06 | ppl    57.73 | bpt    5.851 
| epoch 115 |  1200/ 1327 batches | lr 0.0001013 | ms/batch 211.78 | loss  4.01 | ppl    55.22 | bpt    5.787 
-----------------------------------------------------------------------------------------
| end of epoch 115 | time: 335.77s | valid loss  4.12 | valid ppl    61.65 | valid bpt    5.946
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch 116 |   200/ 1327 batches | lr 0.0001011 | ms/batch 213.40 | loss  3.97 | ppl    53.10 | bpt    5.731 
| epoch 116 |   400/ 1327 batches | lr 0.0001011 | ms/batch 213.08 | loss  3.98 | ppl    53.56 | bpt    5.743 
| epoch 116 |   600/ 1327 batches | lr 0.000101 | ms/batch 211.84 | loss  4.05 | ppl    57.62 | bpt    5.848 
| epoch 116 |   800/ 1327 batches | lr 0.000101 | ms/batch 214.49 | loss  4.03 | ppl    56.18 | bpt    5.812 
| epoch 116 |  1000/ 1327 batches | lr 0.0001009 | ms/batch 205.34 | loss  4.06 | ppl    57.97 | bpt    5.857 
| epoch 116 |  1200/ 1327 batches | lr 0.0001009 | ms/batch 211.64 | loss  4.00 | ppl    54.78 | bpt    5.775 
-----------------------------------------------------------------------------------------
| end of epoch 116 | time: 335.84s | valid loss  4.12 | valid ppl    61.61 | valid bpt    5.945
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch 117 |   200/ 1327 batches | lr 0.0001008 | ms/batch 212.53 | loss  3.97 | ppl    53.07 | bpt    5.730 
| epoch 117 |   400/ 1327 batches | lr 0.0001007 | ms/batch 211.88 | loss  3.96 | ppl    52.40 | bpt    5.711 
| epoch 117 |   600/ 1327 batches | lr 0.0001007 | ms/batch 208.05 | loss  4.03 | ppl    56.18 | bpt    5.812 
| epoch 117 |   800/ 1327 batches | lr 0.0001006 | ms/batch 210.83 | loss  4.00 | ppl    54.52 | bpt    5.769 
| epoch 117 |  1000/ 1327 batches | lr 0.0001006 | ms/batch 207.87 | loss  4.06 | ppl    57.81 | bpt    5.853 
| epoch 117 |  1200/ 1327 batches | lr 0.0001006 | ms/batch 213.66 | loss  4.00 | ppl    54.46 | bpt    5.767 
-----------------------------------------------------------------------------------------
| end of epoch 117 | time: 336.00s | valid loss  4.12 | valid ppl    61.52 | valid bpt    5.943
-----------------------------------------------------------------------------------------
Saving model (new best validation)
| epoch 118 |   200/ 1327 batches | lr 0.0001005 | ms/batch 212.62 | loss  3.96 | ppl    52.32 | bpt    5.709 
| epoch 118 |   400/ 1327 batches | lr 0.0001004 | ms/batch 212.41 | loss  3.98 | ppl    53.35 | bpt    5.738 
| epoch 118 |   600/ 1327 batches | lr 0.0001004 | ms/batch 210.93 | loss  4.06 | ppl    58.04 | bpt    5.859 
| epoch 118 |   800/ 1327 batches | lr 0.0001004 | ms/batch 210.99 | loss  4.01 | ppl    55.17 | bpt    5.786 
| epoch 118 |  1000/ 1327 batches | lr 0.0001003 | ms/batch 213.74 | loss  4.07 | ppl    58.34 | bpt    5.866 
| epoch 118 |  1200/ 1327 batches | lr 0.0001003 | ms/batch 213.39 | loss  3.98 | ppl    53.48 | bpt    5.741 
-----------------------------------------------------------------------------------------
| end of epoch 118 | time: 337.10s | valid loss  4.13 | valid ppl    62.27 | valid bpt    5.960
-----------------------------------------------------------------------------------------
| epoch 119 |   200/ 1327 batches | lr 0.0001003 | ms/batch 212.40 | loss  3.98 | ppl    53.47 | bpt    5.741 
| epoch 119 |   400/ 1327 batches | lr 0.0001002 | ms/batch 211.00 | loss  3.96 | ppl    52.36 | bpt    5.710 
| epoch 119 |   600/ 1327 batches | lr 0.0001002 | ms/batch 211.97 | loss  4.03 | ppl    56.35 | bpt    5.816 
| epoch 119 |   800/ 1327 batches | lr 0.0001002 | ms/batch 208.52 | loss  4.01 | ppl    55.21 | bpt    5.787 
| epoch 119 |  1000/ 1327 batches | lr 0.0001002 | ms/batch 211.97 | loss  4.04 | ppl    56.68 | bpt    5.825 
| epoch 119 |  1200/ 1327 batches | lr 0.0001001 | ms/batch 212.25 | loss  3.99 | ppl    54.16 | bpt    5.759 
-----------------------------------------------------------------------------------------
| end of epoch 119 | time: 336.30s | valid loss  4.12 | valid ppl    61.85 | valid bpt    5.951
-----------------------------------------------------------------------------------------
| epoch 120 |   200/ 1327 batches | lr 0.0001001 | ms/batch 212.38 | loss  3.95 | ppl    52.12 | bpt    5.704 
| epoch 120 |   400/ 1327 batches | lr 0.0001001 | ms/batch 209.50 | loss  3.97 | ppl    52.94 | bpt    5.726 
| epoch 120 |   600/ 1327 batches | lr 0.0001001 | ms/batch 211.88 | loss  4.03 | ppl    56.47 | bpt    5.819 
| epoch 120 |   800/ 1327 batches | lr 0.0001001 | ms/batch 211.22 | loss  4.00 | ppl    54.84 | bpt    5.777 
| epoch 120 |  1000/ 1327 batches | lr 0.0001 | ms/batch 211.97 | loss  4.06 | ppl    58.18 | bpt    5.862 
| epoch 120 |  1200/ 1327 batches | lr 0.0001 | ms/batch 213.38 | loss  3.98 | ppl    53.67 | bpt    5.746 
-----------------------------------------------------------------------------------------
| end of epoch 120 | time: 336.56s | valid loss  4.12 | valid ppl    61.80 | valid bpt    5.949
-----------------------------------------------------------------------------------------
| epoch 121 |   200/ 1327 batches | lr 0.0001 | ms/batch 210.12 | loss  3.96 | ppl    52.40 | bpt    5.711 
| epoch 121 |   400/ 1327 batches | lr 0.0001 | ms/batch 214.97 | loss  3.94 | ppl    51.58 | bpt    5.689 
| epoch 121 |   600/ 1327 batches | lr 0.0001 | ms/batch 209.56 | loss  4.05 | ppl    57.46 | bpt    5.845 
| epoch 121 |   800/ 1327 batches | lr 0.0001 | ms/batch 213.86 | loss  4.01 | ppl    54.90 | bpt    5.779 
| epoch 121 |  1000/ 1327 batches | lr 0.0001 | ms/batch 212.71 | loss  4.05 | ppl    57.35 | bpt    5.842 
| epoch 121 |  1200/ 1327 batches | lr 0.0001 | ms/batch 209.86 | loss  3.99 | ppl    53.90 | bpt    5.752 
-----------------------------------------------------------------------------------------
| end of epoch 121 | time: 336.03s | valid loss  4.12 | valid ppl    61.78 | valid bpt    5.949
-----------------------------------------------------------------------------------------
| epoch 122 |   200/ 1327 batches | lr 0.0001 | ms/batch 213.27 | loss  3.96 | ppl    52.58 | bpt    5.717 
| epoch 122 |   400/ 1327 batches | lr 0.0001 | ms/batch 212.54 | loss  3.96 | ppl    52.36 | bpt    5.711 
| epoch 122 |   600/ 1327 batches | lr 0.0001 | ms/batch 211.99 | loss  4.04 | ppl    56.57 | bpt    5.822 
| epoch 122 |   800/ 1327 batches | lr 0.0001 | ms/batch 212.02 | loss  4.00 | ppl    54.50 | bpt    5.768 
| epoch 122 |  1000/ 1327 batches | lr 0.0001 | ms/batch 213.34 | loss  4.05 | ppl    57.31 | bpt    5.841 
| epoch 122 |  1200/ 1327 batches | lr 0.0001 | ms/batch 212.69 | loss  3.99 | ppl    53.92 | bpt    5.753 
-----------------------------------------------------------------------------------------
| end of epoch 122 | time: 336.93s | valid loss  4.12 | valid ppl    61.85 | valid bpt    5.951
-----------------------------------------------------------------------------------------
| epoch 123 |   200/ 1327 batches | lr 0.0001 | ms/batch 209.86 | loss  3.96 | ppl    52.60 | bpt    5.717 
| epoch 123 |   400/ 1327 batches | lr 0.0001 | ms/batch 212.06 | loss  3.97 | ppl    52.89 | bpt    5.725 
| epoch 123 |   600/ 1327 batches | lr 0.0001 | ms/batch 209.36 | loss  4.04 | ppl    56.80 | bpt    5.828 
| epoch 123 |   800/ 1327 batches | lr 0.0001 | ms/batch 210.28 | loss  3.98 | ppl    53.59 | bpt    5.744 
| epoch 123 |  1000/ 1327 batches | lr 0.0001 | ms/batch 210.51 | loss  4.04 | ppl    56.54 | bpt    5.821 
| epoch 123 |  1200/ 1327 batches | lr 0.0001 | ms/batch 213.99 | loss  3.99 | ppl    54.17 | bpt    5.759 
-----------------------------------------------------------------------------------------
| end of epoch 123 | time: 336.36s | valid loss  4.12 | valid ppl    61.81 | valid bpt    5.950
-----------------------------------------------------------------------------------------
| epoch 124 |   200/ 1327 batches | lr 0.0001 | ms/batch 212.89 | loss  3.95 | ppl    51.83 | bpt    5.696 
| epoch 124 |   400/ 1327 batches | lr 0.0001 | ms/batch 213.56 | loss  3.95 | ppl    52.15 | bpt    5.705 
| epoch 124 |   600/ 1327 batches | lr 0.0001 | ms/batch 215.35 | loss  4.02 | ppl    55.61 | bpt    5.797 
| epoch 124 |   800/ 1327 batches | lr 0.0001 | ms/batch 210.66 | loss  4.01 | ppl    55.36 | bpt    5.791 
| epoch 124 |  1000/ 1327 batches | lr 0.0001 | ms/batch 213.71 | loss  4.04 | ppl    56.68 | bpt    5.825 
| epoch 124 |  1200/ 1327 batches | lr 0.0001 | ms/batch 214.14 | loss  3.98 | ppl    53.79 | bpt    5.749 
-----------------------------------------------------------------------------------------
| end of epoch 124 | time: 336.55s | valid loss  4.13 | valid ppl    61.95 | valid bpt    5.953
-----------------------------------------------------------------------------------------
| epoch 125 |   200/ 1327 batches | lr 0.0001 | ms/batch 212.85 | loss  3.94 | ppl    51.50 | bpt    5.686 
| epoch 125 |   400/ 1327 batches | lr 0.0001 | ms/batch 209.26 | loss  3.96 | ppl    52.44 | bpt    5.713 
| epoch 125 |   600/ 1327 batches | lr 0.0001 | ms/batch 210.77 | loss  4.02 | ppl    55.74 | bpt    5.801 
| epoch 125 |   800/ 1327 batches | lr 0.0001 | ms/batch 213.22 | loss  4.00 | ppl    54.33 | bpt    5.764 
| epoch 125 |  1000/ 1327 batches | lr 0.0001 | ms/batch 212.62 | loss  4.02 | ppl    55.85 | bpt    5.803 
| epoch 125 |  1200/ 1327 batches | lr 0.0001 | ms/batch 207.73 | loss  3.98 | ppl    53.54 | bpt    5.742 
-----------------------------------------------------------------------------------------
| end of epoch 125 | time: 337.62s | valid loss  4.13 | valid ppl    61.97 | valid bpt    5.953
-----------------------------------------------------------------------------------------
Starting EMA at epoch 126
| epoch 126 |   200/ 1327 batches | lr 5e-05 | ms/batch 217.46 | loss  3.93 | ppl    51.07 | bpt    5.675 
| epoch 126 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.34 | loss  3.94 | ppl    51.33 | bpt    5.682 
| epoch 126 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.28 | loss  4.01 | ppl    55.35 | bpt    5.790 
| epoch 126 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.13 | loss  3.95 | ppl    51.71 | bpt    5.692 
| epoch 126 |  1000/ 1327 batches | lr 5e-05 | ms/batch 217.12 | loss  4.01 | ppl    55.13 | bpt    5.785 
| epoch 126 |  1200/ 1327 batches | lr 5e-05 | ms/batch 216.92 | loss  3.94 | ppl    51.54 | bpt    5.688 
-----------------------------------------------------------------------------------------
| end of epoch 126 | time: 345.48s | valid loss  4.10 | valid ppl     60.54 | valid bpt    5.920
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 127 |   200/ 1327 batches | lr 5e-05 | ms/batch 217.47 | loss  3.93 | ppl    50.70 | bpt    5.664 
| epoch 127 |   400/ 1327 batches | lr 5e-05 | ms/batch 216.14 | loss  3.92 | ppl    50.51 | bpt    5.658 
| epoch 127 |   600/ 1327 batches | lr 5e-05 | ms/batch 216.33 | loss  4.00 | ppl    54.38 | bpt    5.765 
| epoch 127 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.87 | loss  3.95 | ppl    51.94 | bpt    5.699 
| epoch 127 |  1000/ 1327 batches | lr 5e-05 | ms/batch 216.64 | loss  4.00 | ppl    54.46 | bpt    5.767 
| epoch 127 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.81 | loss  3.94 | ppl    51.34 | bpt    5.682 
-----------------------------------------------------------------------------------------
| end of epoch 127 | time: 345.13s | valid loss  4.10 | valid ppl     60.42 | valid bpt    5.917
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 128 |   200/ 1327 batches | lr 5e-05 | ms/batch 219.03 | loss  3.93 | ppl    50.69 | bpt    5.664 
| epoch 128 |   400/ 1327 batches | lr 5e-05 | ms/batch 219.53 | loss  3.91 | ppl    49.98 | bpt    5.643 
| epoch 128 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.28 | loss  4.00 | ppl    54.83 | bpt    5.777 
| epoch 128 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.66 | loss  3.96 | ppl    52.45 | bpt    5.713 
| epoch 128 |  1000/ 1327 batches | lr 5e-05 | ms/batch 217.65 | loss  3.98 | ppl    53.67 | bpt    5.746 
| epoch 128 |  1200/ 1327 batches | lr 5e-05 | ms/batch 220.27 | loss  3.93 | ppl    50.67 | bpt    5.663 
-----------------------------------------------------------------------------------------
| end of epoch 128 | time: 343.74s | valid loss  4.10 | valid ppl     60.36 | valid bpt    5.916
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 129 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.34 | loss  3.92 | ppl    50.34 | bpt    5.654 
| epoch 129 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.13 | loss  3.92 | ppl    50.34 | bpt    5.654 
| epoch 129 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.35 | loss  3.98 | ppl    53.70 | bpt    5.747 
| epoch 129 |   800/ 1327 batches | lr 5e-05 | ms/batch 214.18 | loss  3.95 | ppl    51.85 | bpt    5.696 
| epoch 129 |  1000/ 1327 batches | lr 5e-05 | ms/batch 215.77 | loss  4.00 | ppl    54.66 | bpt    5.772 
| epoch 129 |  1200/ 1327 batches | lr 5e-05 | ms/batch 216.18 | loss  3.93 | ppl    51.12 | bpt    5.676 
-----------------------------------------------------------------------------------------
| end of epoch 129 | time: 344.05s | valid loss  4.10 | valid ppl     60.32 | valid bpt    5.914
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 130 |   200/ 1327 batches | lr 5e-05 | ms/batch 213.42 | loss  3.94 | ppl    51.18 | bpt    5.677 
| epoch 130 |   400/ 1327 batches | lr 5e-05 | ms/batch 216.95 | loss  3.90 | ppl    49.34 | bpt    5.625 
| epoch 130 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.88 | loss  3.98 | ppl    53.28 | bpt    5.735 
| epoch 130 |   800/ 1327 batches | lr 5e-05 | ms/batch 213.47 | loss  3.94 | ppl    51.29 | bpt    5.681 
| epoch 130 |  1000/ 1327 batches | lr 5e-05 | ms/batch 218.45 | loss  3.99 | ppl    53.84 | bpt    5.751 
| epoch 130 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.23 | loss  3.93 | ppl    50.90 | bpt    5.669 
-----------------------------------------------------------------------------------------
| end of epoch 130 | time: 343.60s | valid loss  4.10 | valid ppl     60.27 | valid bpt    5.913
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 131 |   200/ 1327 batches | lr 5e-05 | ms/batch 216.84 | loss  3.91 | ppl    50.05 | bpt    5.645 
| epoch 131 |   400/ 1327 batches | lr 5e-05 | ms/batch 220.64 | loss  3.89 | ppl    49.04 | bpt    5.616 
| epoch 131 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.80 | loss  3.97 | ppl    52.96 | bpt    5.727 
| epoch 131 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.51 | loss  3.93 | ppl    50.88 | bpt    5.669 
| epoch 131 |  1000/ 1327 batches | lr 5e-05 | ms/batch 217.90 | loss  3.99 | ppl    53.87 | bpt    5.751 
| epoch 131 |  1200/ 1327 batches | lr 5e-05 | ms/batch 216.80 | loss  3.93 | ppl    51.06 | bpt    5.674 
-----------------------------------------------------------------------------------------
| end of epoch 131 | time: 345.75s | valid loss  4.10 | valid ppl     60.22 | valid bpt    5.912
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 132 |   200/ 1327 batches | lr 5e-05 | ms/batch 219.61 | loss  3.90 | ppl    49.18 | bpt    5.620 
| epoch 132 |   400/ 1327 batches | lr 5e-05 | ms/batch 213.71 | loss  3.90 | ppl    49.35 | bpt    5.625 
| epoch 132 |   600/ 1327 batches | lr 5e-05 | ms/batch 214.64 | loss  3.98 | ppl    53.51 | bpt    5.742 
| epoch 132 |   800/ 1327 batches | lr 5e-05 | ms/batch 210.56 | loss  3.93 | ppl    50.85 | bpt    5.668 
| epoch 132 |  1000/ 1327 batches | lr 5e-05 | ms/batch 214.95 | loss  3.98 | ppl    53.55 | bpt    5.743 
| epoch 132 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.14 | loss  3.93 | ppl    50.97 | bpt    5.672 
-----------------------------------------------------------------------------------------
| end of epoch 132 | time: 342.80s | valid loss  4.10 | valid ppl     60.19 | valid bpt    5.912
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 133 |   200/ 1327 batches | lr 5e-05 | ms/batch 216.92 | loss  3.92 | ppl    50.48 | bpt    5.658 
| epoch 133 |   400/ 1327 batches | lr 5e-05 | ms/batch 214.27 | loss  3.90 | ppl    49.55 | bpt    5.631 
| epoch 133 |   600/ 1327 batches | lr 5e-05 | ms/batch 220.24 | loss  3.96 | ppl    52.63 | bpt    5.718 
| epoch 133 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.91 | loss  3.94 | ppl    51.48 | bpt    5.686 
| epoch 133 |  1000/ 1327 batches | lr 5e-05 | ms/batch 219.20 | loss  3.99 | ppl    54.06 | bpt    5.757 
| epoch 133 |  1200/ 1327 batches | lr 5e-05 | ms/batch 218.67 | loss  3.90 | ppl    49.56 | bpt    5.631 
-----------------------------------------------------------------------------------------
| end of epoch 133 | time: 344.40s | valid loss  4.10 | valid ppl     60.16 | valid bpt    5.911
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 134 |   200/ 1327 batches | lr 5e-05 | ms/batch 219.09 | loss  3.89 | ppl    48.71 | bpt    5.606 
| epoch 134 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.30 | loss  3.89 | ppl    48.67 | bpt    5.605 
| epoch 134 |   600/ 1327 batches | lr 5e-05 | ms/batch 216.89 | loss  3.98 | ppl    53.64 | bpt    5.745 
| epoch 134 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.74 | loss  3.93 | ppl    50.84 | bpt    5.668 
| epoch 134 |  1000/ 1327 batches | lr 5e-05 | ms/batch 218.45 | loss  3.99 | ppl    54.18 | bpt    5.760 
| epoch 134 |  1200/ 1327 batches | lr 5e-05 | ms/batch 216.63 | loss  3.91 | ppl    49.68 | bpt    5.635 
-----------------------------------------------------------------------------------------
| end of epoch 134 | time: 345.27s | valid loss  4.10 | valid ppl     60.16 | valid bpt    5.911
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 135 |   200/ 1327 batches | lr 5e-05 | ms/batch 216.96 | loss  3.88 | ppl    48.66 | bpt    5.605 
| epoch 135 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.75 | loss  3.92 | ppl    50.28 | bpt    5.652 
| epoch 135 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.94 | loss  3.95 | ppl    51.81 | bpt    5.695 
| epoch 135 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.09 | loss  3.93 | ppl    50.80 | bpt    5.667 
| epoch 135 |  1000/ 1327 batches | lr 5e-05 | ms/batch 219.31 | loss  3.96 | ppl    52.65 | bpt    5.718 
| epoch 135 |  1200/ 1327 batches | lr 5e-05 | ms/batch 214.41 | loss  3.90 | ppl    49.28 | bpt    5.623 
-----------------------------------------------------------------------------------------
| end of epoch 135 | time: 345.64s | valid loss  4.10 | valid ppl     60.13 | valid bpt    5.910
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 136 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.89 | loss  3.88 | ppl    48.63 | bpt    5.604 
| epoch 136 |   400/ 1327 batches | lr 5e-05 | ms/batch 219.03 | loss  3.88 | ppl    48.59 | bpt    5.603 
| epoch 136 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.92 | loss  3.96 | ppl    52.47 | bpt    5.713 
| epoch 136 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.22 | loss  3.92 | ppl    50.49 | bpt    5.658 
| epoch 136 |  1000/ 1327 batches | lr 5e-05 | ms/batch 219.81 | loss  3.96 | ppl    52.58 | bpt    5.717 
| epoch 136 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.37 | loss  3.90 | ppl    49.64 | bpt    5.633 
-----------------------------------------------------------------------------------------
| end of epoch 136 | time: 344.60s | valid loss  4.10 | valid ppl     60.11 | valid bpt    5.909
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 137 |   200/ 1327 batches | lr 5e-05 | ms/batch 215.59 | loss  3.88 | ppl    48.60 | bpt    5.603 
| epoch 137 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.89 | loss  3.90 | ppl    49.27 | bpt    5.623 
| epoch 137 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.48 | loss  3.95 | ppl    52.12 | bpt    5.704 
| epoch 137 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.06 | loss  3.92 | ppl    50.61 | bpt    5.661 
| epoch 137 |  1000/ 1327 batches | lr 5e-05 | ms/batch 216.25 | loss  3.97 | ppl    52.95 | bpt    5.727 
| epoch 137 |  1200/ 1327 batches | lr 5e-05 | ms/batch 218.98 | loss  3.93 | ppl    51.09 | bpt    5.675 
-----------------------------------------------------------------------------------------
| end of epoch 137 | time: 346.88s | valid loss  4.10 | valid ppl     60.09 | valid bpt    5.909
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 138 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.13 | loss  3.88 | ppl    48.41 | bpt    5.597 
| epoch 138 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.62 | loss  3.89 | ppl    48.94 | bpt    5.613 
| epoch 138 |   600/ 1327 batches | lr 5e-05 | ms/batch 216.91 | loss  3.95 | ppl    51.90 | bpt    5.698 
| epoch 138 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.78 | loss  3.90 | ppl    49.34 | bpt    5.625 
| epoch 138 |  1000/ 1327 batches | lr 5e-05 | ms/batch 216.46 | loss  3.96 | ppl    52.65 | bpt    5.718 
| epoch 138 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.32 | loss  3.90 | ppl    49.23 | bpt    5.621 
-----------------------------------------------------------------------------------------
| end of epoch 138 | time: 344.77s | valid loss  4.10 | valid ppl     60.08 | valid bpt    5.909
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 139 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.48 | loss  3.88 | ppl    48.22 | bpt    5.592 
| epoch 139 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.07 | loss  3.88 | ppl    48.63 | bpt    5.604 
| epoch 139 |   600/ 1327 batches | lr 5e-05 | ms/batch 214.62 | loss  3.95 | ppl    52.15 | bpt    5.705 
| epoch 139 |   800/ 1327 batches | lr 5e-05 | ms/batch 216.02 | loss  3.92 | ppl    50.18 | bpt    5.649 
| epoch 139 |  1000/ 1327 batches | lr 5e-05 | ms/batch 217.35 | loss  3.97 | ppl    52.84 | bpt    5.724 
| epoch 139 |  1200/ 1327 batches | lr 5e-05 | ms/batch 221.57 | loss  3.91 | ppl    49.68 | bpt    5.635 
-----------------------------------------------------------------------------------------
| end of epoch 139 | time: 344.30s | valid loss  4.10 | valid ppl     60.06 | valid bpt    5.908
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 140 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.33 | loss  3.87 | ppl    47.86 | bpt    5.581 
| epoch 140 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.24 | loss  3.86 | ppl    47.65 | bpt    5.574 
| epoch 140 |   600/ 1327 batches | lr 5e-05 | ms/batch 216.89 | loss  3.95 | ppl    51.96 | bpt    5.699 
| epoch 140 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.89 | loss  3.92 | ppl    50.58 | bpt    5.660 
| epoch 140 |  1000/ 1327 batches | lr 5e-05 | ms/batch 219.44 | loss  3.96 | ppl    52.61 | bpt    5.717 
| epoch 140 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.10 | loss  3.92 | ppl    50.21 | bpt    5.650 
-----------------------------------------------------------------------------------------
| end of epoch 140 | time: 344.50s | valid loss  4.09 | valid ppl     60.04 | valid bpt    5.908
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 141 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.90 | loss  3.88 | ppl    48.39 | bpt    5.597 
| epoch 141 |   400/ 1327 batches | lr 5e-05 | ms/batch 214.62 | loss  3.88 | ppl    48.44 | bpt    5.598 
| epoch 141 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.88 | loss  3.96 | ppl    52.65 | bpt    5.718 
| epoch 141 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.29 | loss  3.92 | ppl    50.64 | bpt    5.662 
| epoch 141 |  1000/ 1327 batches | lr 5e-05 | ms/batch 219.40 | loss  3.96 | ppl    52.53 | bpt    5.715 
| epoch 141 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.97 | loss  3.91 | ppl    49.85 | bpt    5.640 
-----------------------------------------------------------------------------------------
| end of epoch 141 | time: 342.94s | valid loss  4.09 | valid ppl     60.02 | valid bpt    5.907
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 142 |   200/ 1327 batches | lr 5e-05 | ms/batch 213.06 | loss  3.89 | ppl    48.75 | bpt    5.607 
| epoch 142 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.93 | loss  3.87 | ppl    48.17 | bpt    5.590 
| epoch 142 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.66 | loss  3.94 | ppl    51.32 | bpt    5.681 
| epoch 142 |   800/ 1327 batches | lr 5e-05 | ms/batch 219.38 | loss  3.90 | ppl    49.52 | bpt    5.630 
| epoch 142 |  1000/ 1327 batches | lr 5e-05 | ms/batch 213.46 | loss  3.96 | ppl    52.66 | bpt    5.719 
| epoch 142 |  1200/ 1327 batches | lr 5e-05 | ms/batch 215.22 | loss  3.89 | ppl    49.16 | bpt    5.619 
-----------------------------------------------------------------------------------------
| end of epoch 142 | time: 342.43s | valid loss  4.09 | valid ppl     60.01 | valid bpt    5.907
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 143 |   200/ 1327 batches | lr 5e-05 | ms/batch 215.80 | loss  3.87 | ppl    47.71 | bpt    5.576 
| epoch 143 |   400/ 1327 batches | lr 5e-05 | ms/batch 214.25 | loss  3.88 | ppl    48.27 | bpt    5.593 
| epoch 143 |   600/ 1327 batches | lr 5e-05 | ms/batch 212.36 | loss  3.94 | ppl    51.58 | bpt    5.689 
| epoch 143 |   800/ 1327 batches | lr 5e-05 | ms/batch 216.25 | loss  3.89 | ppl    49.14 | bpt    5.619 
| epoch 143 |  1000/ 1327 batches | lr 5e-05 | ms/batch 214.18 | loss  3.95 | ppl    51.74 | bpt    5.693 
| epoch 143 |  1200/ 1327 batches | lr 5e-05 | ms/batch 216.54 | loss  3.92 | ppl    50.29 | bpt    5.652 
-----------------------------------------------------------------------------------------
| end of epoch 143 | time: 344.46s | valid loss  4.09 | valid ppl     60.01 | valid bpt    5.907
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 144 |   200/ 1327 batches | lr 5e-05 | ms/batch 221.17 | loss  3.87 | ppl    48.09 | bpt    5.588 
| epoch 144 |   400/ 1327 batches | lr 5e-05 | ms/batch 221.04 | loss  3.85 | ppl    47.09 | bpt    5.557 
| epoch 144 |   600/ 1327 batches | lr 5e-05 | ms/batch 213.21 | loss  3.95 | ppl    51.90 | bpt    5.698 
| epoch 144 |   800/ 1327 batches | lr 5e-05 | ms/batch 216.91 | loss  3.91 | ppl    49.74 | bpt    5.636 
| epoch 144 |  1000/ 1327 batches | lr 5e-05 | ms/batch 215.07 | loss  3.97 | ppl    52.74 | bpt    5.721 
| epoch 144 |  1200/ 1327 batches | lr 5e-05 | ms/batch 218.45 | loss  3.89 | ppl    49.01 | bpt    5.615 
-----------------------------------------------------------------------------------------
| end of epoch 144 | time: 344.99s | valid loss  4.09 | valid ppl     59.99 | valid bpt    5.907
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 145 |   200/ 1327 batches | lr 5e-05 | ms/batch 216.82 | loss  3.87 | ppl    48.13 | bpt    5.589 
| epoch 145 |   400/ 1327 batches | lr 5e-05 | ms/batch 216.32 | loss  3.86 | ppl    47.63 | bpt    5.574 
| epoch 145 |   600/ 1327 batches | lr 5e-05 | ms/batch 215.12 | loss  3.93 | ppl    50.98 | bpt    5.672 
| epoch 145 |   800/ 1327 batches | lr 5e-05 | ms/batch 213.61 | loss  3.91 | ppl    49.66 | bpt    5.634 
| epoch 145 |  1000/ 1327 batches | lr 5e-05 | ms/batch 213.38 | loss  3.96 | ppl    52.26 | bpt    5.708 
| epoch 145 |  1200/ 1327 batches | lr 5e-05 | ms/batch 218.87 | loss  3.91 | ppl    49.69 | bpt    5.635 
-----------------------------------------------------------------------------------------
| end of epoch 145 | time: 342.61s | valid loss  4.09 | valid ppl     59.98 | valid bpt    5.906
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 146 |   200/ 1327 batches | lr 5e-05 | ms/batch 216.14 | loss  3.87 | ppl    47.98 | bpt    5.584 
| epoch 146 |   400/ 1327 batches | lr 5e-05 | ms/batch 215.48 | loss  3.87 | ppl    48.00 | bpt    5.585 
| epoch 146 |   600/ 1327 batches | lr 5e-05 | ms/batch 216.90 | loss  3.95 | ppl    51.79 | bpt    5.694 
| epoch 146 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.78 | loss  3.89 | ppl    48.93 | bpt    5.613 
| epoch 146 |  1000/ 1327 batches | lr 5e-05 | ms/batch 216.15 | loss  3.96 | ppl    52.29 | bpt    5.708 
| epoch 146 |  1200/ 1327 batches | lr 5e-05 | ms/batch 220.04 | loss  3.89 | ppl    49.14 | bpt    5.619 
-----------------------------------------------------------------------------------------
| end of epoch 146 | time: 344.28s | valid loss  4.09 | valid ppl     59.97 | valid bpt    5.906
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 147 |   200/ 1327 batches | lr 5e-05 | ms/batch 216.88 | loss  3.86 | ppl    47.24 | bpt    5.562 
| epoch 147 |   400/ 1327 batches | lr 5e-05 | ms/batch 216.31 | loss  3.87 | ppl    47.96 | bpt    5.584 
| epoch 147 |   600/ 1327 batches | lr 5e-05 | ms/batch 216.86 | loss  3.93 | ppl    50.85 | bpt    5.668 
| epoch 147 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.46 | loss  3.89 | ppl    49.08 | bpt    5.617 
| epoch 147 |  1000/ 1327 batches | lr 5e-05 | ms/batch 218.26 | loss  3.95 | ppl    51.97 | bpt    5.699 
| epoch 147 |  1200/ 1327 batches | lr 5e-05 | ms/batch 218.18 | loss  3.89 | ppl    48.75 | bpt    5.607 
-----------------------------------------------------------------------------------------
| end of epoch 147 | time: 345.66s | valid loss  4.09 | valid ppl     59.96 | valid bpt    5.906
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 148 |   200/ 1327 batches | lr 5e-05 | ms/batch 215.28 | loss  3.86 | ppl    47.27 | bpt    5.563 
| epoch 148 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.00 | loss  3.87 | ppl    48.07 | bpt    5.587 
| epoch 148 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.81 | loss  3.93 | ppl    50.77 | bpt    5.666 
| epoch 148 |   800/ 1327 batches | lr 5e-05 | ms/batch 216.13 | loss  3.90 | ppl    49.54 | bpt    5.631 
| epoch 148 |  1000/ 1327 batches | lr 5e-05 | ms/batch 213.27 | loss  3.94 | ppl    51.53 | bpt    5.687 
| epoch 148 |  1200/ 1327 batches | lr 5e-05 | ms/batch 215.10 | loss  3.89 | ppl    48.95 | bpt    5.613 
-----------------------------------------------------------------------------------------
| end of epoch 148 | time: 345.07s | valid loss  4.09 | valid ppl     59.94 | valid bpt    5.906
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 149 |   200/ 1327 batches | lr 5e-05 | ms/batch 217.28 | loss  3.84 | ppl    46.64 | bpt    5.544 
| epoch 149 |   400/ 1327 batches | lr 5e-05 | ms/batch 214.30 | loss  3.87 | ppl    47.72 | bpt    5.577 
| epoch 149 |   600/ 1327 batches | lr 5e-05 | ms/batch 216.79 | loss  3.93 | ppl    50.88 | bpt    5.669 
| epoch 149 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.99 | loss  3.89 | ppl    48.85 | bpt    5.610 
| epoch 149 |  1000/ 1327 batches | lr 5e-05 | ms/batch 215.86 | loss  3.96 | ppl    52.28 | bpt    5.708 
| epoch 149 |  1200/ 1327 batches | lr 5e-05 | ms/batch 218.93 | loss  3.88 | ppl    48.37 | bpt    5.596 
-----------------------------------------------------------------------------------------
| end of epoch 149 | time: 345.45s | valid loss  4.09 | valid ppl     59.94 | valid bpt    5.905
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 150 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.72 | loss  3.86 | ppl    47.60 | bpt    5.573 
| epoch 150 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.44 | loss  3.85 | ppl    46.99 | bpt    5.554 
| epoch 150 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.25 | loss  3.94 | ppl    51.35 | bpt    5.682 
| epoch 150 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.92 | loss  3.90 | ppl    49.31 | bpt    5.624 
| epoch 150 |  1000/ 1327 batches | lr 5e-05 | ms/batch 218.61 | loss  3.94 | ppl    51.48 | bpt    5.686 
| epoch 150 |  1200/ 1327 batches | lr 5e-05 | ms/batch 214.73 | loss  3.86 | ppl    47.62 | bpt    5.573 
-----------------------------------------------------------------------------------------
| end of epoch 150 | time: 344.29s | valid loss  4.09 | valid ppl     59.93 | valid bpt    5.905
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 151 |   200/ 1327 batches | lr 5e-05 | ms/batch 219.30 | loss  3.86 | ppl    47.27 | bpt    5.563 
| epoch 151 |   400/ 1327 batches | lr 5e-05 | ms/batch 213.49 | loss  3.86 | ppl    47.69 | bpt    5.576 
| epoch 151 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.22 | loss  3.93 | ppl    50.90 | bpt    5.669 
| epoch 151 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.67 | loss  3.88 | ppl    48.31 | bpt    5.594 
| epoch 151 |  1000/ 1327 batches | lr 5e-05 | ms/batch 213.49 | loss  3.93 | ppl    50.74 | bpt    5.665 
| epoch 151 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.23 | loss  3.87 | ppl    48.18 | bpt    5.590 
-----------------------------------------------------------------------------------------
| end of epoch 151 | time: 344.22s | valid loss  4.09 | valid ppl     59.92 | valid bpt    5.905
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 152 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.25 | loss  3.86 | ppl    47.48 | bpt    5.569 
| epoch 152 |   400/ 1327 batches | lr 5e-05 | ms/batch 216.51 | loss  3.86 | ppl    47.58 | bpt    5.572 
| epoch 152 |   600/ 1327 batches | lr 5e-05 | ms/batch 216.91 | loss  3.93 | ppl    50.67 | bpt    5.663 
| epoch 152 |   800/ 1327 batches | lr 5e-05 | ms/batch 219.26 | loss  3.90 | ppl    49.23 | bpt    5.621 
| epoch 152 |  1000/ 1327 batches | lr 5e-05 | ms/batch 216.23 | loss  3.93 | ppl    50.78 | bpt    5.666 
| epoch 152 |  1200/ 1327 batches | lr 5e-05 | ms/batch 218.86 | loss  3.87 | ppl    48.02 | bpt    5.586 
-----------------------------------------------------------------------------------------
| end of epoch 152 | time: 344.58s | valid loss  4.09 | valid ppl     59.92 | valid bpt    5.905
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 153 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.32 | loss  3.84 | ppl    46.55 | bpt    5.541 
| epoch 153 |   400/ 1327 batches | lr 5e-05 | ms/batch 218.87 | loss  3.84 | ppl    46.47 | bpt    5.538 
| epoch 153 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.01 | loss  3.93 | ppl    51.12 | bpt    5.676 
| epoch 153 |   800/ 1327 batches | lr 5e-05 | ms/batch 219.68 | loss  3.89 | ppl    48.89 | bpt    5.611 
| epoch 153 |  1000/ 1327 batches | lr 5e-05 | ms/batch 218.23 | loss  3.92 | ppl    50.50 | bpt    5.658 
| epoch 153 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.92 | loss  3.86 | ppl    47.65 | bpt    5.574 
-----------------------------------------------------------------------------------------
| end of epoch 153 | time: 343.48s | valid loss  4.09 | valid ppl     59.91 | valid bpt    5.905
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 154 |   200/ 1327 batches | lr 5e-05 | ms/batch 213.64 | loss  3.86 | ppl    47.31 | bpt    5.564 
| epoch 154 |   400/ 1327 batches | lr 5e-05 | ms/batch 216.76 | loss  3.85 | ppl    46.97 | bpt    5.554 
| epoch 154 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.67 | loss  3.93 | ppl    50.92 | bpt    5.670 
| epoch 154 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.89 | loss  3.88 | ppl    48.55 | bpt    5.601 
| epoch 154 |  1000/ 1327 batches | lr 5e-05 | ms/batch 219.65 | loss  3.93 | ppl    51.09 | bpt    5.675 
| epoch 154 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.97 | loss  3.88 | ppl    48.34 | bpt    5.595 
-----------------------------------------------------------------------------------------
| end of epoch 154 | time: 345.71s | valid loss  4.09 | valid ppl     59.91 | valid bpt    5.905
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 155 |   200/ 1327 batches | lr 5e-05 | ms/batch 217.04 | loss  3.86 | ppl    47.27 | bpt    5.563 
| epoch 155 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.78 | loss  3.86 | ppl    47.37 | bpt    5.566 
| epoch 155 |   600/ 1327 batches | lr 5e-05 | ms/batch 215.03 | loss  3.92 | ppl    50.20 | bpt    5.650 
| epoch 155 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.87 | loss  3.88 | ppl    48.65 | bpt    5.604 
| epoch 155 |  1000/ 1327 batches | lr 5e-05 | ms/batch 215.80 | loss  3.94 | ppl    51.52 | bpt    5.687 
| epoch 155 |  1200/ 1327 batches | lr 5e-05 | ms/batch 216.92 | loss  3.87 | ppl    48.08 | bpt    5.587 
-----------------------------------------------------------------------------------------
| end of epoch 155 | time: 344.92s | valid loss  4.09 | valid ppl     59.90 | valid bpt    5.905
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 156 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.66 | loss  3.84 | ppl    46.60 | bpt    5.542 
| epoch 156 |   400/ 1327 batches | lr 5e-05 | ms/batch 220.71 | loss  3.85 | ppl    47.08 | bpt    5.557 
| epoch 156 |   600/ 1327 batches | lr 5e-05 | ms/batch 216.22 | loss  3.92 | ppl    50.28 | bpt    5.652 
| epoch 156 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.74 | loss  3.88 | ppl    48.38 | bpt    5.596 
| epoch 156 |  1000/ 1327 batches | lr 5e-05 | ms/batch 216.98 | loss  3.94 | ppl    51.54 | bpt    5.688 
| epoch 156 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.21 | loss  3.86 | ppl    47.63 | bpt    5.574 
-----------------------------------------------------------------------------------------
| end of epoch 156 | time: 345.01s | valid loss  4.09 | valid ppl     59.90 | valid bpt    5.905
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 157 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.65 | loss  3.84 | ppl    46.34 | bpt    5.534 
| epoch 157 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.20 | loss  3.85 | ppl    46.94 | bpt    5.553 
| epoch 157 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.02 | loss  3.91 | ppl    49.82 | bpt    5.639 
| epoch 157 |   800/ 1327 batches | lr 5e-05 | ms/batch 216.82 | loss  3.87 | ppl    48.13 | bpt    5.589 
| epoch 157 |  1000/ 1327 batches | lr 5e-05 | ms/batch 217.84 | loss  3.94 | ppl    51.52 | bpt    5.687 
| epoch 157 |  1200/ 1327 batches | lr 5e-05 | ms/batch 218.48 | loss  3.85 | ppl    47.00 | bpt    5.555 
-----------------------------------------------------------------------------------------
| end of epoch 157 | time: 344.72s | valid loss  4.09 | valid ppl     59.90 | valid bpt    5.904
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 158 |   200/ 1327 batches | lr 5e-05 | ms/batch 216.63 | loss  3.84 | ppl    46.35 | bpt    5.535 
| epoch 158 |   400/ 1327 batches | lr 5e-05 | ms/batch 215.38 | loss  3.84 | ppl    46.46 | bpt    5.538 
| epoch 158 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.94 | loss  3.92 | ppl    50.33 | bpt    5.653 
| epoch 158 |   800/ 1327 batches | lr 5e-05 | ms/batch 216.82 | loss  3.87 | ppl    47.93 | bpt    5.583 
| epoch 158 |  1000/ 1327 batches | lr 5e-05 | ms/batch 216.29 | loss  3.94 | ppl    51.39 | bpt    5.684 
| epoch 158 |  1200/ 1327 batches | lr 5e-05 | ms/batch 216.59 | loss  3.88 | ppl    48.48 | bpt    5.599 
-----------------------------------------------------------------------------------------
| end of epoch 158 | time: 344.68s | valid loss  4.09 | valid ppl     59.90 | valid bpt    5.904
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 159 |   200/ 1327 batches | lr 5e-05 | ms/batch 215.42 | loss  3.85 | ppl    46.85 | bpt    5.550 
| epoch 159 |   400/ 1327 batches | lr 5e-05 | ms/batch 219.15 | loss  3.84 | ppl    46.64 | bpt    5.544 
| epoch 159 |   600/ 1327 batches | lr 5e-05 | ms/batch 219.81 | loss  3.90 | ppl    49.56 | bpt    5.631 
| epoch 159 |   800/ 1327 batches | lr 5e-05 | ms/batch 216.67 | loss  3.88 | ppl    48.63 | bpt    5.604 
| epoch 159 |  1000/ 1327 batches | lr 5e-05 | ms/batch 216.43 | loss  3.92 | ppl    50.17 | bpt    5.649 
| epoch 159 |  1200/ 1327 batches | lr 5e-05 | ms/batch 213.83 | loss  3.86 | ppl    47.52 | bpt    5.570 
-----------------------------------------------------------------------------------------
| end of epoch 159 | time: 344.29s | valid loss  4.09 | valid ppl     59.90 | valid bpt    5.904
-----------------------------------------------------------------------------------------
| epoch 160 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.40 | loss  3.85 | ppl    46.94 | bpt    5.553 
| epoch 160 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.29 | loss  3.85 | ppl    46.82 | bpt    5.549 
| epoch 160 |   600/ 1327 batches | lr 5e-05 | ms/batch 216.82 | loss  3.91 | ppl    49.70 | bpt    5.635 
| epoch 160 |   800/ 1327 batches | lr 5e-05 | ms/batch 219.57 | loss  3.89 | ppl    48.79 | bpt    5.609 
| epoch 160 |  1000/ 1327 batches | lr 5e-05 | ms/batch 219.11 | loss  3.93 | ppl    50.72 | bpt    5.664 
| epoch 160 |  1200/ 1327 batches | lr 5e-05 | ms/batch 216.55 | loss  3.85 | ppl    46.99 | bpt    5.554 
-----------------------------------------------------------------------------------------
| end of epoch 160 | time: 344.73s | valid loss  4.09 | valid ppl     59.90 | valid bpt    5.904
-----------------------------------------------------------------------------------------
| epoch 161 |   200/ 1327 batches | lr 5e-05 | ms/batch 217.12 | loss  3.84 | ppl    46.41 | bpt    5.536 
| epoch 161 |   400/ 1327 batches | lr 5e-05 | ms/batch 218.69 | loss  3.85 | ppl    46.83 | bpt    5.549 
| epoch 161 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.98 | loss  3.91 | ppl    49.95 | bpt    5.642 
| epoch 161 |   800/ 1327 batches | lr 5e-05 | ms/batch 221.23 | loss  3.88 | ppl    48.46 | bpt    5.599 
| epoch 161 |  1000/ 1327 batches | lr 5e-05 | ms/batch 217.48 | loss  3.92 | ppl    50.60 | bpt    5.661 
| epoch 161 |  1200/ 1327 batches | lr 5e-05 | ms/batch 214.50 | loss  3.86 | ppl    47.43 | bpt    5.568 
-----------------------------------------------------------------------------------------
| end of epoch 161 | time: 345.68s | valid loss  4.09 | valid ppl     59.90 | valid bpt    5.904
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 162 |   200/ 1327 batches | lr 5e-05 | ms/batch 216.85 | loss  3.85 | ppl    46.82 | bpt    5.549 
| epoch 162 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.85 | loss  3.83 | ppl    46.02 | bpt    5.524 
| epoch 162 |   600/ 1327 batches | lr 5e-05 | ms/batch 216.38 | loss  3.91 | ppl    49.79 | bpt    5.638 
| epoch 162 |   800/ 1327 batches | lr 5e-05 | ms/batch 214.87 | loss  3.87 | ppl    47.88 | bpt    5.581 
| epoch 162 |  1000/ 1327 batches | lr 5e-05 | ms/batch 218.73 | loss  3.94 | ppl    51.62 | bpt    5.690 
| epoch 162 |  1200/ 1327 batches | lr 5e-05 | ms/batch 214.56 | loss  3.86 | ppl    47.36 | bpt    5.565 
-----------------------------------------------------------------------------------------
| end of epoch 162 | time: 345.34s | valid loss  4.09 | valid ppl     59.89 | valid bpt    5.904
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 163 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.74 | loss  3.84 | ppl    46.44 | bpt    5.537 
| epoch 163 |   400/ 1327 batches | lr 5e-05 | ms/batch 218.74 | loss  3.82 | ppl    45.76 | bpt    5.516 
| epoch 163 |   600/ 1327 batches | lr 5e-05 | ms/batch 216.31 | loss  3.90 | ppl    49.59 | bpt    5.632 
| epoch 163 |   800/ 1327 batches | lr 5e-05 | ms/batch 216.98 | loss  3.88 | ppl    48.30 | bpt    5.594 
| epoch 163 |  1000/ 1327 batches | lr 5e-05 | ms/batch 219.73 | loss  3.93 | ppl    50.82 | bpt    5.667 
| epoch 163 |  1200/ 1327 batches | lr 5e-05 | ms/batch 218.43 | loss  3.85 | ppl    46.89 | bpt    5.551 
-----------------------------------------------------------------------------------------
| end of epoch 163 | time: 345.50s | valid loss  4.09 | valid ppl     59.89 | valid bpt    5.904
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 164 |   200/ 1327 batches | lr 5e-05 | ms/batch 217.41 | loss  3.83 | ppl    45.86 | bpt    5.519 
| epoch 164 |   400/ 1327 batches | lr 5e-05 | ms/batch 219.42 | loss  3.83 | ppl    46.16 | bpt    5.529 
| epoch 164 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.80 | loss  3.91 | ppl    49.67 | bpt    5.634 
| epoch 164 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.82 | loss  3.87 | ppl    47.80 | bpt    5.579 
| epoch 164 |  1000/ 1327 batches | lr 5e-05 | ms/batch 217.61 | loss  3.93 | ppl    50.68 | bpt    5.663 
| epoch 164 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.10 | loss  3.87 | ppl    48.12 | bpt    5.589 
-----------------------------------------------------------------------------------------
| end of epoch 164 | time: 345.85s | valid loss  4.09 | valid ppl     59.89 | valid bpt    5.904
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 165 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.83 | loss  3.83 | ppl    45.97 | bpt    5.523 
| epoch 165 |   400/ 1327 batches | lr 5e-05 | ms/batch 218.95 | loss  3.83 | ppl    46.15 | bpt    5.528 
| epoch 165 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.42 | loss  3.91 | ppl    49.70 | bpt    5.635 
| epoch 165 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.45 | loss  3.86 | ppl    47.26 | bpt    5.563 
| epoch 165 |  1000/ 1327 batches | lr 5e-05 | ms/batch 214.65 | loss  3.92 | ppl    50.65 | bpt    5.663 
| epoch 165 |  1200/ 1327 batches | lr 5e-05 | ms/batch 219.12 | loss  3.85 | ppl    47.12 | bpt    5.558 
-----------------------------------------------------------------------------------------
| end of epoch 165 | time: 343.82s | valid loss  4.09 | valid ppl     59.89 | valid bpt    5.904
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 166 |   200/ 1327 batches | lr 5e-05 | ms/batch 217.19 | loss  3.82 | ppl    45.72 | bpt    5.515 
| epoch 166 |   400/ 1327 batches | lr 5e-05 | ms/batch 220.19 | loss  3.83 | ppl    46.08 | bpt    5.526 
| epoch 166 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.02 | loss  3.90 | ppl    49.22 | bpt    5.621 
| epoch 166 |   800/ 1327 batches | lr 5e-05 | ms/batch 214.23 | loss  3.88 | ppl    48.24 | bpt    5.592 
| epoch 166 |  1000/ 1327 batches | lr 5e-05 | ms/batch 218.13 | loss  3.93 | ppl    50.89 | bpt    5.669 
| epoch 166 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.63 | loss  3.85 | ppl    47.21 | bpt    5.561 
-----------------------------------------------------------------------------------------
| end of epoch 166 | time: 344.44s | valid loss  4.09 | valid ppl     59.88 | valid bpt    5.904
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 167 |   200/ 1327 batches | lr 5e-05 | ms/batch 218.12 | loss  3.82 | ppl    45.64 | bpt    5.512 
| epoch 167 |   400/ 1327 batches | lr 5e-05 | ms/batch 218.65 | loss  3.82 | ppl    45.72 | bpt    5.515 
| epoch 167 |   600/ 1327 batches | lr 5e-05 | ms/batch 219.53 | loss  3.89 | ppl    49.08 | bpt    5.617 
| epoch 167 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.49 | loss  3.86 | ppl    47.52 | bpt    5.571 
| epoch 167 |  1000/ 1327 batches | lr 5e-05 | ms/batch 217.92 | loss  3.92 | ppl    50.52 | bpt    5.659 
| epoch 167 |  1200/ 1327 batches | lr 5e-05 | ms/batch 218.85 | loss  3.86 | ppl    47.32 | bpt    5.564 
-----------------------------------------------------------------------------------------
| end of epoch 167 | time: 345.92s | valid loss  4.09 | valid ppl     59.88 | valid bpt    5.904
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 168 |   200/ 1327 batches | lr 5e-05 | ms/batch 215.96 | loss  3.84 | ppl    46.52 | bpt    5.540 
| epoch 168 |   400/ 1327 batches | lr 5e-05 | ms/batch 216.59 | loss  3.83 | ppl    45.85 | bpt    5.519 
| epoch 168 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.12 | loss  3.90 | ppl    49.44 | bpt    5.628 
| epoch 168 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.17 | loss  3.86 | ppl    47.46 | bpt    5.569 
| epoch 168 |  1000/ 1327 batches | lr 5e-05 | ms/batch 218.13 | loss  3.91 | ppl    49.89 | bpt    5.641 
| epoch 168 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.64 | loss  3.85 | ppl    46.89 | bpt    5.551 
-----------------------------------------------------------------------------------------
| end of epoch 168 | time: 344.14s | valid loss  4.09 | valid ppl     59.88 | valid bpt    5.904
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 169 |   200/ 1327 batches | lr 5e-05 | ms/batch 216.52 | loss  3.82 | ppl    45.80 | bpt    5.517 
| epoch 169 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.99 | loss  3.83 | ppl    45.86 | bpt    5.519 
| epoch 169 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.02 | loss  3.89 | ppl    48.91 | bpt    5.612 
| epoch 169 |   800/ 1327 batches | lr 5e-05 | ms/batch 216.65 | loss  3.86 | ppl    47.61 | bpt    5.573 
| epoch 169 |  1000/ 1327 batches | lr 5e-05 | ms/batch 218.64 | loss  3.90 | ppl    49.48 | bpt    5.629 
| epoch 169 |  1200/ 1327 batches | lr 5e-05 | ms/batch 215.06 | loss  3.84 | ppl    46.32 | bpt    5.533 
-----------------------------------------------------------------------------------------
| end of epoch 169 | time: 342.37s | valid loss  4.09 | valid ppl     59.88 | valid bpt    5.904
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 170 |   200/ 1327 batches | lr 5e-05 | ms/batch 219.08 | loss  3.81 | ppl    45.18 | bpt    5.498 
| epoch 170 |   400/ 1327 batches | lr 5e-05 | ms/batch 211.55 | loss  3.84 | ppl    46.53 | bpt    5.540 
| epoch 170 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.54 | loss  3.90 | ppl    49.35 | bpt    5.625 
| epoch 170 |   800/ 1327 batches | lr 5e-05 | ms/batch 217.39 | loss  3.85 | ppl    47.03 | bpt    5.556 
| epoch 170 |  1000/ 1327 batches | lr 5e-05 | ms/batch 219.06 | loss  3.92 | ppl    50.17 | bpt    5.649 
| epoch 170 |  1200/ 1327 batches | lr 5e-05 | ms/batch 218.49 | loss  3.84 | ppl    46.48 | bpt    5.538 
-----------------------------------------------------------------------------------------
| end of epoch 170 | time: 344.81s | valid loss  4.09 | valid ppl     59.88 | valid bpt    5.904
-----------------------------------------------------------------------------------------
| epoch 171 |   200/ 1327 batches | lr 5e-05 | ms/batch 215.29 | loss  3.82 | ppl    45.79 | bpt    5.517 
| epoch 171 |   400/ 1327 batches | lr 5e-05 | ms/batch 217.18 | loss  3.84 | ppl    46.54 | bpt    5.540 
| epoch 171 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.50 | loss  3.90 | ppl    49.37 | bpt    5.626 
| epoch 171 |   800/ 1327 batches | lr 5e-05 | ms/batch 219.45 | loss  3.86 | ppl    47.41 | bpt    5.567 
| epoch 171 |  1000/ 1327 batches | lr 5e-05 | ms/batch 218.30 | loss  3.91 | ppl    50.13 | bpt    5.648 
| epoch 171 |  1200/ 1327 batches | lr 5e-05 | ms/batch 216.78 | loss  3.85 | ppl    46.95 | bpt    5.553 
-----------------------------------------------------------------------------------------
| end of epoch 171 | time: 344.40s | valid loss  4.09 | valid ppl     59.88 | valid bpt    5.904
-----------------------------------------------------------------------------------------
| epoch 172 |   200/ 1327 batches | lr 5e-05 | ms/batch 217.47 | loss  3.83 | ppl    45.90 | bpt    5.520 
| epoch 172 |   400/ 1327 batches | lr 5e-05 | ms/batch 219.45 | loss  3.81 | ppl    45.27 | bpt    5.501 
| epoch 172 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.84 | loss  3.90 | ppl    49.58 | bpt    5.632 
| epoch 172 |   800/ 1327 batches | lr 5e-05 | ms/batch 215.64 | loss  3.86 | ppl    47.57 | bpt    5.572 
| epoch 172 |  1000/ 1327 batches | lr 5e-05 | ms/batch 216.64 | loss  3.90 | ppl    49.61 | bpt    5.633 
| epoch 172 |  1200/ 1327 batches | lr 5e-05 | ms/batch 214.09 | loss  3.83 | ppl    46.25 | bpt    5.531 
-----------------------------------------------------------------------------------------
| end of epoch 172 | time: 344.35s | valid loss  4.09 | valid ppl     59.88 | valid bpt    5.904
-----------------------------------------------------------------------------------------
| epoch 173 |   200/ 1327 batches | lr 5e-05 | ms/batch 220.13 | loss  3.82 | ppl    45.63 | bpt    5.512 
| epoch 173 |   400/ 1327 batches | lr 5e-05 | ms/batch 215.97 | loss  3.83 | ppl    45.94 | bpt    5.522 
| epoch 173 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.74 | loss  3.89 | ppl    48.84 | bpt    5.610 
| epoch 173 |   800/ 1327 batches | lr 5e-05 | ms/batch 215.47 | loss  3.85 | ppl    47.19 | bpt    5.560 
| epoch 173 |  1000/ 1327 batches | lr 5e-05 | ms/batch 216.10 | loss  3.91 | ppl    49.67 | bpt    5.634 
| epoch 173 |  1200/ 1327 batches | lr 5e-05 | ms/batch 217.15 | loss  3.83 | ppl    46.23 | bpt    5.531 
-----------------------------------------------------------------------------------------
| end of epoch 173 | time: 344.93s | valid loss  4.09 | valid ppl     59.88 | valid bpt    5.904
-----------------------------------------------------------------------------------------
| epoch 174 |   200/ 1327 batches | lr 5e-05 | ms/batch 216.56 | loss  3.82 | ppl    45.77 | bpt    5.516 
| epoch 174 |   400/ 1327 batches | lr 5e-05 | ms/batch 219.23 | loss  3.82 | ppl    45.81 | bpt    5.518 
| epoch 174 |   600/ 1327 batches | lr 5e-05 | ms/batch 217.33 | loss  3.90 | ppl    49.44 | bpt    5.627 
| epoch 174 |   800/ 1327 batches | lr 5e-05 | ms/batch 218.41 | loss  3.86 | ppl    47.23 | bpt    5.562 
| epoch 174 |  1000/ 1327 batches | lr 5e-05 | ms/batch 215.89 | loss  3.90 | ppl    49.34 | bpt    5.625 
| epoch 174 |  1200/ 1327 batches | lr 5e-05 | ms/batch 216.59 | loss  3.83 | ppl    46.20 | bpt    5.530 
-----------------------------------------------------------------------------------------
| end of epoch 174 | time: 344.10s | valid loss  4.09 | valid ppl     59.88 | valid bpt    5.904
-----------------------------------------------------------------------------------------
| epoch 175 |   200/ 1327 batches | lr 5e-05 | ms/batch 219.10 | loss  3.83 | ppl    46.15 | bpt    5.528 
| epoch 175 |   400/ 1327 batches | lr 5e-05 | ms/batch 216.92 | loss  3.82 | ppl    45.68 | bpt    5.514 
| epoch 175 |   600/ 1327 batches | lr 5e-05 | ms/batch 218.35 | loss  3.88 | ppl    48.48 | bpt    5.599 
| epoch 175 |   800/ 1327 batches | lr 5e-05 | ms/batch 214.97 | loss  3.85 | ppl    46.87 | bpt    5.551 
| epoch 175 |  1000/ 1327 batches | lr 5e-05 | ms/batch 215.98 | loss  3.90 | ppl    49.44 | bpt    5.628 
| epoch 175 |  1200/ 1327 batches | lr 5e-05 | ms/batch 215.51 | loss  3.84 | ppl    46.73 | bpt    5.546 
-----------------------------------------------------------------------------------------
| end of epoch 175 | time: 344.59s | valid loss  4.09 | valid ppl     59.88 | valid bpt    5.904
-----------------------------------------------------------------------------------------
=========================================================================================
| End of training | test loss  4.01 | test ppl    55.36 | test bpt    5.791
=========================================================================================
