Logging to experiments/gym_fwalker2d/Wo01/Mon-07-Nov-2022-10-30-38-AM-CST_gym_fwalker2d_trpo_iteration_20_seed2431
Print configuration .....
{'env_name': 'gym_fwalker2d', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/gym_fwalker2d_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': False, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 19.
Path 2 | total_timesteps 37.
Path 3 | total_timesteps 52.
Path 4 | total_timesteps 81.
Path 5 | total_timesteps 100.
Path 6 | total_timesteps 146.
Path 7 | total_timesteps 158.
Path 8 | total_timesteps 184.
Path 9 | total_timesteps 201.
Path 10 | total_timesteps 214.
Path 11 | total_timesteps 232.
Path 12 | total_timesteps 248.
Path 13 | total_timesteps 257.
Path 14 | total_timesteps 271.
Path 15 | total_timesteps 281.
Path 16 | total_timesteps 297.
Path 17 | total_timesteps 319.
Path 18 | total_timesteps 334.
Path 19 | total_timesteps 345.
Path 20 | total_timesteps 356.
Path 21 | total_timesteps 375.
Path 22 | total_timesteps 391.
Path 23 | total_timesteps 424.
Path 24 | total_timesteps 435.
Path 25 | total_timesteps 479.
Path 26 | total_timesteps 514.
Path 27 | total_timesteps 538.
Path 28 | total_timesteps 561.
Path 29 | total_timesteps 584.
Path 30 | total_timesteps 597.
Path 31 | total_timesteps 610.
Path 32 | total_timesteps 622.
Path 33 | total_timesteps 661.
Path 34 | total_timesteps 680.
Path 35 | total_timesteps 712.
Path 36 | total_timesteps 731.
Path 37 | total_timesteps 761.
Path 38 | total_timesteps 787.
Path 39 | total_timesteps 811.
Path 40 | total_timesteps 825.
Path 41 | total_timesteps 836.
Path 42 | total_timesteps 846.
Path 43 | total_timesteps 870.
Path 44 | total_timesteps 896.
Path 45 | total_timesteps 920.
Path 46 | total_timesteps 937.
Path 47 | total_timesteps 959.
Path 48 | total_timesteps 978.
Path 49 | total_timesteps 997.
Path 50 | total_timesteps 1012.
Path 51 | total_timesteps 1026.
Path 52 | total_timesteps 1054.
Path 53 | total_timesteps 1068.
Path 54 | total_timesteps 1085.
Path 55 | total_timesteps 1101.
Path 56 | total_timesteps 1120.
Path 57 | total_timesteps 1135.
Path 58 | total_timesteps 1167.
Path 59 | total_timesteps 1177.
Path 60 | total_timesteps 1196.
Path 61 | total_timesteps 1205.
Path 62 | total_timesteps 1224.
Path 63 | total_timesteps 1244.
Path 64 | total_timesteps 1262.
Path 65 | total_timesteps 1281.
Path 66 | total_timesteps 1296.
Path 67 | total_timesteps 1308.
Path 68 | total_timesteps 1331.
Path 69 | total_timesteps 1352.
Path 70 | total_timesteps 1369.
Path 71 | total_timesteps 1383.
Path 72 | total_timesteps 1399.
Path 73 | total_timesteps 1409.
Path 74 | total_timesteps 1436.
Path 75 | total_timesteps 1455.
Path 76 | total_timesteps 1465.
Path 77 | total_timesteps 1488.
Path 78 | total_timesteps 1498.
Path 79 | total_timesteps 1510.
Path 80 | total_timesteps 1562.
Path 81 | total_timesteps 1576.
Path 82 | total_timesteps 1588.
Path 83 | total_timesteps 1598.
Path 84 | total_timesteps 1623.
Path 85 | total_timesteps 1638.
Path 86 | total_timesteps 1669.
Path 87 | total_timesteps 1681.
Path 88 | total_timesteps 1710.
Path 89 | total_timesteps 1730.
Path 90 | total_timesteps 1743.
Path 91 | total_timesteps 1756.
Path 92 | total_timesteps 1767.
Path 93 | total_timesteps 1782.
Path 94 | total_timesteps 1800.
Path 95 | total_timesteps 1822.
Path 96 | total_timesteps 1837.
Path 97 | total_timesteps 1853.
Path 98 | total_timesteps 1874.
Path 99 | total_timesteps 1884.
Path 100 | total_timesteps 1913.
Path 101 | total_timesteps 1939.
Path 102 | total_timesteps 1969.
Path 103 | total_timesteps 1984.
Path 104 | total_timesteps 2002.
Path 105 | total_timesteps 2035.
Path 106 | total_timesteps 2051.
Path 107 | total_timesteps 2063.
Path 108 | total_timesteps 2085.
Path 109 | total_timesteps 2104.
Path 110 | total_timesteps 2121.
Path 111 | total_timesteps 2154.
Path 112 | total_timesteps 2166.
Path 113 | total_timesteps 2177.
Path 114 | total_timesteps 2193.
Path 115 | total_timesteps 2206.
Path 116 | total_timesteps 2223.
Path 117 | total_timesteps 2258.
Path 118 | total_timesteps 2284.
Path 119 | total_timesteps 2307.
Path 120 | total_timesteps 2326.
Path 121 | total_timesteps 2347.
Path 122 | total_timesteps 2367.
Path 123 | total_timesteps 2389.
Path 124 | total_timesteps 2425.
Path 125 | total_timesteps 2439.
Path 126 | total_timesteps 2453.
Path 127 | total_timesteps 2473.
Path 128 | total_timesteps 2502.
Path 129 | total_timesteps 2539.
Path 130 | total_timesteps 2563.
Path 131 | total_timesteps 2588.
Path 132 | total_timesteps 2602.
Path 133 | total_timesteps 2615.
Path 134 | total_timesteps 2628.
Path 135 | total_timesteps 2649.
Path 136 | total_timesteps 2662.
Path 137 | total_timesteps 2677.
Path 138 | total_timesteps 2691.
Path 139 | total_timesteps 2708.
Path 140 | total_timesteps 2720.
Path 141 | total_timesteps 2733.
Path 142 | total_timesteps 2748.
Path 143 | total_timesteps 2764.
Path 144 | total_timesteps 2778.
Path 145 | total_timesteps 2793.
Path 146 | total_timesteps 2810.
Path 147 | total_timesteps 2836.
Path 148 | total_timesteps 2863.
Path 149 | total_timesteps 2879.
Path 150 | total_timesteps 2894.
Path 151 | total_timesteps 2907.
Path 152 | total_timesteps 2932.
Path 153 | total_timesteps 2944.
Path 154 | total_timesteps 2965.
Path 155 | total_timesteps 2974.
Path 156 | total_timesteps 2987.
Path 157 | total_timesteps 3001.
Path 158 | total_timesteps 3012.
Path 159 | total_timesteps 3024.
Path 160 | total_timesteps 3044.
Path 161 | total_timesteps 3076.
Path 162 | total_timesteps 3107.
Path 163 | total_timesteps 3121.
Path 164 | total_timesteps 3139.
Path 165 | total_timesteps 3149.
Path 166 | total_timesteps 3166.
Path 167 | total_timesteps 3200.
Path 168 | total_timesteps 3214.
Path 169 | total_timesteps 3237.
Path 170 | total_timesteps 3252.
Path 171 | total_timesteps 3270.
Path 172 | total_timesteps 3293.
Path 173 | total_timesteps 3321.
Path 174 | total_timesteps 3343.
Path 175 | total_timesteps 3375.
Path 176 | total_timesteps 3399.
Path 177 | total_timesteps 3410.
Path 178 | total_timesteps 3428.
Path 179 | total_timesteps 3462.
Path 180 | total_timesteps 3473.
Path 181 | total_timesteps 3493.
Path 182 | total_timesteps 3520.
Path 183 | total_timesteps 3546.
Path 184 | total_timesteps 3567.
Path 185 | total_timesteps 3583.
Path 186 | total_timesteps 3593.
Path 187 | total_timesteps 3625.
Path 188 | total_timesteps 3640.
Path 189 | total_timesteps 3657.
Path 190 | total_timesteps 3683.
Path 191 | total_timesteps 3714.
Path 192 | total_timesteps 3734.
Path 193 | total_timesteps 3751.
Path 194 | total_timesteps 3765.
Path 195 | total_timesteps 3788.
Path 196 | total_timesteps 3814.
Path 197 | total_timesteps 3825.
Path 198 | total_timesteps 3844.
Path 199 | total_timesteps 3865.
Path 200 | total_timesteps 3877.
Path 201 | total_timesteps 3916.
Path 202 | total_timesteps 3928.
Path 203 | total_timesteps 3939.
Path 204 | total_timesteps 3951.
Path 205 | total_timesteps 3990.
Path 206 | total_timesteps 4002.
Path 207 | total_timesteps 4030.
Path 208 | total_timesteps 4051.
Path 209 | total_timesteps 4063.
Path 210 | total_timesteps 4089.
Path 211 | total_timesteps 4098.
Path 212 | total_timesteps 4112.
Path 213 | total_timesteps 4135.
Path 214 | total_timesteps 4158.
Path 215 | total_timesteps 4208.
Path 216 | total_timesteps 4225.
Path 217 | total_timesteps 4245.
Path 218 | total_timesteps 4258.
Path 219 | total_timesteps 4268.
Path 220 | total_timesteps 4288.
Path 221 | total_timesteps 4329.
Path 222 | total_timesteps 4348.
Path 223 | total_timesteps 4363.
Path 224 | total_timesteps 4384.
Path 225 | total_timesteps 4403.
Path 226 | total_timesteps 4415.
Path 227 | total_timesteps 4443.
Path 228 | total_timesteps 4455.
Path 229 | total_timesteps 4466.
Path 230 | total_timesteps 4482.
Path 231 | total_timesteps 4514.
Path 232 | total_timesteps 4525.
Path 233 | total_timesteps 4538.
Path 234 | total_timesteps 4566.
Path 235 | total_timesteps 4584.
Path 236 | total_timesteps 4604.
Path 237 | total_timesteps 4619.
Path 238 | total_timesteps 4639.
Path 239 | total_timesteps 4682.
Path 240 | total_timesteps 4698.
Path 241 | total_timesteps 4716.
Path 242 | total_timesteps 4736.
Path 243 | total_timesteps 4749.
Path 244 | total_timesteps 4758.
Path 245 | total_timesteps 4783.
Path 246 | total_timesteps 4797.
Path 247 | total_timesteps 4823.
Path 248 | total_timesteps 4842.
Path 249 | total_timesteps 4863.
Path 250 | total_timesteps 4880.
Path 251 | total_timesteps 4893.
Path 252 | total_timesteps 4916.
Path 253 | total_timesteps 4935.
Path 254 | total_timesteps 4950.
Path 255 | total_timesteps 4970.
Path 256 | total_timesteps 4999.
Path 257 | total_timesteps 5026.
Path 258 | total_timesteps 5080.
Path 259 | total_timesteps 5102.
Path 260 | total_timesteps 5123.
Path 261 | total_timesteps 5152.
Path 262 | total_timesteps 5180.
Path 263 | total_timesteps 5204.
Path 264 | total_timesteps 5256.
Path 265 | total_timesteps 5273.
Path 266 | total_timesteps 5304.
Path 267 | total_timesteps 5327.
Path 268 | total_timesteps 5354.
Path 269 | total_timesteps 5376.
Path 270 | total_timesteps 5390.
Path 271 | total_timesteps 5426.
Path 272 | total_timesteps 5444.
Path 273 | total_timesteps 5464.
Path 274 | total_timesteps 5478.
Path 275 | total_timesteps 5489.
Path 276 | total_timesteps 5514.
Path 277 | total_timesteps 5532.
Path 278 | total_timesteps 5556.
Path 279 | total_timesteps 5572.
Path 280 | total_timesteps 5589.
Path 281 | total_timesteps 5616.
Path 282 | total_timesteps 5629.
Path 283 | total_timesteps 5660.
Path 284 | total_timesteps 5677.
Path 285 | total_timesteps 5701.
Path 286 | total_timesteps 5724.
Path 287 | total_timesteps 5738.
Path 288 | total_timesteps 5757.
Path 289 | total_timesteps 5773.
Path 290 | total_timesteps 5794.
Path 291 | total_timesteps 5810.
Path 292 | total_timesteps 5841.
Path 293 | total_timesteps 5859.
Path 294 | total_timesteps 5868.
Path 295 | total_timesteps 5899.
Path 296 | total_timesteps 5908.
Path 297 | total_timesteps 5925.
Path 298 | total_timesteps 5940.
Path 299 | total_timesteps 5977.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Validation loss = 0.5408116579055786
Validation loss = 0.40410804748535156
Validation loss = 0.3668535351753235
Validation loss = 0.3500964045524597
Validation loss = 0.3319145441055298
Validation loss = 0.3274116814136505
Validation loss = 0.3216879963874817
Validation loss = 0.32329022884368896
Validation loss = 0.3302510380744934
Validation loss = 0.33841800689697266
Validation loss = 0.3392595052719116
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 12.
Path 2 | total_timesteps 48.
Path 3 | total_timesteps 59.
Path 4 | total_timesteps 73.
Path 5 | total_timesteps 93.
Path 6 | total_timesteps 107.
Path 7 | total_timesteps 119.
Path 8 | total_timesteps 138.
Path 9 | total_timesteps 150.
Path 10 | total_timesteps 157.
Path 11 | total_timesteps 168.
Path 12 | total_timesteps 182.
Path 13 | total_timesteps 196.
Path 14 | total_timesteps 215.
Path 15 | total_timesteps 239.
Path 16 | total_timesteps 262.
Path 17 | total_timesteps 270.
Path 18 | total_timesteps 282.
Path 19 | total_timesteps 297.
Path 20 | total_timesteps 310.
Path 21 | total_timesteps 319.
Path 22 | total_timesteps 333.
Path 23 | total_timesteps 346.
Path 24 | total_timesteps 360.
Path 25 | total_timesteps 369.
Path 26 | total_timesteps 381.
Path 27 | total_timesteps 396.
Path 28 | total_timesteps 416.
Path 29 | total_timesteps 428.
Path 30 | total_timesteps 443.
Path 31 | total_timesteps 460.
Path 32 | total_timesteps 477.
Path 33 | total_timesteps 489.
Path 34 | total_timesteps 511.
Path 35 | total_timesteps 523.
Path 36 | total_timesteps 544.
Path 37 | total_timesteps 559.
Path 38 | total_timesteps 576.
Path 39 | total_timesteps 588.
Path 40 | total_timesteps 607.
Path 41 | total_timesteps 635.
Path 42 | total_timesteps 649.
Path 43 | total_timesteps 662.
Path 44 | total_timesteps 680.
Path 45 | total_timesteps 691.
Path 46 | total_timesteps 704.
Path 47 | total_timesteps 713.
Path 48 | total_timesteps 733.
Path 49 | total_timesteps 762.
Path 50 | total_timesteps 772.
Path 51 | total_timesteps 794.
Path 52 | total_timesteps 822.
Path 53 | total_timesteps 836.
Path 54 | total_timesteps 862.
Path 55 | total_timesteps 880.
Path 56 | total_timesteps 891.
Path 57 | total_timesteps 933.
Path 58 | total_timesteps 947.
Path 59 | total_timesteps 957.
Path 60 | total_timesteps 976.
Path 61 | total_timesteps 987.
Path 62 | total_timesteps 1004.
Path 63 | total_timesteps 1021.
Path 64 | total_timesteps 1030.
Path 65 | total_timesteps 1055.
Path 66 | total_timesteps 1079.
Path 67 | total_timesteps 1093.
Path 68 | total_timesteps 1124.
Path 69 | total_timesteps 1136.
Path 70 | total_timesteps 1156.
Path 71 | total_timesteps 1163.
Path 72 | total_timesteps 1169.
Path 73 | total_timesteps 1185.
Path 74 | total_timesteps 1203.
Path 75 | total_timesteps 1214.
Path 76 | total_timesteps 1227.
Path 77 | total_timesteps 1242.
Path 78 | total_timesteps 1258.
Path 79 | total_timesteps 1272.
Path 80 | total_timesteps 1282.
Path 81 | total_timesteps 1317.
Path 82 | total_timesteps 1332.
Path 83 | total_timesteps 1349.
Path 84 | total_timesteps 1383.
Path 85 | total_timesteps 1396.
Path 86 | total_timesteps 1405.
Path 87 | total_timesteps 1432.
Path 88 | total_timesteps 1450.
Path 89 | total_timesteps 1461.
Path 90 | total_timesteps 1489.
Path 91 | total_timesteps 1501.
Path 92 | total_timesteps 1526.
Path 93 | total_timesteps 1540.
Path 94 | total_timesteps 1553.
Path 95 | total_timesteps 1576.
Path 96 | total_timesteps 1594.
Path 97 | total_timesteps 1604.
Path 98 | total_timesteps 1631.
Path 99 | total_timesteps 1652.
Path 100 | total_timesteps 1663.
Path 101 | total_timesteps 1677.
Path 102 | total_timesteps 1687.
Path 103 | total_timesteps 1705.
Path 104 | total_timesteps 1723.
Path 105 | total_timesteps 1743.
Path 106 | total_timesteps 1752.
Path 107 | total_timesteps 1768.
Path 108 | total_timesteps 1785.
Path 109 | total_timesteps 1795.
Path 110 | total_timesteps 1810.
Path 111 | total_timesteps 1825.
Path 112 | total_timesteps 1833.
Path 113 | total_timesteps 1844.
Path 114 | total_timesteps 1856.
Path 115 | total_timesteps 1865.
Path 116 | total_timesteps 1880.
Path 117 | total_timesteps 1894.
Path 118 | total_timesteps 1911.
Path 119 | total_timesteps 1922.
Path 120 | total_timesteps 1939.
Path 121 | total_timesteps 1958.
Path 122 | total_timesteps 1976.
Path 123 | total_timesteps 1985.
Path 124 | total_timesteps 2002.
Path 125 | total_timesteps 2014.
Path 126 | total_timesteps 2028.
Path 127 | total_timesteps 2037.
Path 128 | total_timesteps 2048.
Path 129 | total_timesteps 2056.
Path 130 | total_timesteps 2066.
Path 131 | total_timesteps 2077.
Path 132 | total_timesteps 2086.
Path 133 | total_timesteps 2099.
Path 134 | total_timesteps 2112.
Path 135 | total_timesteps 2123.
Path 136 | total_timesteps 2140.
Path 137 | total_timesteps 2161.
Path 138 | total_timesteps 2187.
Path 139 | total_timesteps 2213.
Path 140 | total_timesteps 2229.
Path 141 | total_timesteps 2242.
Path 142 | total_timesteps 2259.
Path 143 | total_timesteps 2272.
Path 144 | total_timesteps 2295.
Path 145 | total_timesteps 2306.
Path 146 | total_timesteps 2314.
Path 147 | total_timesteps 2330.
Path 148 | total_timesteps 2345.
Path 149 | total_timesteps 2361.
Path 150 | total_timesteps 2372.
Path 151 | total_timesteps 2388.
Path 152 | total_timesteps 2400.
Path 153 | total_timesteps 2411.
Path 154 | total_timesteps 2447.
Path 155 | total_timesteps 2455.
Path 156 | total_timesteps 2484.
Path 157 | total_timesteps 2515.
Path 158 | total_timesteps 2525.
Path 159 | total_timesteps 2542.
Path 160 | total_timesteps 2558.
Path 161 | total_timesteps 2577.
Path 162 | total_timesteps 2615.
Path 163 | total_timesteps 2627.
Path 164 | total_timesteps 2638.
Path 165 | total_timesteps 2651.
Path 166 | total_timesteps 2673.
Path 167 | total_timesteps 2682.
Path 168 | total_timesteps 2716.
Path 169 | total_timesteps 2726.
Path 170 | total_timesteps 2742.
Path 171 | total_timesteps 2753.
Path 172 | total_timesteps 2770.
Path 173 | total_timesteps 2787.
Path 174 | total_timesteps 2806.
Path 175 | total_timesteps 2818.
Path 176 | total_timesteps 2831.
Path 177 | total_timesteps 2846.
Path 178 | total_timesteps 2857.
Path 179 | total_timesteps 2872.
Path 180 | total_timesteps 2890.
Path 181 | total_timesteps 2905.
Path 182 | total_timesteps 2925.
Path 183 | total_timesteps 2939.
Path 184 | total_timesteps 2958.
Path 185 | total_timesteps 2976.
Path 186 | total_timesteps 2988.
Path 187 | total_timesteps 3008.
Path 188 | total_timesteps 3022.
Path 189 | total_timesteps 3032.
Path 190 | total_timesteps 3043.
Path 191 | total_timesteps 3062.
Path 192 | total_timesteps 3075.
Path 193 | total_timesteps 3087.
Path 194 | total_timesteps 3129.
Path 195 | total_timesteps 3155.
Path 196 | total_timesteps 3168.
Path 197 | total_timesteps 3186.
Path 198 | total_timesteps 3197.
Path 199 | total_timesteps 3214.
Path 200 | total_timesteps 3239.
Path 201 | total_timesteps 3250.
Path 202 | total_timesteps 3265.
Path 203 | total_timesteps 3286.
Path 204 | total_timesteps 3304.
Path 205 | total_timesteps 3316.
Path 206 | total_timesteps 3336.
Path 207 | total_timesteps 3344.
Path 208 | total_timesteps 3358.
Path 209 | total_timesteps 3369.
Path 210 | total_timesteps 3384.
Path 211 | total_timesteps 3404.
Path 212 | total_timesteps 3414.
Path 213 | total_timesteps 3425.
Path 214 | total_timesteps 3440.
Path 215 | total_timesteps 3463.
Path 216 | total_timesteps 3482.
Path 217 | total_timesteps 3496.
Path 218 | total_timesteps 3508.
Path 219 | total_timesteps 3522.
Path 220 | total_timesteps 3534.
Path 221 | total_timesteps 3549.
Path 222 | total_timesteps 3564.
Path 223 | total_timesteps 3580.
Path 224 | total_timesteps 3592.
Path 225 | total_timesteps 3603.
Path 226 | total_timesteps 3632.
Path 227 | total_timesteps 3644.
Path 228 | total_timesteps 3654.
Path 229 | total_timesteps 3671.
Path 230 | total_timesteps 3701.
Path 231 | total_timesteps 3717.
Path 232 | total_timesteps 3734.
Path 233 | total_timesteps 3746.
Path 234 | total_timesteps 3763.
Path 235 | total_timesteps 3782.
Path 236 | total_timesteps 3793.
Path 237 | total_timesteps 3803.
Path 238 | total_timesteps 3814.
Path 239 | total_timesteps 3833.
Path 240 | total_timesteps 3889.
Path 241 | total_timesteps 3904.
Path 242 | total_timesteps 3920.
Path 243 | total_timesteps 3931.
Path 244 | total_timesteps 3958.
Path 245 | total_timesteps 3982.
Path 246 | total_timesteps 3998.
Path 247 | total_timesteps 4018.
Path 248 | total_timesteps 4032.
Path 249 | total_timesteps 4047.
Path 250 | total_timesteps 4057.
Path 251 | total_timesteps 4073.
Path 252 | total_timesteps 4090.
Path 253 | total_timesteps 4100.
Path 254 | total_timesteps 4111.
Path 255 | total_timesteps 4126.
Path 256 | total_timesteps 4145.
Path 257 | total_timesteps 4157.
Path 258 | total_timesteps 4184.
Path 259 | total_timesteps 4211.
Path 260 | total_timesteps 4239.
Path 261 | total_timesteps 4252.
Path 262 | total_timesteps 4265.
Path 263 | total_timesteps 4277.
Path 264 | total_timesteps 4290.
Path 265 | total_timesteps 4301.
Path 266 | total_timesteps 4321.
Path 267 | total_timesteps 4338.
Path 268 | total_timesteps 4356.
Path 269 | total_timesteps 4369.
Path 270 | total_timesteps 4381.
Path 271 | total_timesteps 4394.
Path 272 | total_timesteps 4407.
Path 273 | total_timesteps 4418.
Path 274 | total_timesteps 4429.
Path 275 | total_timesteps 4442.
Path 276 | total_timesteps 4452.
Path 277 | total_timesteps 4463.
Path 278 | total_timesteps 4478.
Path 279 | total_timesteps 4490.
Path 280 | total_timesteps 4506.
Path 281 | total_timesteps 4518.
Path 282 | total_timesteps 4535.
Path 283 | total_timesteps 4547.
Path 284 | total_timesteps 4562.
Path 285 | total_timesteps 4577.
Path 286 | total_timesteps 4587.
Path 287 | total_timesteps 4603.
Path 288 | total_timesteps 4624.
Path 289 | total_timesteps 4650.
Path 290 | total_timesteps 4666.
Path 291 | total_timesteps 4680.
Path 292 | total_timesteps 4697.
Path 293 | total_timesteps 4714.
Path 294 | total_timesteps 4739.
Path 295 | total_timesteps 4752.
Path 296 | total_timesteps 4767.
Path 297 | total_timesteps 4783.
Path 298 | total_timesteps 4796.
Path 299 | total_timesteps 4808.
Path 300 | total_timesteps 4825.
Path 301 | total_timesteps 4850.
Path 302 | total_timesteps 4863.
Path 303 | total_timesteps 4873.
Path 304 | total_timesteps 4887.
Path 305 | total_timesteps 4899.
Path 306 | total_timesteps 4915.
Path 307 | total_timesteps 4931.
Path 308 | total_timesteps 4944.
Path 309 | total_timesteps 4958.
Path 310 | total_timesteps 4973.
Path 311 | total_timesteps 5002.
Path 312 | total_timesteps 5026.
Path 313 | total_timesteps 5036.
Path 314 | total_timesteps 5047.
Path 315 | total_timesteps 5064.
Path 316 | total_timesteps 5078.
Path 317 | total_timesteps 5108.
Path 318 | total_timesteps 5119.
Path 319 | total_timesteps 5133.
Path 320 | total_timesteps 5145.
Path 321 | total_timesteps 5167.
Path 322 | total_timesteps 5176.
Path 323 | total_timesteps 5189.
Path 324 | total_timesteps 5196.
Path 325 | total_timesteps 5217.
Path 326 | total_timesteps 5228.
Path 327 | total_timesteps 5241.
Path 328 | total_timesteps 5269.
Path 329 | total_timesteps 5278.
Path 330 | total_timesteps 5305.
Path 331 | total_timesteps 5315.
Path 332 | total_timesteps 5331.
Path 333 | total_timesteps 5353.
Path 334 | total_timesteps 5366.
Path 335 | total_timesteps 5377.
Path 336 | total_timesteps 5386.
Path 337 | total_timesteps 5400.
Path 338 | total_timesteps 5415.
Path 339 | total_timesteps 5440.
Path 340 | total_timesteps 5455.
Path 341 | total_timesteps 5473.
Path 342 | total_timesteps 5485.
Path 343 | total_timesteps 5508.
Path 344 | total_timesteps 5519.
Path 345 | total_timesteps 5540.
Path 346 | total_timesteps 5551.
Path 347 | total_timesteps 5573.
Path 348 | total_timesteps 5583.
Path 349 | total_timesteps 5594.
Path 350 | total_timesteps 5608.
Path 351 | total_timesteps 5621.
Path 352 | total_timesteps 5646.
Path 353 | total_timesteps 5661.
Path 354 | total_timesteps 5678.
Path 355 | total_timesteps 5689.
Path 356 | total_timesteps 5698.
Path 357 | total_timesteps 5711.
Path 358 | total_timesteps 5726.
Path 359 | total_timesteps 5739.
Path 360 | total_timesteps 5757.
Path 361 | total_timesteps 5775.
Path 362 | total_timesteps 5789.
Path 363 | total_timesteps 5802.
Path 364 | total_timesteps 5814.
Path 365 | total_timesteps 5847.
Path 366 | total_timesteps 5865.
Path 367 | total_timesteps 5898.
Path 368 | total_timesteps 5912.
Path 369 | total_timesteps 5933.
Path 370 | total_timesteps 5970.
Path 371 | total_timesteps 5983.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.02    |
| Iteration     | 0        |
| MaximumReturn | 2.02     |
| MinimumReturn | -21.4    |
| TotalSamples  | 8008     |
----------------------------
itr #1 | 
Fitting dynamics.
Validation loss = 0.3390145003795624
Validation loss = 0.3147883713245392
Validation loss = 0.32168594002723694
Validation loss = 0.315912663936615
Validation loss = 0.3181208670139313
Validation loss = 0.32854321599006653
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 20.
Path 2 | total_timesteps 58.
Path 3 | total_timesteps 76.
Path 4 | total_timesteps 118.
Path 5 | total_timesteps 148.
Path 6 | total_timesteps 163.
Path 7 | total_timesteps 174.
Path 8 | total_timesteps 191.
Path 9 | total_timesteps 221.
Path 10 | total_timesteps 239.
Path 11 | total_timesteps 292.
Path 12 | total_timesteps 314.
Path 13 | total_timesteps 346.
Path 14 | total_timesteps 371.
Path 15 | total_timesteps 392.
Path 16 | total_timesteps 440.
Path 17 | total_timesteps 467.
Path 18 | total_timesteps 489.
Path 19 | total_timesteps 543.
Path 20 | total_timesteps 563.
Path 21 | total_timesteps 581.
Path 22 | total_timesteps 608.
Path 23 | total_timesteps 638.
Path 24 | total_timesteps 669.
Path 25 | total_timesteps 683.
Path 26 | total_timesteps 753.
Path 27 | total_timesteps 772.
Path 28 | total_timesteps 809.
Path 29 | total_timesteps 834.
Path 30 | total_timesteps 910.
Path 31 | total_timesteps 925.
Path 32 | total_timesteps 1034.
Path 33 | total_timesteps 1098.
Path 34 | total_timesteps 1127.
Path 35 | total_timesteps 1164.
Path 36 | total_timesteps 1200.
Path 37 | total_timesteps 1209.
Path 38 | total_timesteps 1263.
Path 39 | total_timesteps 1285.
Path 40 | total_timesteps 1318.
Path 41 | total_timesteps 1355.
Path 42 | total_timesteps 1370.
Path 43 | total_timesteps 1406.
Path 44 | total_timesteps 1432.
Path 45 | total_timesteps 1462.
Path 46 | total_timesteps 1499.
Path 47 | total_timesteps 1560.
Path 48 | total_timesteps 1589.
Path 49 | total_timesteps 1639.
Path 50 | total_timesteps 1661.
Path 51 | total_timesteps 1723.
Path 52 | total_timesteps 1745.
Path 53 | total_timesteps 1769.
Path 54 | total_timesteps 1786.
Path 55 | total_timesteps 1806.
Path 56 | total_timesteps 1822.
Path 57 | total_timesteps 1854.
Path 58 | total_timesteps 1890.
Path 59 | total_timesteps 1907.
Path 60 | total_timesteps 1933.
Path 61 | total_timesteps 1948.
Path 62 | total_timesteps 1969.
Path 63 | total_timesteps 2023.
Path 64 | total_timesteps 2042.
Path 65 | total_timesteps 2051.
Path 66 | total_timesteps 2066.
Path 67 | total_timesteps 2103.
Path 68 | total_timesteps 2122.
Path 69 | total_timesteps 2144.
Path 70 | total_timesteps 2165.
Path 71 | total_timesteps 2188.
Path 72 | total_timesteps 2208.
Path 73 | total_timesteps 2228.
Path 74 | total_timesteps 2243.
Path 75 | total_timesteps 2279.
Path 76 | total_timesteps 2296.
Path 77 | total_timesteps 2329.
Path 78 | total_timesteps 2360.
Path 79 | total_timesteps 2388.
Path 80 | total_timesteps 2419.
Path 81 | total_timesteps 2440.
Path 82 | total_timesteps 2466.
Path 83 | total_timesteps 2504.
Path 84 | total_timesteps 2523.
Path 85 | total_timesteps 2560.
Path 86 | total_timesteps 2585.
Path 87 | total_timesteps 2607.
Path 88 | total_timesteps 2645.
Path 89 | total_timesteps 2662.
Path 90 | total_timesteps 2693.
Path 91 | total_timesteps 2715.
Path 92 | total_timesteps 2743.
Path 93 | total_timesteps 2806.
Path 94 | total_timesteps 2836.
Path 95 | total_timesteps 2851.
Path 96 | total_timesteps 2869.
Path 97 | total_timesteps 2895.
Path 98 | total_timesteps 2928.
Path 99 | total_timesteps 2967.
Path 100 | total_timesteps 3016.
Path 101 | total_timesteps 3093.
Path 102 | total_timesteps 3129.
Path 103 | total_timesteps 3154.
Path 104 | total_timesteps 3168.
Path 105 | total_timesteps 3200.
Path 106 | total_timesteps 3243.
Path 107 | total_timesteps 3270.
Path 108 | total_timesteps 3296.
Path 109 | total_timesteps 3343.
Path 110 | total_timesteps 3394.
Path 111 | total_timesteps 3403.
Path 112 | total_timesteps 3420.
Path 113 | total_timesteps 3439.
Path 114 | total_timesteps 3462.
Path 115 | total_timesteps 3480.
Path 116 | total_timesteps 3497.
Path 117 | total_timesteps 3520.
Path 118 | total_timesteps 3546.
Path 119 | total_timesteps 3593.
Path 120 | total_timesteps 3603.
Path 121 | total_timesteps 3638.
Path 122 | total_timesteps 3669.
Path 123 | total_timesteps 3689.
Path 124 | total_timesteps 3728.
Path 125 | total_timesteps 3766.
Path 126 | total_timesteps 3798.
Path 127 | total_timesteps 3838.
Path 128 | total_timesteps 3851.
Path 129 | total_timesteps 3888.
Path 130 | total_timesteps 3917.
Path 131 | total_timesteps 3936.
Path 132 | total_timesteps 3982.
Path 133 | total_timesteps 4015.
Path 134 | total_timesteps 4027.
Path 135 | total_timesteps 4045.
Path 136 | total_timesteps 4074.
Path 137 | total_timesteps 4104.
Path 138 | total_timesteps 4127.
Path 139 | total_timesteps 4149.
Path 140 | total_timesteps 4191.
Path 141 | total_timesteps 4225.
Path 142 | total_timesteps 4263.
Path 143 | total_timesteps 4292.
Path 144 | total_timesteps 4339.
Path 145 | total_timesteps 4363.
Path 146 | total_timesteps 4393.
Path 147 | total_timesteps 4450.
Path 148 | total_timesteps 4468.
Path 149 | total_timesteps 4488.
Path 150 | total_timesteps 4542.
Path 151 | total_timesteps 4572.
Path 152 | total_timesteps 4589.
Path 153 | total_timesteps 4616.
Path 154 | total_timesteps 4644.
Path 155 | total_timesteps 4685.
Path 156 | total_timesteps 4715.
Path 157 | total_timesteps 4745.
Path 158 | total_timesteps 4760.
Path 159 | total_timesteps 4787.
Path 160 | total_timesteps 4811.
Path 161 | total_timesteps 4848.
Path 162 | total_timesteps 4884.
Path 163 | total_timesteps 4922.
Path 164 | total_timesteps 4945.
Path 165 | total_timesteps 4957.
Path 166 | total_timesteps 4987.
Path 167 | total_timesteps 5011.
Path 168 | total_timesteps 5045.
Path 169 | total_timesteps 5066.
Path 170 | total_timesteps 5091.
Path 171 | total_timesteps 5154.
Path 172 | total_timesteps 5189.
Path 173 | total_timesteps 5220.
Path 174 | total_timesteps 5287.
Path 175 | total_timesteps 5321.
Path 176 | total_timesteps 5341.
Path 177 | total_timesteps 5373.
Path 178 | total_timesteps 5404.
Path 179 | total_timesteps 5443.
Path 180 | total_timesteps 5502.
Path 181 | total_timesteps 5521.
Path 182 | total_timesteps 5542.
Path 183 | total_timesteps 5560.
Path 184 | total_timesteps 5586.
Path 185 | total_timesteps 5610.
Path 186 | total_timesteps 5637.
Path 187 | total_timesteps 5673.
Path 188 | total_timesteps 5696.
Path 189 | total_timesteps 5714.
Path 190 | total_timesteps 5746.
Path 191 | total_timesteps 5781.
Path 192 | total_timesteps 5815.
Path 193 | total_timesteps 5842.
Path 194 | total_timesteps 5856.
Path 195 | total_timesteps 5896.
Path 196 | total_timesteps 5910.
Path 197 | total_timesteps 5939.
Path 198 | total_timesteps 5956.
Path 199 | total_timesteps 5987.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.76    |
| Iteration     | 1        |
| MaximumReturn | 38.2     |
| MinimumReturn | -50      |
| TotalSamples  | 12012    |
----------------------------
itr #2 | 
Fitting dynamics.
Validation loss = 0.3244502544403076
Validation loss = 0.3248336613178253
Validation loss = 0.3235628604888916
Validation loss = 0.3328792154788971
Validation loss = 0.3476691246032715
Validation loss = 0.3751208782196045
Validation loss = 0.34643641114234924
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 54.
Path 2 | total_timesteps 84.
Path 3 | total_timesteps 143.
Path 4 | total_timesteps 202.
Path 5 | total_timesteps 242.
Path 6 | total_timesteps 297.
Path 7 | total_timesteps 344.
Path 8 | total_timesteps 372.
Path 9 | total_timesteps 405.
Path 10 | total_timesteps 470.
Path 11 | total_timesteps 518.
Path 12 | total_timesteps 559.
Path 13 | total_timesteps 612.
Path 14 | total_timesteps 671.
Path 15 | total_timesteps 822.
Path 16 | total_timesteps 850.
Path 17 | total_timesteps 894.
Path 18 | total_timesteps 950.
Path 19 | total_timesteps 997.
Path 20 | total_timesteps 1059.
Path 21 | total_timesteps 1105.
Path 22 | total_timesteps 1153.
Path 23 | total_timesteps 1211.
Path 24 | total_timesteps 1253.
Path 25 | total_timesteps 1291.
Path 26 | total_timesteps 1328.
Path 27 | total_timesteps 1374.
Path 28 | total_timesteps 1402.
Path 29 | total_timesteps 1428.
Path 30 | total_timesteps 1468.
Path 31 | total_timesteps 1492.
Path 32 | total_timesteps 1544.
Path 33 | total_timesteps 1573.
Path 34 | total_timesteps 1661.
Path 35 | total_timesteps 1689.
Path 36 | total_timesteps 1712.
Path 37 | total_timesteps 1745.
Path 38 | total_timesteps 1786.
Path 39 | total_timesteps 1806.
Path 40 | total_timesteps 1848.
Path 41 | total_timesteps 1920.
Path 42 | total_timesteps 1943.
Path 43 | total_timesteps 1993.
Path 44 | total_timesteps 2013.
Path 45 | total_timesteps 2041.
Path 46 | total_timesteps 2070.
Path 47 | total_timesteps 2123.
Path 48 | total_timesteps 2135.
Path 49 | total_timesteps 2167.
Path 50 | total_timesteps 2209.
Path 51 | total_timesteps 2253.
Path 52 | total_timesteps 2275.
Path 53 | total_timesteps 2299.
Path 54 | total_timesteps 2345.
Path 55 | total_timesteps 2371.
Path 56 | total_timesteps 2402.
Path 57 | total_timesteps 2437.
Path 58 | total_timesteps 2457.
Path 59 | total_timesteps 2507.
Path 60 | total_timesteps 2551.
Path 61 | total_timesteps 2584.
Path 62 | total_timesteps 2665.
Path 63 | total_timesteps 2709.
Path 64 | total_timesteps 2788.
Path 65 | total_timesteps 2812.
Path 66 | total_timesteps 2834.
Path 67 | total_timesteps 2860.
Path 68 | total_timesteps 2932.
Path 69 | total_timesteps 2964.
Path 70 | total_timesteps 2985.
Path 71 | total_timesteps 3000.
Path 72 | total_timesteps 3031.
Path 73 | total_timesteps 3096.
Path 74 | total_timesteps 3149.
Path 75 | total_timesteps 3203.
Path 76 | total_timesteps 3260.
Path 77 | total_timesteps 3284.
Path 78 | total_timesteps 3364.
Path 79 | total_timesteps 3402.
Path 80 | total_timesteps 3434.
Path 81 | total_timesteps 3468.
Path 82 | total_timesteps 3501.
Path 83 | total_timesteps 3524.
Path 84 | total_timesteps 3585.
Path 85 | total_timesteps 3630.
Path 86 | total_timesteps 3791.
Path 87 | total_timesteps 3841.
Path 88 | total_timesteps 3915.
Path 89 | total_timesteps 3936.
Path 90 | total_timesteps 3986.
Path 91 | total_timesteps 4005.
Path 92 | total_timesteps 4023.
Path 93 | total_timesteps 4064.
Path 94 | total_timesteps 4117.
Path 95 | total_timesteps 4147.
Path 96 | total_timesteps 4209.
Path 97 | total_timesteps 4253.
Path 98 | total_timesteps 4267.
Path 99 | total_timesteps 4300.
Path 100 | total_timesteps 4330.
Path 101 | total_timesteps 4369.
Path 102 | total_timesteps 4422.
Path 103 | total_timesteps 4463.
Path 104 | total_timesteps 4503.
Path 105 | total_timesteps 4541.
Path 106 | total_timesteps 4567.
Path 107 | total_timesteps 4626.
Path 108 | total_timesteps 4657.
Path 109 | total_timesteps 4703.
Path 110 | total_timesteps 4731.
Path 111 | total_timesteps 4764.
Path 112 | total_timesteps 4869.
Path 113 | total_timesteps 4895.
Path 114 | total_timesteps 4907.
Path 115 | total_timesteps 4923.
Path 116 | total_timesteps 4951.
Path 117 | total_timesteps 4992.
Path 118 | total_timesteps 5039.
Path 119 | total_timesteps 5070.
Path 120 | total_timesteps 5085.
Path 121 | total_timesteps 5119.
Path 122 | total_timesteps 5153.
Path 123 | total_timesteps 5186.
Path 124 | total_timesteps 5220.
Path 125 | total_timesteps 5254.
Path 126 | total_timesteps 5313.
Path 127 | total_timesteps 5401.
Path 128 | total_timesteps 5429.
Path 129 | total_timesteps 5501.
Path 130 | total_timesteps 5534.
Path 131 | total_timesteps 5566.
Path 132 | total_timesteps 5602.
Path 133 | total_timesteps 5634.
Path 134 | total_timesteps 5676.
Path 135 | total_timesteps 5786.
Path 136 | total_timesteps 5835.
Path 137 | total_timesteps 5873.
Path 138 | total_timesteps 5920.
Path 139 | total_timesteps 5941.
Path 140 | total_timesteps 5961.
Path 141 | total_timesteps 5999.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0991  |
| Iteration     | 2        |
| MaximumReturn | 172      |
| MinimumReturn | -37.8    |
| TotalSamples  | 16074    |
----------------------------
itr #3 | 
Fitting dynamics.
Validation loss = 0.3457048535346985
Validation loss = 0.3534885346889496
Validation loss = 0.4100119471549988
Validation loss = 0.3984743356704712
Validation loss = 0.375931978225708
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 58.
Path 2 | total_timesteps 86.
Path 3 | total_timesteps 111.
Path 4 | total_timesteps 174.
Path 5 | total_timesteps 198.
Path 6 | total_timesteps 244.
Path 7 | total_timesteps 284.
Path 8 | total_timesteps 312.
Path 9 | total_timesteps 344.
Path 10 | total_timesteps 430.
Path 11 | total_timesteps 460.
Path 12 | total_timesteps 484.
Path 13 | total_timesteps 511.
Path 14 | total_timesteps 542.
Path 15 | total_timesteps 582.
Path 16 | total_timesteps 629.
Path 17 | total_timesteps 694.
Path 18 | total_timesteps 737.
Path 19 | total_timesteps 754.
Path 20 | total_timesteps 780.
Path 21 | total_timesteps 831.
Path 22 | total_timesteps 849.
Path 23 | total_timesteps 878.
Path 24 | total_timesteps 906.
Path 25 | total_timesteps 1013.
Path 26 | total_timesteps 1034.
Path 27 | total_timesteps 1094.
Path 28 | total_timesteps 1143.
Path 29 | total_timesteps 1174.
Path 30 | total_timesteps 1218.
Path 31 | total_timesteps 1249.
Path 32 | total_timesteps 1319.
Path 33 | total_timesteps 1359.
Path 34 | total_timesteps 1410.
Path 35 | total_timesteps 1435.
Path 36 | total_timesteps 1466.
Path 37 | total_timesteps 1502.
Path 38 | total_timesteps 1581.
Path 39 | total_timesteps 1616.
Path 40 | total_timesteps 1656.
Path 41 | total_timesteps 1682.
Path 42 | total_timesteps 1739.
Path 43 | total_timesteps 1784.
Path 44 | total_timesteps 1819.
Path 45 | total_timesteps 1874.
Path 46 | total_timesteps 1909.
Path 47 | total_timesteps 1941.
Path 48 | total_timesteps 1998.
Path 49 | total_timesteps 2019.
Path 50 | total_timesteps 2045.
Path 51 | total_timesteps 2072.
Path 52 | total_timesteps 2121.
Path 53 | total_timesteps 2150.
Path 54 | total_timesteps 2178.
Path 55 | total_timesteps 2211.
Path 56 | total_timesteps 2247.
Path 57 | total_timesteps 2260.
Path 58 | total_timesteps 2280.
Path 59 | total_timesteps 2320.
Path 60 | total_timesteps 2354.
Path 61 | total_timesteps 2386.
Path 62 | total_timesteps 2424.
Path 63 | total_timesteps 2449.
Path 64 | total_timesteps 2515.
Path 65 | total_timesteps 2565.
Path 66 | total_timesteps 2616.
Path 67 | total_timesteps 2648.
Path 68 | total_timesteps 2679.
Path 69 | total_timesteps 2700.
Path 70 | total_timesteps 2737.
Path 71 | total_timesteps 2798.
Path 72 | total_timesteps 2819.
Path 73 | total_timesteps 2842.
Path 74 | total_timesteps 2911.
Path 75 | total_timesteps 2957.
Path 76 | total_timesteps 3002.
Path 77 | total_timesteps 3026.
Path 78 | total_timesteps 3047.
Path 79 | total_timesteps 3094.
Path 80 | total_timesteps 3135.
Path 81 | total_timesteps 3184.
Path 82 | total_timesteps 3235.
Path 83 | total_timesteps 3278.
Path 84 | total_timesteps 3304.
Path 85 | total_timesteps 3329.
Path 86 | total_timesteps 3351.
Path 87 | total_timesteps 3397.
Path 88 | total_timesteps 3454.
Path 89 | total_timesteps 3555.
Path 90 | total_timesteps 3579.
Path 91 | total_timesteps 3658.
Path 92 | total_timesteps 3726.
Path 93 | total_timesteps 3780.
Path 94 | total_timesteps 3811.
Path 95 | total_timesteps 3842.
Path 96 | total_timesteps 3857.
Path 97 | total_timesteps 3887.
Path 98 | total_timesteps 3911.
Path 99 | total_timesteps 3928.
Path 100 | total_timesteps 3963.
Path 101 | total_timesteps 3989.
Path 102 | total_timesteps 4021.
Path 103 | total_timesteps 4068.
Path 104 | total_timesteps 4093.
Path 105 | total_timesteps 4120.
Path 106 | total_timesteps 4151.
Path 107 | total_timesteps 4187.
Path 108 | total_timesteps 4212.
Path 109 | total_timesteps 4281.
Path 110 | total_timesteps 4302.
Path 111 | total_timesteps 4357.
Path 112 | total_timesteps 4390.
Path 113 | total_timesteps 4425.
Path 114 | total_timesteps 4468.
Path 115 | total_timesteps 4502.
Path 116 | total_timesteps 4571.
Path 117 | total_timesteps 4616.
Path 118 | total_timesteps 4626.
Path 119 | total_timesteps 4654.
Path 120 | total_timesteps 4723.
Path 121 | total_timesteps 4753.
Path 122 | total_timesteps 4775.
Path 123 | total_timesteps 4794.
Path 124 | total_timesteps 4820.
Path 125 | total_timesteps 4877.
Path 126 | total_timesteps 4906.
Path 127 | total_timesteps 4930.
Path 128 | total_timesteps 4974.
Path 129 | total_timesteps 5008.
Path 130 | total_timesteps 5037.
Path 131 | total_timesteps 5076.
Path 132 | total_timesteps 5104.
Path 133 | total_timesteps 5136.
Path 134 | total_timesteps 5164.
Path 135 | total_timesteps 5190.
Path 136 | total_timesteps 5227.
Path 137 | total_timesteps 5270.
Path 138 | total_timesteps 5292.
Path 139 | total_timesteps 5312.
Path 140 | total_timesteps 5341.
Path 141 | total_timesteps 5377.
Path 142 | total_timesteps 5399.
Path 143 | total_timesteps 5421.
Path 144 | total_timesteps 5459.
Path 145 | total_timesteps 5555.
Path 146 | total_timesteps 5582.
Path 147 | total_timesteps 5607.
Path 148 | total_timesteps 5639.
Path 149 | total_timesteps 5649.
Path 150 | total_timesteps 5675.
Path 151 | total_timesteps 5710.
Path 152 | total_timesteps 5724.
Path 153 | total_timesteps 5756.
Path 154 | total_timesteps 5810.
Path 155 | total_timesteps 5837.
Path 156 | total_timesteps 5952.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.56    |
| Iteration     | 3        |
| MaximumReturn | 102      |
| MinimumReturn | -44.7    |
| TotalSamples  | 20098    |
----------------------------
itr #4 | 
Fitting dynamics.
Validation loss = 0.37572282552719116
Validation loss = 0.375866562128067
Validation loss = 0.3876376748085022
Validation loss = 0.3953755497932434
Validation loss = 0.3990005850791931
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 39.
Path 2 | total_timesteps 82.
Path 3 | total_timesteps 122.
Path 4 | total_timesteps 161.
Path 5 | total_timesteps 239.
Path 6 | total_timesteps 288.
Path 7 | total_timesteps 333.
Path 8 | total_timesteps 369.
Path 9 | total_timesteps 405.
Path 10 | total_timesteps 432.
Path 11 | total_timesteps 458.
Path 12 | total_timesteps 493.
Path 13 | total_timesteps 530.
Path 14 | total_timesteps 583.
Path 15 | total_timesteps 596.
Path 16 | total_timesteps 672.
Path 17 | total_timesteps 719.
Path 18 | total_timesteps 767.
Path 19 | total_timesteps 793.
Path 20 | total_timesteps 858.
Path 21 | total_timesteps 891.
Path 22 | total_timesteps 905.
Path 23 | total_timesteps 948.
Path 24 | total_timesteps 996.
Path 25 | total_timesteps 1032.
Path 26 | total_timesteps 1066.
Path 27 | total_timesteps 1083.
Path 28 | total_timesteps 1092.
Path 29 | total_timesteps 1129.
Path 30 | total_timesteps 1157.
Path 31 | total_timesteps 1178.
Path 32 | total_timesteps 1285.
Path 33 | total_timesteps 1342.
Path 34 | total_timesteps 1368.
Path 35 | total_timesteps 1400.
Path 36 | total_timesteps 1433.
Path 37 | total_timesteps 1468.
Path 38 | total_timesteps 1513.
Path 39 | total_timesteps 1587.
Path 40 | total_timesteps 1631.
Path 41 | total_timesteps 1666.
Path 42 | total_timesteps 1725.
Path 43 | total_timesteps 1756.
Path 44 | total_timesteps 1793.
Path 45 | total_timesteps 1835.
Path 46 | total_timesteps 1875.
Path 47 | total_timesteps 1918.
Path 48 | total_timesteps 1943.
Path 49 | total_timesteps 1983.
Path 50 | total_timesteps 2024.
Path 51 | total_timesteps 2071.
Path 52 | total_timesteps 2101.
Path 53 | total_timesteps 2133.
Path 54 | total_timesteps 2166.
Path 55 | total_timesteps 2191.
Path 56 | total_timesteps 2214.
Path 57 | total_timesteps 2261.
Path 58 | total_timesteps 2280.
Path 59 | total_timesteps 2331.
Path 60 | total_timesteps 2352.
Path 61 | total_timesteps 2373.
Path 62 | total_timesteps 2401.
Path 63 | total_timesteps 2458.
Path 64 | total_timesteps 2484.
Path 65 | total_timesteps 2526.
Path 66 | total_timesteps 2579.
Path 67 | total_timesteps 2614.
Path 68 | total_timesteps 2680.
Path 69 | total_timesteps 2724.
Path 70 | total_timesteps 2742.
Path 71 | total_timesteps 2781.
Path 72 | total_timesteps 2835.
Path 73 | total_timesteps 2861.
Path 74 | total_timesteps 2915.
Path 75 | total_timesteps 2949.
Path 76 | total_timesteps 2996.
Path 77 | total_timesteps 3030.
Path 78 | total_timesteps 3099.
Path 79 | total_timesteps 3127.
Path 80 | total_timesteps 3211.
Path 81 | total_timesteps 3257.
Path 82 | total_timesteps 3283.
Path 83 | total_timesteps 3313.
Path 84 | total_timesteps 3346.
Path 85 | total_timesteps 3363.
Path 86 | total_timesteps 3388.
Path 87 | total_timesteps 3424.
Path 88 | total_timesteps 3449.
Path 89 | total_timesteps 3524.
Path 90 | total_timesteps 3554.
Path 91 | total_timesteps 3587.
Path 92 | total_timesteps 3620.
Path 93 | total_timesteps 3659.
Path 94 | total_timesteps 3717.
Path 95 | total_timesteps 3749.
Path 96 | total_timesteps 3786.
Path 97 | total_timesteps 3814.
Path 98 | total_timesteps 3842.
Path 99 | total_timesteps 3893.
Path 100 | total_timesteps 3938.
Path 101 | total_timesteps 3971.
Path 102 | total_timesteps 4027.
Path 103 | total_timesteps 4089.
Path 104 | total_timesteps 4122.
Path 105 | total_timesteps 4152.
Path 106 | total_timesteps 4191.
Path 107 | total_timesteps 4217.
Path 108 | total_timesteps 4253.
Path 109 | total_timesteps 4285.
Path 110 | total_timesteps 4331.
Path 111 | total_timesteps 4373.
Path 112 | total_timesteps 4395.
Path 113 | total_timesteps 4431.
Path 114 | total_timesteps 4463.
Path 115 | total_timesteps 4537.
Path 116 | total_timesteps 4601.
Path 117 | total_timesteps 4630.
Path 118 | total_timesteps 4693.
Path 119 | total_timesteps 4730.
Path 120 | total_timesteps 4764.
Path 121 | total_timesteps 4794.
Path 122 | total_timesteps 4815.
Path 123 | total_timesteps 4840.
Path 124 | total_timesteps 4882.
Path 125 | total_timesteps 4924.
Path 126 | total_timesteps 4961.
Path 127 | total_timesteps 4993.
Path 128 | total_timesteps 5018.
Path 129 | total_timesteps 5047.
Path 130 | total_timesteps 5101.
Path 131 | total_timesteps 5231.
Path 132 | total_timesteps 5262.
Path 133 | total_timesteps 5308.
Path 134 | total_timesteps 5365.
Path 135 | total_timesteps 5425.
Path 136 | total_timesteps 5477.
Path 137 | total_timesteps 5518.
Path 138 | total_timesteps 5566.
Path 139 | total_timesteps 5623.
Path 140 | total_timesteps 5658.
Path 141 | total_timesteps 5697.
Path 142 | total_timesteps 5735.
Path 143 | total_timesteps 5760.
Path 144 | total_timesteps 5784.
Path 145 | total_timesteps 5825.
Path 146 | total_timesteps 5872.
Path 147 | total_timesteps 5913.
Path 148 | total_timesteps 5948.
Path 149 | total_timesteps 5981.
Path 150 | total_timesteps 5996.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.66    |
| Iteration     | 4        |
| MaximumReturn | 48.7     |
| MinimumReturn | -33.8    |
| TotalSamples  | 24115    |
----------------------------
itr #5 | 
Fitting dynamics.
Validation loss = 0.3824087083339691
Validation loss = 0.38994789123535156
Validation loss = 0.39259493350982666
Validation loss = 0.3901219069957733
Validation loss = 0.40304335951805115
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 25.
Path 2 | total_timesteps 64.
Path 3 | total_timesteps 89.
Path 4 | total_timesteps 114.
Path 5 | total_timesteps 172.
Path 6 | total_timesteps 205.
Path 7 | total_timesteps 245.
Path 8 | total_timesteps 286.
Path 9 | total_timesteps 355.
Path 10 | total_timesteps 398.
Path 11 | total_timesteps 483.
Path 12 | total_timesteps 522.
Path 13 | total_timesteps 554.
Path 14 | total_timesteps 579.
Path 15 | total_timesteps 601.
Path 16 | total_timesteps 636.
Path 17 | total_timesteps 673.
Path 18 | total_timesteps 706.
Path 19 | total_timesteps 760.
Path 20 | total_timesteps 785.
Path 21 | total_timesteps 900.
Path 22 | total_timesteps 948.
Path 23 | total_timesteps 962.
Path 24 | total_timesteps 986.
Path 25 | total_timesteps 1027.
Path 26 | total_timesteps 1086.
Path 27 | total_timesteps 1138.
Path 28 | total_timesteps 1162.
Path 29 | total_timesteps 1186.
Path 30 | total_timesteps 1214.
Path 31 | total_timesteps 1267.
Path 32 | total_timesteps 1304.
Path 33 | total_timesteps 1353.
Path 34 | total_timesteps 1393.
Path 35 | total_timesteps 1430.
Path 36 | total_timesteps 1481.
Path 37 | total_timesteps 1544.
Path 38 | total_timesteps 1605.
Path 39 | total_timesteps 1650.
Path 40 | total_timesteps 1698.
Path 41 | total_timesteps 1763.
Path 42 | total_timesteps 1814.
Path 43 | total_timesteps 1853.
Path 44 | total_timesteps 1941.
Path 45 | total_timesteps 1973.
Path 46 | total_timesteps 2036.
Path 47 | total_timesteps 2064.
Path 48 | total_timesteps 2129.
Path 49 | total_timesteps 2183.
Path 50 | total_timesteps 2209.
Path 51 | total_timesteps 2236.
Path 52 | total_timesteps 2309.
Path 53 | total_timesteps 2374.
Path 54 | total_timesteps 2437.
Path 55 | total_timesteps 2474.
Path 56 | total_timesteps 2515.
Path 57 | total_timesteps 2548.
Path 58 | total_timesteps 2582.
Path 59 | total_timesteps 2633.
Path 60 | total_timesteps 2672.
Path 61 | total_timesteps 2700.
Path 62 | total_timesteps 2719.
Path 63 | total_timesteps 2769.
Path 64 | total_timesteps 2807.
Path 65 | total_timesteps 2841.
Path 66 | total_timesteps 2861.
Path 67 | total_timesteps 2895.
Path 68 | total_timesteps 2937.
Path 69 | total_timesteps 3006.
Path 70 | total_timesteps 3040.
Path 71 | total_timesteps 3096.
Path 72 | total_timesteps 3119.
Path 73 | total_timesteps 3149.
Path 74 | total_timesteps 3195.
Path 75 | total_timesteps 3233.
Path 76 | total_timesteps 3262.
Path 77 | total_timesteps 3299.
Path 78 | total_timesteps 3329.
Path 79 | total_timesteps 3360.
Path 80 | total_timesteps 3395.
Path 81 | total_timesteps 3436.
Path 82 | total_timesteps 3462.
Path 83 | total_timesteps 3509.
Path 84 | total_timesteps 3528.
Path 85 | total_timesteps 3553.
Path 86 | total_timesteps 3582.
Path 87 | total_timesteps 3612.
Path 88 | total_timesteps 3658.
Path 89 | total_timesteps 3688.
Path 90 | total_timesteps 3745.
Path 91 | total_timesteps 3792.
Path 92 | total_timesteps 3838.
Path 93 | total_timesteps 3883.
Path 94 | total_timesteps 3937.
Path 95 | total_timesteps 3967.
Path 96 | total_timesteps 4002.
Path 97 | total_timesteps 4055.
Path 98 | total_timesteps 4089.
Path 99 | total_timesteps 4157.
Path 100 | total_timesteps 4199.
Path 101 | total_timesteps 4222.
Path 102 | total_timesteps 4261.
Path 103 | total_timesteps 4307.
Path 104 | total_timesteps 4338.
Path 105 | total_timesteps 4378.
Path 106 | total_timesteps 4439.
Path 107 | total_timesteps 4473.
Path 108 | total_timesteps 4496.
Path 109 | total_timesteps 4589.
Path 110 | total_timesteps 4618.
Path 111 | total_timesteps 4637.
Path 112 | total_timesteps 4669.
Path 113 | total_timesteps 4700.
Path 114 | total_timesteps 4769.
Path 115 | total_timesteps 4810.
Path 116 | total_timesteps 4854.
Path 117 | total_timesteps 4903.
Path 118 | total_timesteps 4918.
Path 119 | total_timesteps 4956.
Path 120 | total_timesteps 4992.
Path 121 | total_timesteps 5021.
Path 122 | total_timesteps 5049.
Path 123 | total_timesteps 5073.
Path 124 | total_timesteps 5108.
Path 125 | total_timesteps 5201.
Path 126 | total_timesteps 5280.
Path 127 | total_timesteps 5314.
Path 128 | total_timesteps 5348.
Path 129 | total_timesteps 5374.
Path 130 | total_timesteps 5410.
Path 131 | total_timesteps 5438.
Path 132 | total_timesteps 5479.
Path 133 | total_timesteps 5580.
Path 134 | total_timesteps 5605.
Path 135 | total_timesteps 5653.
Path 136 | total_timesteps 5704.
Path 137 | total_timesteps 5743.
Path 138 | total_timesteps 5773.
Path 139 | total_timesteps 5802.
Path 140 | total_timesteps 5842.
Path 141 | total_timesteps 5894.
Path 142 | total_timesteps 5944.
Path 143 | total_timesteps 5968.
Path 144 | total_timesteps 5996.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.56    |
| Iteration     | 5        |
| MaximumReturn | 26.4     |
| MinimumReturn | -42.3    |
| TotalSamples  | 28127    |
----------------------------
itr #6 | 
Fitting dynamics.
Validation loss = 0.39649248123168945
Validation loss = 0.3975568115711212
Validation loss = 0.4087008833885193
Validation loss = 0.41347286105155945
Validation loss = 0.41938528418540955
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 20.
Path 2 | total_timesteps 41.
Path 3 | total_timesteps 150.
Path 4 | total_timesteps 192.
Path 5 | total_timesteps 232.
Path 6 | total_timesteps 296.
Path 7 | total_timesteps 342.
Path 8 | total_timesteps 401.
Path 9 | total_timesteps 439.
Path 10 | total_timesteps 478.
Path 11 | total_timesteps 509.
Path 12 | total_timesteps 537.
Path 13 | total_timesteps 569.
Path 14 | total_timesteps 607.
Path 15 | total_timesteps 631.
Path 16 | total_timesteps 671.
Path 17 | total_timesteps 691.
Path 18 | total_timesteps 738.
Path 19 | total_timesteps 762.
Path 20 | total_timesteps 801.
Path 21 | total_timesteps 845.
Path 22 | total_timesteps 903.
Path 23 | total_timesteps 919.
Path 24 | total_timesteps 939.
Path 25 | total_timesteps 970.
Path 26 | total_timesteps 1024.
Path 27 | total_timesteps 1068.
Path 28 | total_timesteps 1083.
Path 29 | total_timesteps 1114.
Path 30 | total_timesteps 1140.
Path 31 | total_timesteps 1164.
Path 32 | total_timesteps 1187.
Path 33 | total_timesteps 1213.
Path 34 | total_timesteps 1236.
Path 35 | total_timesteps 1288.
Path 36 | total_timesteps 1303.
Path 37 | total_timesteps 1319.
Path 38 | total_timesteps 1353.
Path 39 | total_timesteps 1382.
Path 40 | total_timesteps 1404.
Path 41 | total_timesteps 1435.
Path 42 | total_timesteps 1516.
Path 43 | total_timesteps 1551.
Path 44 | total_timesteps 1591.
Path 45 | total_timesteps 1621.
Path 46 | total_timesteps 1654.
Path 47 | total_timesteps 1682.
Path 48 | total_timesteps 1695.
Path 49 | total_timesteps 1732.
Path 50 | total_timesteps 1756.
Path 51 | total_timesteps 1805.
Path 52 | total_timesteps 1843.
Path 53 | total_timesteps 1900.
Path 54 | total_timesteps 1910.
Path 55 | total_timesteps 1939.
Path 56 | total_timesteps 1983.
Path 57 | total_timesteps 2010.
Path 58 | total_timesteps 2046.
Path 59 | total_timesteps 2070.
Path 60 | total_timesteps 2094.
Path 61 | total_timesteps 2144.
Path 62 | total_timesteps 2169.
Path 63 | total_timesteps 2226.
Path 64 | total_timesteps 2249.
Path 65 | total_timesteps 2272.
Path 66 | total_timesteps 2306.
Path 67 | total_timesteps 2328.
Path 68 | total_timesteps 2458.
Path 69 | total_timesteps 2476.
Path 70 | total_timesteps 2511.
Path 71 | total_timesteps 2547.
Path 72 | total_timesteps 2578.
Path 73 | total_timesteps 2610.
Path 74 | total_timesteps 2630.
Path 75 | total_timesteps 2649.
Path 76 | total_timesteps 2676.
Path 77 | total_timesteps 2745.
Path 78 | total_timesteps 2776.
Path 79 | total_timesteps 2822.
Path 80 | total_timesteps 2851.
Path 81 | total_timesteps 2892.
Path 82 | total_timesteps 2940.
Path 83 | total_timesteps 2962.
Path 84 | total_timesteps 3000.
Path 85 | total_timesteps 3029.
Path 86 | total_timesteps 3047.
Path 87 | total_timesteps 3082.
Path 88 | total_timesteps 3102.
Path 89 | total_timesteps 3126.
Path 90 | total_timesteps 3154.
Path 91 | total_timesteps 3196.
Path 92 | total_timesteps 3228.
Path 93 | total_timesteps 3255.
Path 94 | total_timesteps 3274.
Path 95 | total_timesteps 3307.
Path 96 | total_timesteps 3360.
Path 97 | total_timesteps 3384.
Path 98 | total_timesteps 3460.
Path 99 | total_timesteps 3520.
Path 100 | total_timesteps 3549.
Path 101 | total_timesteps 3562.
Path 102 | total_timesteps 3574.
Path 103 | total_timesteps 3598.
Path 104 | total_timesteps 3621.
Path 105 | total_timesteps 3697.
Path 106 | total_timesteps 3715.
Path 107 | total_timesteps 3753.
Path 108 | total_timesteps 3811.
Path 109 | total_timesteps 3842.
Path 110 | total_timesteps 3886.
Path 111 | total_timesteps 3923.
Path 112 | total_timesteps 3951.
Path 113 | total_timesteps 3973.
Path 114 | total_timesteps 4016.
Path 115 | total_timesteps 4059.
Path 116 | total_timesteps 4092.
Path 117 | total_timesteps 4129.
Path 118 | total_timesteps 4155.
Path 119 | total_timesteps 4180.
Path 120 | total_timesteps 4263.
Path 121 | total_timesteps 4304.
Path 122 | total_timesteps 4351.
Path 123 | total_timesteps 4393.
Path 124 | total_timesteps 4437.
Path 125 | total_timesteps 4463.
Path 126 | total_timesteps 4507.
Path 127 | total_timesteps 4539.
Path 128 | total_timesteps 4563.
Path 129 | total_timesteps 4600.
Path 130 | total_timesteps 4642.
Path 131 | total_timesteps 4678.
Path 132 | total_timesteps 4690.
Path 133 | total_timesteps 4713.
Path 134 | total_timesteps 4757.
Path 135 | total_timesteps 4815.
Path 136 | total_timesteps 4858.
Path 137 | total_timesteps 4875.
Path 138 | total_timesteps 4891.
Path 139 | total_timesteps 4911.
Path 140 | total_timesteps 4930.
Path 141 | total_timesteps 4978.
Path 142 | total_timesteps 5014.
Path 143 | total_timesteps 5042.
Path 144 | total_timesteps 5091.
Path 145 | total_timesteps 5152.
Path 146 | total_timesteps 5175.
Path 147 | total_timesteps 5205.
Path 148 | total_timesteps 5255.
Path 149 | total_timesteps 5276.
Path 150 | total_timesteps 5302.
Path 151 | total_timesteps 5338.
Path 152 | total_timesteps 5380.
Path 153 | total_timesteps 5465.
Path 154 | total_timesteps 5491.
Path 155 | total_timesteps 5521.
Path 156 | total_timesteps 5562.
Path 157 | total_timesteps 5587.
Path 158 | total_timesteps 5656.
Path 159 | total_timesteps 5707.
Path 160 | total_timesteps 5736.
Path 161 | total_timesteps 5767.
Path 162 | total_timesteps 5810.
Path 163 | total_timesteps 5835.
Path 164 | total_timesteps 5869.
Path 165 | total_timesteps 5889.
Path 166 | total_timesteps 5923.
Path 167 | total_timesteps 5971.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.23    |
| Iteration     | 6        |
| MaximumReturn | 130      |
| MinimumReturn | -36.2    |
| TotalSamples  | 32133    |
----------------------------
itr #7 | 
Fitting dynamics.
Validation loss = 0.40040090680122375
Validation loss = 0.4042031168937683
Validation loss = 0.41111576557159424
Validation loss = 0.4180215299129486
Validation loss = 0.4223911464214325
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 135.
Path 2 | total_timesteps 246.
Path 3 | total_timesteps 280.
Path 4 | total_timesteps 314.
Path 5 | total_timesteps 356.
Path 6 | total_timesteps 371.
Path 7 | total_timesteps 461.
Path 8 | total_timesteps 503.
Path 9 | total_timesteps 531.
Path 10 | total_timesteps 560.
Path 11 | total_timesteps 584.
Path 12 | total_timesteps 615.
Path 13 | total_timesteps 637.
Path 14 | total_timesteps 664.
Path 15 | total_timesteps 681.
Path 16 | total_timesteps 701.
Path 17 | total_timesteps 741.
Path 18 | total_timesteps 765.
Path 19 | total_timesteps 801.
Path 20 | total_timesteps 836.
Path 21 | total_timesteps 879.
Path 22 | total_timesteps 908.
Path 23 | total_timesteps 972.
Path 24 | total_timesteps 1028.
Path 25 | total_timesteps 1066.
Path 26 | total_timesteps 1119.
Path 27 | total_timesteps 1143.
Path 28 | total_timesteps 1163.
Path 29 | total_timesteps 1216.
Path 30 | total_timesteps 1236.
Path 31 | total_timesteps 1264.
Path 32 | total_timesteps 1287.
Path 33 | total_timesteps 1305.
Path 34 | total_timesteps 1328.
Path 35 | total_timesteps 1351.
Path 36 | total_timesteps 1373.
Path 37 | total_timesteps 1460.
Path 38 | total_timesteps 1503.
Path 39 | total_timesteps 1522.
Path 40 | total_timesteps 1556.
Path 41 | total_timesteps 1585.
Path 42 | total_timesteps 1616.
Path 43 | total_timesteps 1640.
Path 44 | total_timesteps 1669.
Path 45 | total_timesteps 1698.
Path 46 | total_timesteps 1757.
Path 47 | total_timesteps 1790.
Path 48 | total_timesteps 1834.
Path 49 | total_timesteps 1858.
Path 50 | total_timesteps 1910.
Path 51 | total_timesteps 1938.
Path 52 | total_timesteps 1977.
Path 53 | total_timesteps 2002.
Path 54 | total_timesteps 2037.
Path 55 | total_timesteps 2055.
Path 56 | total_timesteps 2096.
Path 57 | total_timesteps 2120.
Path 58 | total_timesteps 2160.
Path 59 | total_timesteps 2191.
Path 60 | total_timesteps 2220.
Path 61 | total_timesteps 2270.
Path 62 | total_timesteps 2286.
Path 63 | total_timesteps 2312.
Path 64 | total_timesteps 2344.
Path 65 | total_timesteps 2380.
Path 66 | total_timesteps 2417.
Path 67 | total_timesteps 2453.
Path 68 | total_timesteps 2487.
Path 69 | total_timesteps 2518.
Path 70 | total_timesteps 2568.
Path 71 | total_timesteps 2593.
Path 72 | total_timesteps 2617.
Path 73 | total_timesteps 2649.
Path 74 | total_timesteps 2699.
Path 75 | total_timesteps 2747.
Path 76 | total_timesteps 2818.
Path 77 | total_timesteps 2855.
Path 78 | total_timesteps 2873.
Path 79 | total_timesteps 2925.
Path 80 | total_timesteps 2954.
Path 81 | total_timesteps 2976.
Path 82 | total_timesteps 3055.
Path 83 | total_timesteps 3097.
Path 84 | total_timesteps 3120.
Path 85 | total_timesteps 3209.
Path 86 | total_timesteps 3235.
Path 87 | total_timesteps 3273.
Path 88 | total_timesteps 3358.
Path 89 | total_timesteps 3408.
Path 90 | total_timesteps 3446.
Path 91 | total_timesteps 3501.
Path 92 | total_timesteps 3544.
Path 93 | total_timesteps 3558.
Path 94 | total_timesteps 3609.
Path 95 | total_timesteps 3658.
Path 96 | total_timesteps 3701.
Path 97 | total_timesteps 3735.
Path 98 | total_timesteps 3817.
Path 99 | total_timesteps 3834.
Path 100 | total_timesteps 3868.
Path 101 | total_timesteps 3887.
Path 102 | total_timesteps 3916.
Path 103 | total_timesteps 3929.
Path 104 | total_timesteps 3954.
Path 105 | total_timesteps 3990.
Path 106 | total_timesteps 4052.
Path 107 | total_timesteps 4084.
Path 108 | total_timesteps 4120.
Path 109 | total_timesteps 4142.
Path 110 | total_timesteps 4170.
Path 111 | total_timesteps 4191.
Path 112 | total_timesteps 4224.
Path 113 | total_timesteps 4250.
Path 114 | total_timesteps 4296.
Path 115 | total_timesteps 4323.
Path 116 | total_timesteps 4350.
Path 117 | total_timesteps 4386.
Path 118 | total_timesteps 4411.
Path 119 | total_timesteps 4430.
Path 120 | total_timesteps 4466.
Path 121 | total_timesteps 4500.
Path 122 | total_timesteps 4531.
Path 123 | total_timesteps 4571.
Path 124 | total_timesteps 4624.
Path 125 | total_timesteps 4670.
Path 126 | total_timesteps 4688.
Path 127 | total_timesteps 4739.
Path 128 | total_timesteps 4757.
Path 129 | total_timesteps 4774.
Path 130 | total_timesteps 4804.
Path 131 | total_timesteps 4883.
Path 132 | total_timesteps 4914.
Path 133 | total_timesteps 4945.
Path 134 | total_timesteps 5014.
Path 135 | total_timesteps 5085.
Path 136 | total_timesteps 5104.
Path 137 | total_timesteps 5140.
Path 138 | total_timesteps 5195.
Path 139 | total_timesteps 5242.
Path 140 | total_timesteps 5265.
Path 141 | total_timesteps 5289.
Path 142 | total_timesteps 5315.
Path 143 | total_timesteps 5339.
Path 144 | total_timesteps 5405.
Path 145 | total_timesteps 5447.
Path 146 | total_timesteps 5472.
Path 147 | total_timesteps 5502.
Path 148 | total_timesteps 5534.
Path 149 | total_timesteps 5570.
Path 150 | total_timesteps 5593.
Path 151 | total_timesteps 5645.
Path 152 | total_timesteps 5681.
Path 153 | total_timesteps 5747.
Path 154 | total_timesteps 5801.
Path 155 | total_timesteps 5847.
Path 156 | total_timesteps 5901.
Path 157 | total_timesteps 5933.
Path 158 | total_timesteps 5947.
Path 159 | total_timesteps 5978.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.26    |
| Iteration     | 7        |
| MaximumReturn | 149      |
| MinimumReturn | -23.5    |
| TotalSamples  | 36151    |
----------------------------
itr #8 | 
Fitting dynamics.
Validation loss = 0.4044329524040222
Validation loss = 0.41541919112205505
Validation loss = 0.4187876880168915
Validation loss = 0.4287463426589966
Validation loss = 0.41959404945373535
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 57.
Path 2 | total_timesteps 119.
Path 3 | total_timesteps 141.
Path 4 | total_timesteps 195.
Path 5 | total_timesteps 240.
Path 6 | total_timesteps 363.
Path 7 | total_timesteps 420.
Path 8 | total_timesteps 484.
Path 9 | total_timesteps 559.
Path 10 | total_timesteps 651.
Path 11 | total_timesteps 757.
Path 12 | total_timesteps 821.
Path 13 | total_timesteps 892.
Path 14 | total_timesteps 926.
Path 15 | total_timesteps 974.
Path 16 | total_timesteps 1037.
Path 17 | total_timesteps 1130.
Path 18 | total_timesteps 1163.
Path 19 | total_timesteps 1259.
Path 20 | total_timesteps 1311.
Path 21 | total_timesteps 1410.
Path 22 | total_timesteps 1456.
Path 23 | total_timesteps 1492.
Path 24 | total_timesteps 1509.
Path 25 | total_timesteps 1547.
Path 26 | total_timesteps 1620.
Path 27 | total_timesteps 1640.
Path 28 | total_timesteps 1739.
Path 29 | total_timesteps 1827.
Path 30 | total_timesteps 1865.
Path 31 | total_timesteps 1972.
Path 32 | total_timesteps 2030.
Path 33 | total_timesteps 2117.
Path 34 | total_timesteps 2217.
Path 35 | total_timesteps 2282.
Path 36 | total_timesteps 2357.
Path 37 | total_timesteps 2412.
Path 38 | total_timesteps 2540.
Path 39 | total_timesteps 2571.
Path 40 | total_timesteps 2636.
Path 41 | total_timesteps 2751.
Path 42 | total_timesteps 2795.
Path 43 | total_timesteps 2857.
Path 44 | total_timesteps 2885.
Path 45 | total_timesteps 2922.
Path 46 | total_timesteps 2957.
Path 47 | total_timesteps 2997.
Path 48 | total_timesteps 3104.
Path 49 | total_timesteps 3199.
Path 50 | total_timesteps 3365.
Path 51 | total_timesteps 3485.
Path 52 | total_timesteps 3605.
Path 53 | total_timesteps 3637.
Path 54 | total_timesteps 3674.
Path 55 | total_timesteps 3704.
Path 56 | total_timesteps 3750.
Path 57 | total_timesteps 3830.
Path 58 | total_timesteps 3894.
Path 59 | total_timesteps 3953.
Path 60 | total_timesteps 4078.
Path 61 | total_timesteps 4171.
Path 62 | total_timesteps 4244.
Path 63 | total_timesteps 4340.
Path 64 | total_timesteps 4403.
Path 65 | total_timesteps 4530.
Path 66 | total_timesteps 4619.
Path 67 | total_timesteps 4670.
Path 68 | total_timesteps 4758.
Path 69 | total_timesteps 4814.
Path 70 | total_timesteps 4892.
Path 71 | total_timesteps 4924.
Path 72 | total_timesteps 4992.
Path 73 | total_timesteps 5029.
Path 74 | total_timesteps 5073.
Path 75 | total_timesteps 5114.
Path 76 | total_timesteps 5163.
Path 77 | total_timesteps 5255.
Path 78 | total_timesteps 5327.
Path 79 | total_timesteps 5387.
Path 80 | total_timesteps 5484.
Path 81 | total_timesteps 5563.
Path 82 | total_timesteps 5641.
Path 83 | total_timesteps 5709.
Path 84 | total_timesteps 5752.
Path 85 | total_timesteps 5781.
Path 86 | total_timesteps 5860.
Path 87 | total_timesteps 5906.
Path 88 | total_timesteps 5959.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.443   |
| Iteration     | 8        |
| MaximumReturn | 142      |
| MinimumReturn | -78.8    |
| TotalSamples  | 40184    |
----------------------------
itr #9 | 
Fitting dynamics.
Validation loss = 0.41375741362571716
Validation loss = 0.42814865708351135
Validation loss = 0.4271507263183594
Validation loss = 0.4363647401332855
Validation loss = 0.43241000175476074
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 85.
Path 2 | total_timesteps 165.
Path 3 | total_timesteps 263.
Path 4 | total_timesteps 281.
Path 5 | total_timesteps 369.
Path 6 | total_timesteps 429.
Path 7 | total_timesteps 473.
Path 8 | total_timesteps 519.
Path 9 | total_timesteps 567.
Path 10 | total_timesteps 697.
Path 11 | total_timesteps 738.
Path 12 | total_timesteps 828.
Path 13 | total_timesteps 899.
Path 14 | total_timesteps 958.
Path 15 | total_timesteps 991.
Path 16 | total_timesteps 1073.
Path 17 | total_timesteps 1120.
Path 18 | total_timesteps 1169.
Path 19 | total_timesteps 1246.
Path 20 | total_timesteps 1310.
Path 21 | total_timesteps 1364.
Path 22 | total_timesteps 1486.
Path 23 | total_timesteps 1563.
Path 24 | total_timesteps 1628.
Path 25 | total_timesteps 1691.
Path 26 | total_timesteps 1735.
Path 27 | total_timesteps 1836.
Path 28 | total_timesteps 1869.
Path 29 | total_timesteps 1905.
Path 30 | total_timesteps 2000.
Path 31 | total_timesteps 2082.
Path 32 | total_timesteps 2134.
Path 33 | total_timesteps 2272.
Path 34 | total_timesteps 2373.
Path 35 | total_timesteps 2497.
Path 36 | total_timesteps 2589.
Path 37 | total_timesteps 2630.
Path 38 | total_timesteps 2674.
Path 39 | total_timesteps 2725.
Path 40 | total_timesteps 2827.
Path 41 | total_timesteps 2880.
Path 42 | total_timesteps 2956.
Path 43 | total_timesteps 3038.
Path 44 | total_timesteps 3079.
Path 45 | total_timesteps 3159.
Path 46 | total_timesteps 3190.
Path 47 | total_timesteps 3246.
Path 48 | total_timesteps 3296.
Path 49 | total_timesteps 3391.
Path 50 | total_timesteps 3463.
Path 51 | total_timesteps 3518.
Path 52 | total_timesteps 3551.
Path 53 | total_timesteps 3634.
Path 54 | total_timesteps 3664.
Path 55 | total_timesteps 3731.
Path 56 | total_timesteps 3773.
Path 57 | total_timesteps 3809.
Path 58 | total_timesteps 3884.
Path 59 | total_timesteps 3950.
Path 60 | total_timesteps 4072.
Path 61 | total_timesteps 4145.
Path 62 | total_timesteps 4189.
Path 63 | total_timesteps 4278.
Path 64 | total_timesteps 4374.
Path 65 | total_timesteps 4443.
Path 66 | total_timesteps 4493.
Path 67 | total_timesteps 4551.
Path 68 | total_timesteps 4597.
Path 69 | total_timesteps 4681.
Path 70 | total_timesteps 4732.
Path 71 | total_timesteps 4817.
Path 72 | total_timesteps 4846.
Path 73 | total_timesteps 4904.
Path 74 | total_timesteps 4959.
Path 75 | total_timesteps 4995.
Path 76 | total_timesteps 5038.
Path 77 | total_timesteps 5109.
Path 78 | total_timesteps 5238.
Path 79 | total_timesteps 5338.
Path 80 | total_timesteps 5397.
Path 81 | total_timesteps 5515.
Path 82 | total_timesteps 5627.
Path 83 | total_timesteps 5693.
Path 84 | total_timesteps 5780.
Path 85 | total_timesteps 5808.
Path 86 | total_timesteps 5897.
Path 87 | total_timesteps 5998.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.1    |
| Iteration     | 9        |
| MaximumReturn | 44       |
| MinimumReturn | -67.3    |
| TotalSamples  | 44233    |
----------------------------
itr #10 | 
Fitting dynamics.
Validation loss = 0.4224018156528473
Validation loss = 0.432956337928772
Validation loss = 0.43715640902519226
Validation loss = 0.4391952455043793
Validation loss = 0.4545269310474396
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 82.
Path 2 | total_timesteps 219.
Path 3 | total_timesteps 260.
Path 4 | total_timesteps 363.
Path 5 | total_timesteps 499.
Path 6 | total_timesteps 601.
Path 7 | total_timesteps 649.
Path 8 | total_timesteps 721.
Path 9 | total_timesteps 752.
Path 10 | total_timesteps 792.
Path 11 | total_timesteps 874.
Path 12 | total_timesteps 948.
Path 13 | total_timesteps 1003.
Path 14 | total_timesteps 1087.
Path 15 | total_timesteps 1172.
Path 16 | total_timesteps 1282.
Path 17 | total_timesteps 1370.
Path 18 | total_timesteps 1431.
Path 19 | total_timesteps 1477.
Path 20 | total_timesteps 1520.
Path 21 | total_timesteps 1657.
Path 22 | total_timesteps 1759.
Path 23 | total_timesteps 1785.
Path 24 | total_timesteps 1876.
Path 25 | total_timesteps 2011.
Path 26 | total_timesteps 2213.
Path 27 | total_timesteps 2274.
Path 28 | total_timesteps 2358.
Path 29 | total_timesteps 2431.
Path 30 | total_timesteps 2477.
Path 31 | total_timesteps 2548.
Path 32 | total_timesteps 2612.
Path 33 | total_timesteps 2712.
Path 34 | total_timesteps 2774.
Path 35 | total_timesteps 2823.
Path 36 | total_timesteps 2933.
Path 37 | total_timesteps 2962.
Path 38 | total_timesteps 3073.
Path 39 | total_timesteps 3129.
Path 40 | total_timesteps 3164.
Path 41 | total_timesteps 3256.
Path 42 | total_timesteps 3350.
Path 43 | total_timesteps 3382.
Path 44 | total_timesteps 3415.
Path 45 | total_timesteps 3524.
Path 46 | total_timesteps 3572.
Path 47 | total_timesteps 3624.
Path 48 | total_timesteps 3735.
Path 49 | total_timesteps 3813.
Path 50 | total_timesteps 3897.
Path 51 | total_timesteps 3955.
Path 52 | total_timesteps 4038.
Path 53 | total_timesteps 4158.
Path 54 | total_timesteps 4212.
Path 55 | total_timesteps 4258.
Path 56 | total_timesteps 4355.
Path 57 | total_timesteps 4412.
Path 58 | total_timesteps 4566.
Path 59 | total_timesteps 4612.
Path 60 | total_timesteps 4696.
Path 61 | total_timesteps 4851.
Path 62 | total_timesteps 4901.
Path 63 | total_timesteps 5013.
Path 64 | total_timesteps 5122.
Path 65 | total_timesteps 5164.
Path 66 | total_timesteps 5204.
Path 67 | total_timesteps 5318.
Path 68 | total_timesteps 5400.
Path 69 | total_timesteps 5477.
Path 70 | total_timesteps 5523.
Path 71 | total_timesteps 5620.
Path 72 | total_timesteps 5697.
Path 73 | total_timesteps 5759.
Path 74 | total_timesteps 5806.
Path 75 | total_timesteps 5855.
Path 76 | total_timesteps 5934.
Path 77 | total_timesteps 5976.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.86    |
| Iteration     | 10       |
| MaximumReturn | 215      |
| MinimumReturn | -65      |
| TotalSamples  | 48261    |
----------------------------
itr #11 | 
Fitting dynamics.
Validation loss = 0.4306145906448364
Validation loss = 0.44138309359550476
Validation loss = 0.44121798872947693
Validation loss = 0.4457026422023773
Validation loss = 0.4476642906665802
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 67.
Path 2 | total_timesteps 127.
Path 3 | total_timesteps 277.
Path 4 | total_timesteps 323.
Path 5 | total_timesteps 402.
Path 6 | total_timesteps 456.
Path 7 | total_timesteps 519.
Path 8 | total_timesteps 677.
Path 9 | total_timesteps 787.
Path 10 | total_timesteps 879.
Path 11 | total_timesteps 935.
Path 12 | total_timesteps 1020.
Path 13 | total_timesteps 1086.
Path 14 | total_timesteps 1131.
Path 15 | total_timesteps 1214.
Path 16 | total_timesteps 1293.
Path 17 | total_timesteps 1382.
Path 18 | total_timesteps 1469.
Path 19 | total_timesteps 1547.
Path 20 | total_timesteps 1627.
Path 21 | total_timesteps 1649.
Path 22 | total_timesteps 1735.
Path 23 | total_timesteps 1844.
Path 24 | total_timesteps 1938.
Path 25 | total_timesteps 2018.
Path 26 | total_timesteps 2112.
Path 27 | total_timesteps 2185.
Path 28 | total_timesteps 2209.
Path 29 | total_timesteps 2245.
Path 30 | total_timesteps 2349.
Path 31 | total_timesteps 2397.
Path 32 | total_timesteps 2497.
Path 33 | total_timesteps 2558.
Path 34 | total_timesteps 2602.
Path 35 | total_timesteps 2647.
Path 36 | total_timesteps 2733.
Path 37 | total_timesteps 2764.
Path 38 | total_timesteps 2832.
Path 39 | total_timesteps 2926.
Path 40 | total_timesteps 2972.
Path 41 | total_timesteps 3001.
Path 42 | total_timesteps 3084.
Path 43 | total_timesteps 3185.
Path 44 | total_timesteps 3223.
Path 45 | total_timesteps 3314.
Path 46 | total_timesteps 3372.
Path 47 | total_timesteps 3423.
Path 48 | total_timesteps 3539.
Path 49 | total_timesteps 3581.
Path 50 | total_timesteps 3628.
Path 51 | total_timesteps 3659.
Path 52 | total_timesteps 3734.
Path 53 | total_timesteps 3812.
Path 54 | total_timesteps 3843.
Path 55 | total_timesteps 3945.
Path 56 | total_timesteps 4071.
Path 57 | total_timesteps 4141.
Path 58 | total_timesteps 4238.
Path 59 | total_timesteps 4367.
Path 60 | total_timesteps 4450.
Path 61 | total_timesteps 4478.
Path 62 | total_timesteps 4589.
Path 63 | total_timesteps 4681.
Path 64 | total_timesteps 4770.
Path 65 | total_timesteps 4922.
Path 66 | total_timesteps 4969.
Path 67 | total_timesteps 5064.
Path 68 | total_timesteps 5122.
Path 69 | total_timesteps 5213.
Path 70 | total_timesteps 5303.
Path 71 | total_timesteps 5360.
Path 72 | total_timesteps 5440.
Path 73 | total_timesteps 5521.
Path 74 | total_timesteps 5602.
Path 75 | total_timesteps 5658.
Path 76 | total_timesteps 5800.
Path 77 | total_timesteps 5850.
Path 78 | total_timesteps 5901.
Path 79 | total_timesteps 5954.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.4    |
| Iteration     | 11       |
| MaximumReturn | 119      |
| MinimumReturn | -77.1    |
| TotalSamples  | 52274    |
----------------------------
itr #12 | 
Fitting dynamics.
Validation loss = 0.43287229537963867
Validation loss = 0.44899988174438477
Validation loss = 0.45034152269363403
Validation loss = 0.4546954929828644
Validation loss = 0.45607060194015503
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 162.
Path 2 | total_timesteps 223.
Path 3 | total_timesteps 293.
Path 4 | total_timesteps 333.
Path 5 | total_timesteps 414.
Path 6 | total_timesteps 512.
Path 7 | total_timesteps 597.
Path 8 | total_timesteps 682.
Path 9 | total_timesteps 731.
Path 10 | total_timesteps 815.
Path 11 | total_timesteps 968.
Path 12 | total_timesteps 1072.
Path 13 | total_timesteps 1122.
Path 14 | total_timesteps 1245.
Path 15 | total_timesteps 1291.
Path 16 | total_timesteps 1359.
Path 17 | total_timesteps 1406.
Path 18 | total_timesteps 1456.
Path 19 | total_timesteps 1498.
Path 20 | total_timesteps 1552.
Path 21 | total_timesteps 1583.
Path 22 | total_timesteps 1676.
Path 23 | total_timesteps 1730.
Path 24 | total_timesteps 1788.
Path 25 | total_timesteps 1897.
Path 26 | total_timesteps 1992.
Path 27 | total_timesteps 2061.
Path 28 | total_timesteps 2136.
Path 29 | total_timesteps 2189.
Path 30 | total_timesteps 2274.
Path 31 | total_timesteps 2315.
Path 32 | total_timesteps 2410.
Path 33 | total_timesteps 2469.
Path 34 | total_timesteps 2529.
Path 35 | total_timesteps 2587.
Path 36 | total_timesteps 2642.
Path 37 | total_timesteps 2702.
Path 38 | total_timesteps 2769.
Path 39 | total_timesteps 2841.
Path 40 | total_timesteps 2881.
Path 41 | total_timesteps 2997.
Path 42 | total_timesteps 3070.
Path 43 | total_timesteps 3184.
Path 44 | total_timesteps 3250.
Path 45 | total_timesteps 3338.
Path 46 | total_timesteps 3395.
Path 47 | total_timesteps 3480.
Path 48 | total_timesteps 3514.
Path 49 | total_timesteps 3633.
Path 50 | total_timesteps 3681.
Path 51 | total_timesteps 3731.
Path 52 | total_timesteps 3773.
Path 53 | total_timesteps 3834.
Path 54 | total_timesteps 3924.
Path 55 | total_timesteps 3972.
Path 56 | total_timesteps 4046.
Path 57 | total_timesteps 4131.
Path 58 | total_timesteps 4146.
Path 59 | total_timesteps 4177.
Path 60 | total_timesteps 4270.
Path 61 | total_timesteps 4333.
Path 62 | total_timesteps 4420.
Path 63 | total_timesteps 4476.
Path 64 | total_timesteps 4518.
Path 65 | total_timesteps 4605.
Path 66 | total_timesteps 4651.
Path 67 | total_timesteps 4742.
Path 68 | total_timesteps 4787.
Path 69 | total_timesteps 4868.
Path 70 | total_timesteps 4913.
Path 71 | total_timesteps 5132.
Path 72 | total_timesteps 5262.
Path 73 | total_timesteps 5320.
Path 74 | total_timesteps 5377.
Path 75 | total_timesteps 5409.
Path 76 | total_timesteps 5445.
Path 77 | total_timesteps 5524.
Path 78 | total_timesteps 5614.
Path 79 | total_timesteps 5678.
Path 80 | total_timesteps 5736.
Path 81 | total_timesteps 5776.
Path 82 | total_timesteps 5794.
Path 83 | total_timesteps 5819.
Path 84 | total_timesteps 5864.
Path 85 | total_timesteps 5932.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.6    |
| Iteration     | 12       |
| MaximumReturn | 73.2     |
| MinimumReturn | -79.9    |
| TotalSamples  | 56295    |
----------------------------
itr #13 | 
Fitting dynamics.
Validation loss = 0.4341445565223694
Validation loss = 0.4464735686779022
Validation loss = 0.45838111639022827
Validation loss = 0.45164936780929565
Validation loss = 0.45766305923461914
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 53.
Path 2 | total_timesteps 130.
Path 3 | total_timesteps 189.
Path 4 | total_timesteps 238.
Path 5 | total_timesteps 340.
Path 6 | total_timesteps 394.
Path 7 | total_timesteps 477.
Path 8 | total_timesteps 531.
Path 9 | total_timesteps 560.
Path 10 | total_timesteps 661.
Path 11 | total_timesteps 769.
Path 12 | total_timesteps 823.
Path 13 | total_timesteps 876.
Path 14 | total_timesteps 930.
Path 15 | total_timesteps 1021.
Path 16 | total_timesteps 1133.
Path 17 | total_timesteps 1197.
Path 18 | total_timesteps 1278.
Path 19 | total_timesteps 1371.
Path 20 | total_timesteps 1398.
Path 21 | total_timesteps 1438.
Path 22 | total_timesteps 1575.
Path 23 | total_timesteps 1656.
Path 24 | total_timesteps 1716.
Path 25 | total_timesteps 1765.
Path 26 | total_timesteps 1823.
Path 27 | total_timesteps 1884.
Path 28 | total_timesteps 1987.
Path 29 | total_timesteps 2078.
Path 30 | total_timesteps 2170.
Path 31 | total_timesteps 2243.
Path 32 | total_timesteps 2276.
Path 33 | total_timesteps 2365.
Path 34 | total_timesteps 2413.
Path 35 | total_timesteps 2450.
Path 36 | total_timesteps 2495.
Path 37 | total_timesteps 2540.
Path 38 | total_timesteps 2625.
Path 39 | total_timesteps 2737.
Path 40 | total_timesteps 2805.
Path 41 | total_timesteps 2870.
Path 42 | total_timesteps 2935.
Path 43 | total_timesteps 2979.
Path 44 | total_timesteps 3052.
Path 45 | total_timesteps 3111.
Path 46 | total_timesteps 3182.
Path 47 | total_timesteps 3200.
Path 48 | total_timesteps 3243.
Path 49 | total_timesteps 3320.
Path 50 | total_timesteps 3493.
Path 51 | total_timesteps 3566.
Path 52 | total_timesteps 3591.
Path 53 | total_timesteps 3683.
Path 54 | total_timesteps 3769.
Path 55 | total_timesteps 3805.
Path 56 | total_timesteps 3857.
Path 57 | total_timesteps 3923.
Path 58 | total_timesteps 3999.
Path 59 | total_timesteps 4082.
Path 60 | total_timesteps 4131.
Path 61 | total_timesteps 4163.
Path 62 | total_timesteps 4213.
Path 63 | total_timesteps 4311.
Path 64 | total_timesteps 4346.
Path 65 | total_timesteps 4423.
Path 66 | total_timesteps 4468.
Path 67 | total_timesteps 4542.
Path 68 | total_timesteps 4624.
Path 69 | total_timesteps 4670.
Path 70 | total_timesteps 4755.
Path 71 | total_timesteps 4872.
Path 72 | total_timesteps 4900.
Path 73 | total_timesteps 4978.
Path 74 | total_timesteps 5064.
Path 75 | total_timesteps 5114.
Path 76 | total_timesteps 5195.
Path 77 | total_timesteps 5265.
Path 78 | total_timesteps 5320.
Path 79 | total_timesteps 5433.
Path 80 | total_timesteps 5512.
Path 81 | total_timesteps 5609.
Path 82 | total_timesteps 5624.
Path 83 | total_timesteps 5686.
Path 84 | total_timesteps 5779.
Path 85 | total_timesteps 5800.
Path 86 | total_timesteps 5892.
Path 87 | total_timesteps 5957.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -13.2    |
| Iteration     | 13       |
| MaximumReturn | 199      |
| MinimumReturn | -86.9    |
| TotalSamples  | 60304    |
----------------------------
itr #14 | 
Fitting dynamics.
Validation loss = 0.44348904490470886
Validation loss = 0.45209211111068726
Validation loss = 0.45839372277259827
Validation loss = 0.4602191150188446
Validation loss = 0.46154558658599854
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 97.
Path 2 | total_timesteps 163.
Path 3 | total_timesteps 221.
Path 4 | total_timesteps 245.
Path 5 | total_timesteps 317.
Path 6 | total_timesteps 417.
Path 7 | total_timesteps 495.
Path 8 | total_timesteps 581.
Path 9 | total_timesteps 646.
Path 10 | total_timesteps 758.
Path 11 | total_timesteps 835.
Path 12 | total_timesteps 881.
Path 13 | total_timesteps 944.
Path 14 | total_timesteps 980.
Path 15 | total_timesteps 1023.
Path 16 | total_timesteps 1068.
Path 17 | total_timesteps 1147.
Path 18 | total_timesteps 1232.
Path 19 | total_timesteps 1312.
Path 20 | total_timesteps 1352.
Path 21 | total_timesteps 1415.
Path 22 | total_timesteps 1463.
Path 23 | total_timesteps 1542.
Path 24 | total_timesteps 1615.
Path 25 | total_timesteps 1700.
Path 26 | total_timesteps 1844.
Path 27 | total_timesteps 1891.
Path 28 | total_timesteps 1953.
Path 29 | total_timesteps 2001.
Path 30 | total_timesteps 2063.
Path 31 | total_timesteps 2133.
Path 32 | total_timesteps 2173.
Path 33 | total_timesteps 2219.
Path 34 | total_timesteps 2363.
Path 35 | total_timesteps 2457.
Path 36 | total_timesteps 2529.
Path 37 | total_timesteps 2595.
Path 38 | total_timesteps 2661.
Path 39 | total_timesteps 2806.
Path 40 | total_timesteps 2841.
Path 41 | total_timesteps 2986.
Path 42 | total_timesteps 3074.
Path 43 | total_timesteps 3138.
Path 44 | total_timesteps 3205.
Path 45 | total_timesteps 3254.
Path 46 | total_timesteps 3290.
Path 47 | total_timesteps 3366.
Path 48 | total_timesteps 3452.
Path 49 | total_timesteps 3500.
Path 50 | total_timesteps 3589.
Path 51 | total_timesteps 3676.
Path 52 | total_timesteps 3760.
Path 53 | total_timesteps 3837.
Path 54 | total_timesteps 3951.
Path 55 | total_timesteps 4037.
Path 56 | total_timesteps 4084.
Path 57 | total_timesteps 4211.
Path 58 | total_timesteps 4254.
Path 59 | total_timesteps 4284.
Path 60 | total_timesteps 4360.
Path 61 | total_timesteps 4447.
Path 62 | total_timesteps 4480.
Path 63 | total_timesteps 4554.
Path 64 | total_timesteps 4629.
Path 65 | total_timesteps 4696.
Path 66 | total_timesteps 4764.
Path 67 | total_timesteps 4810.
Path 68 | total_timesteps 4862.
Path 69 | total_timesteps 4909.
Path 70 | total_timesteps 4993.
Path 71 | total_timesteps 5064.
Path 72 | total_timesteps 5147.
Path 73 | total_timesteps 5172.
Path 74 | total_timesteps 5253.
Path 75 | total_timesteps 5319.
Path 76 | total_timesteps 5370.
Path 77 | total_timesteps 5445.
Path 78 | total_timesteps 5499.
Path 79 | total_timesteps 5557.
Path 80 | total_timesteps 5591.
Path 81 | total_timesteps 5641.
Path 82 | total_timesteps 5702.
Path 83 | total_timesteps 5780.
Path 84 | total_timesteps 5867.
Path 85 | total_timesteps 5922.
Path 86 | total_timesteps 5973.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.03    |
| Iteration     | 14       |
| MaximumReturn | 113      |
| MinimumReturn | -79.8    |
| TotalSamples  | 64336    |
----------------------------
itr #15 | 
Fitting dynamics.
Validation loss = 0.4483596980571747
Validation loss = 0.45200470089912415
Validation loss = 0.4556860327720642
Validation loss = 0.4595853388309479
Validation loss = 0.45975273847579956
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 67.
Path 2 | total_timesteps 109.
Path 3 | total_timesteps 174.
Path 4 | total_timesteps 240.
Path 5 | total_timesteps 263.
Path 6 | total_timesteps 341.
Path 7 | total_timesteps 372.
Path 8 | total_timesteps 396.
Path 9 | total_timesteps 419.
Path 10 | total_timesteps 483.
Path 11 | total_timesteps 522.
Path 12 | total_timesteps 575.
Path 13 | total_timesteps 591.
Path 14 | total_timesteps 638.
Path 15 | total_timesteps 733.
Path 16 | total_timesteps 792.
Path 17 | total_timesteps 827.
Path 18 | total_timesteps 919.
Path 19 | total_timesteps 1002.
Path 20 | total_timesteps 1055.
Path 21 | total_timesteps 1090.
Path 22 | total_timesteps 1159.
Path 23 | total_timesteps 1233.
Path 24 | total_timesteps 1293.
Path 25 | total_timesteps 1525.
Path 26 | total_timesteps 1614.
Path 27 | total_timesteps 1648.
Path 28 | total_timesteps 1742.
Path 29 | total_timesteps 1824.
Path 30 | total_timesteps 1850.
Path 31 | total_timesteps 1898.
Path 32 | total_timesteps 1959.
Path 33 | total_timesteps 2028.
Path 34 | total_timesteps 2091.
Path 35 | total_timesteps 2152.
Path 36 | total_timesteps 2233.
Path 37 | total_timesteps 2252.
Path 38 | total_timesteps 2316.
Path 39 | total_timesteps 2369.
Path 40 | total_timesteps 2433.
Path 41 | total_timesteps 2481.
Path 42 | total_timesteps 2547.
Path 43 | total_timesteps 2624.
Path 44 | total_timesteps 2703.
Path 45 | total_timesteps 2769.
Path 46 | total_timesteps 2861.
Path 47 | total_timesteps 2904.
Path 48 | total_timesteps 2988.
Path 49 | total_timesteps 3059.
Path 50 | total_timesteps 3101.
Path 51 | total_timesteps 3212.
Path 52 | total_timesteps 3289.
Path 53 | total_timesteps 3316.
Path 54 | total_timesteps 3416.
Path 55 | total_timesteps 3451.
Path 56 | total_timesteps 3517.
Path 57 | total_timesteps 3551.
Path 58 | total_timesteps 3595.
Path 59 | total_timesteps 3634.
Path 60 | total_timesteps 3669.
Path 61 | total_timesteps 3774.
Path 62 | total_timesteps 3891.
Path 63 | total_timesteps 3931.
Path 64 | total_timesteps 4016.
Path 65 | total_timesteps 4058.
Path 66 | total_timesteps 4119.
Path 67 | total_timesteps 4223.
Path 68 | total_timesteps 4273.
Path 69 | total_timesteps 4339.
Path 70 | total_timesteps 4390.
Path 71 | total_timesteps 4499.
Path 72 | total_timesteps 4563.
Path 73 | total_timesteps 4634.
Path 74 | total_timesteps 4693.
Path 75 | total_timesteps 4756.
Path 76 | total_timesteps 4844.
Path 77 | total_timesteps 4884.
Path 78 | total_timesteps 4963.
Path 79 | total_timesteps 5058.
Path 80 | total_timesteps 5129.
Path 81 | total_timesteps 5196.
Path 82 | total_timesteps 5261.
Path 83 | total_timesteps 5286.
Path 84 | total_timesteps 5357.
Path 85 | total_timesteps 5419.
Path 86 | total_timesteps 5501.
Path 87 | total_timesteps 5561.
Path 88 | total_timesteps 5680.
Path 89 | total_timesteps 5743.
Path 90 | total_timesteps 5866.
Path 91 | total_timesteps 5895.
Path 92 | total_timesteps 5996.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.94    |
| Iteration     | 15       |
| MaximumReturn | 113      |
| MinimumReturn | -84.4    |
| TotalSamples  | 68368    |
----------------------------
itr #16 | 
Fitting dynamics.
Validation loss = 0.44938576221466064
Validation loss = 0.4546453058719635
Validation loss = 0.4589715003967285
Validation loss = 0.4647689461708069
Validation loss = 0.46073827147483826
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 76.
Path 2 | total_timesteps 140.
Path 3 | total_timesteps 171.
Path 4 | total_timesteps 241.
Path 5 | total_timesteps 283.
Path 6 | total_timesteps 405.
Path 7 | total_timesteps 466.
Path 8 | total_timesteps 500.
Path 9 | total_timesteps 585.
Path 10 | total_timesteps 740.
Path 11 | total_timesteps 815.
Path 12 | total_timesteps 875.
Path 13 | total_timesteps 913.
Path 14 | total_timesteps 973.
Path 15 | total_timesteps 1039.
Path 16 | total_timesteps 1075.
Path 17 | total_timesteps 1214.
Path 18 | total_timesteps 1289.
Path 19 | total_timesteps 1380.
Path 20 | total_timesteps 1521.
Path 21 | total_timesteps 1589.
Path 22 | total_timesteps 1679.
Path 23 | total_timesteps 1734.
Path 24 | total_timesteps 1837.
Path 25 | total_timesteps 1919.
Path 26 | total_timesteps 1968.
Path 27 | total_timesteps 2018.
Path 28 | total_timesteps 2053.
Path 29 | total_timesteps 2129.
Path 30 | total_timesteps 2204.
Path 31 | total_timesteps 2255.
Path 32 | total_timesteps 2298.
Path 33 | total_timesteps 2379.
Path 34 | total_timesteps 2470.
Path 35 | total_timesteps 2533.
Path 36 | total_timesteps 2577.
Path 37 | total_timesteps 2621.
Path 38 | total_timesteps 2730.
Path 39 | total_timesteps 2776.
Path 40 | total_timesteps 2804.
Path 41 | total_timesteps 2843.
Path 42 | total_timesteps 2895.
Path 43 | total_timesteps 2949.
Path 44 | total_timesteps 3030.
Path 45 | total_timesteps 3070.
Path 46 | total_timesteps 3096.
Path 47 | total_timesteps 3205.
Path 48 | total_timesteps 3239.
Path 49 | total_timesteps 3315.
Path 50 | total_timesteps 3354.
Path 51 | total_timesteps 3398.
Path 52 | total_timesteps 3435.
Path 53 | total_timesteps 3478.
Path 54 | total_timesteps 3558.
Path 55 | total_timesteps 3615.
Path 56 | total_timesteps 3668.
Path 57 | total_timesteps 3724.
Path 58 | total_timesteps 3749.
Path 59 | total_timesteps 3795.
Path 60 | total_timesteps 3854.
Path 61 | total_timesteps 3899.
Path 62 | total_timesteps 3919.
Path 63 | total_timesteps 3975.
Path 64 | total_timesteps 4025.
Path 65 | total_timesteps 4053.
Path 66 | total_timesteps 4154.
Path 67 | total_timesteps 4304.
Path 68 | total_timesteps 4327.
Path 69 | total_timesteps 4395.
Path 70 | total_timesteps 4450.
Path 71 | total_timesteps 4561.
Path 72 | total_timesteps 4676.
Path 73 | total_timesteps 4723.
Path 74 | total_timesteps 4767.
Path 75 | total_timesteps 4905.
Path 76 | total_timesteps 4992.
Path 77 | total_timesteps 5054.
Path 78 | total_timesteps 5113.
Path 79 | total_timesteps 5129.
Path 80 | total_timesteps 5168.
Path 81 | total_timesteps 5199.
Path 82 | total_timesteps 5255.
Path 83 | total_timesteps 5292.
Path 84 | total_timesteps 5372.
Path 85 | total_timesteps 5409.
Path 86 | total_timesteps 5450.
Path 87 | total_timesteps 5527.
Path 88 | total_timesteps 5555.
Path 89 | total_timesteps 5621.
Path 90 | total_timesteps 5697.
Path 91 | total_timesteps 5748.
Path 92 | total_timesteps 5805.
Path 93 | total_timesteps 5861.
Path 94 | total_timesteps 5915.
Path 95 | total_timesteps 5972.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.71    |
| Iteration     | 16       |
| MaximumReturn | 184      |
| MinimumReturn | -48.3    |
| TotalSamples  | 72389    |
----------------------------
itr #17 | 
Fitting dynamics.
Validation loss = 0.4484149217605591
Validation loss = 0.44928115606307983
Validation loss = 0.45783963799476624
Validation loss = 0.4618544578552246
Validation loss = 0.45881903171539307
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 51.
Path 2 | total_timesteps 81.
Path 3 | total_timesteps 147.
Path 4 | total_timesteps 184.
Path 5 | total_timesteps 226.
Path 6 | total_timesteps 261.
Path 7 | total_timesteps 326.
Path 8 | total_timesteps 400.
Path 9 | total_timesteps 571.
Path 10 | total_timesteps 675.
Path 11 | total_timesteps 711.
Path 12 | total_timesteps 729.
Path 13 | total_timesteps 806.
Path 14 | total_timesteps 907.
Path 15 | total_timesteps 971.
Path 16 | total_timesteps 1080.
Path 17 | total_timesteps 1180.
Path 18 | total_timesteps 1234.
Path 19 | total_timesteps 1279.
Path 20 | total_timesteps 1326.
Path 21 | total_timesteps 1375.
Path 22 | total_timesteps 1427.
Path 23 | total_timesteps 1545.
Path 24 | total_timesteps 1623.
Path 25 | total_timesteps 1714.
Path 26 | total_timesteps 1770.
Path 27 | total_timesteps 1838.
Path 28 | total_timesteps 1941.
Path 29 | total_timesteps 1994.
Path 30 | total_timesteps 2032.
Path 31 | total_timesteps 2108.
Path 32 | total_timesteps 2167.
Path 33 | total_timesteps 2246.
Path 34 | total_timesteps 2302.
Path 35 | total_timesteps 2353.
Path 36 | total_timesteps 2430.
Path 37 | total_timesteps 2484.
Path 38 | total_timesteps 2519.
Path 39 | total_timesteps 2579.
Path 40 | total_timesteps 2649.
Path 41 | total_timesteps 2679.
Path 42 | total_timesteps 2734.
Path 43 | total_timesteps 2784.
Path 44 | total_timesteps 2850.
Path 45 | total_timesteps 2905.
Path 46 | total_timesteps 2945.
Path 47 | total_timesteps 3003.
Path 48 | total_timesteps 3071.
Path 49 | total_timesteps 3110.
Path 50 | total_timesteps 3179.
Path 51 | total_timesteps 3268.
Path 52 | total_timesteps 3301.
Path 53 | total_timesteps 3430.
Path 54 | total_timesteps 3469.
Path 55 | total_timesteps 3531.
Path 56 | total_timesteps 3602.
Path 57 | total_timesteps 3644.
Path 58 | total_timesteps 3696.
Path 59 | total_timesteps 3830.
Path 60 | total_timesteps 3880.
Path 61 | total_timesteps 3899.
Path 62 | total_timesteps 3924.
Path 63 | total_timesteps 4087.
Path 64 | total_timesteps 4118.
Path 65 | total_timesteps 4156.
Path 66 | total_timesteps 4237.
Path 67 | total_timesteps 4306.
Path 68 | total_timesteps 4348.
Path 69 | total_timesteps 4418.
Path 70 | total_timesteps 4466.
Path 71 | total_timesteps 4564.
Path 72 | total_timesteps 4611.
Path 73 | total_timesteps 4711.
Path 74 | total_timesteps 4765.
Path 75 | total_timesteps 4889.
Path 76 | total_timesteps 4997.
Path 77 | total_timesteps 5033.
Path 78 | total_timesteps 5062.
Path 79 | total_timesteps 5108.
Path 80 | total_timesteps 5176.
Path 81 | total_timesteps 5267.
Path 82 | total_timesteps 5397.
Path 83 | total_timesteps 5451.
Path 84 | total_timesteps 5492.
Path 85 | total_timesteps 5529.
Path 86 | total_timesteps 5596.
Path 87 | total_timesteps 5659.
Path 88 | total_timesteps 5752.
Path 89 | total_timesteps 5812.
Path 90 | total_timesteps 5901.
Path 91 | total_timesteps 5976.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -13.6    |
| Iteration     | 17       |
| MaximumReturn | 64.5     |
| MinimumReturn | -101     |
| TotalSamples  | 76409    |
----------------------------
itr #18 | 
Fitting dynamics.
Validation loss = 0.4419844150543213
Validation loss = 0.4569840133190155
Validation loss = 0.459388792514801
Validation loss = 0.45905327796936035
Validation loss = 0.4577217996120453
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 22.
Path 2 | total_timesteps 84.
Path 3 | total_timesteps 162.
Path 4 | total_timesteps 280.
Path 5 | total_timesteps 321.
Path 6 | total_timesteps 361.
Path 7 | total_timesteps 444.
Path 8 | total_timesteps 493.
Path 9 | total_timesteps 552.
Path 10 | total_timesteps 635.
Path 11 | total_timesteps 725.
Path 12 | total_timesteps 819.
Path 13 | total_timesteps 953.
Path 14 | total_timesteps 1019.
Path 15 | total_timesteps 1134.
Path 16 | total_timesteps 1166.
Path 17 | total_timesteps 1194.
Path 18 | total_timesteps 1283.
Path 19 | total_timesteps 1388.
Path 20 | total_timesteps 1482.
Path 21 | total_timesteps 1535.
Path 22 | total_timesteps 1625.
Path 23 | total_timesteps 1699.
Path 24 | total_timesteps 1746.
Path 25 | total_timesteps 1777.
Path 26 | total_timesteps 1879.
Path 27 | total_timesteps 1945.
Path 28 | total_timesteps 1994.
Path 29 | total_timesteps 2061.
Path 30 | total_timesteps 2156.
Path 31 | total_timesteps 2203.
Path 32 | total_timesteps 2250.
Path 33 | total_timesteps 2277.
Path 34 | total_timesteps 2303.
Path 35 | total_timesteps 2386.
Path 36 | total_timesteps 2468.
Path 37 | total_timesteps 2542.
Path 38 | total_timesteps 2623.
Path 39 | total_timesteps 2667.
Path 40 | total_timesteps 2734.
Path 41 | total_timesteps 2796.
Path 42 | total_timesteps 2855.
Path 43 | total_timesteps 2920.
Path 44 | total_timesteps 2957.
Path 45 | total_timesteps 3073.
Path 46 | total_timesteps 3089.
Path 47 | total_timesteps 3147.
Path 48 | total_timesteps 3182.
Path 49 | total_timesteps 3239.
Path 50 | total_timesteps 3254.
Path 51 | total_timesteps 3309.
Path 52 | total_timesteps 3351.
Path 53 | total_timesteps 3378.
Path 54 | total_timesteps 3431.
Path 55 | total_timesteps 3478.
Path 56 | total_timesteps 3639.
Path 57 | total_timesteps 3690.
Path 58 | total_timesteps 3719.
Path 59 | total_timesteps 3804.
Path 60 | total_timesteps 3892.
Path 61 | total_timesteps 3969.
Path 62 | total_timesteps 4087.
Path 63 | total_timesteps 4128.
Path 64 | total_timesteps 4188.
Path 65 | total_timesteps 4334.
Path 66 | total_timesteps 4388.
Path 67 | total_timesteps 4435.
Path 68 | total_timesteps 4500.
Path 69 | total_timesteps 4654.
Path 70 | total_timesteps 4702.
Path 71 | total_timesteps 4723.
Path 72 | total_timesteps 4767.
Path 73 | total_timesteps 4806.
Path 74 | total_timesteps 4873.
Path 75 | total_timesteps 4990.
Path 76 | total_timesteps 5082.
Path 77 | total_timesteps 5154.
Path 78 | total_timesteps 5249.
Path 79 | total_timesteps 5278.
Path 80 | total_timesteps 5341.
Path 81 | total_timesteps 5418.
Path 82 | total_timesteps 5498.
Path 83 | total_timesteps 5618.
Path 84 | total_timesteps 5719.
Path 85 | total_timesteps 5763.
Path 86 | total_timesteps 5821.
Path 87 | total_timesteps 5881.
Path 88 | total_timesteps 5923.
Path 89 | total_timesteps 5991.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.31    |
| Iteration     | 18       |
| MaximumReturn | 86.7     |
| MinimumReturn | -60.8    |
| TotalSamples  | 80441    |
----------------------------
itr #19 | 
Fitting dynamics.
Validation loss = 0.4484032690525055
Validation loss = 0.45689859986305237
Validation loss = 0.4575905203819275
Validation loss = 0.4599589407444
Validation loss = 0.45832911133766174
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 76.
Path 2 | total_timesteps 139.
Path 3 | total_timesteps 211.
Path 4 | total_timesteps 277.
Path 5 | total_timesteps 353.
Path 6 | total_timesteps 416.
Path 7 | total_timesteps 465.
Path 8 | total_timesteps 493.
Path 9 | total_timesteps 530.
Path 10 | total_timesteps 598.
Path 11 | total_timesteps 677.
Path 12 | total_timesteps 710.
Path 13 | total_timesteps 790.
Path 14 | total_timesteps 864.
Path 15 | total_timesteps 926.
Path 16 | total_timesteps 996.
Path 17 | total_timesteps 1247.
Path 18 | total_timesteps 1307.
Path 19 | total_timesteps 1380.
Path 20 | total_timesteps 1456.
Path 21 | total_timesteps 1530.
Path 22 | total_timesteps 1605.
Path 23 | total_timesteps 1689.
Path 24 | total_timesteps 1727.
Path 25 | total_timesteps 1778.
Path 26 | total_timesteps 1855.
Path 27 | total_timesteps 1954.
Path 28 | total_timesteps 2054.
Path 29 | total_timesteps 2148.
Path 30 | total_timesteps 2248.
Path 31 | total_timesteps 2326.
Path 32 | total_timesteps 2376.
Path 33 | total_timesteps 2469.
Path 34 | total_timesteps 2504.
Path 35 | total_timesteps 2598.
Path 36 | total_timesteps 2646.
Path 37 | total_timesteps 2714.
Path 38 | total_timesteps 2867.
Path 39 | total_timesteps 2944.
Path 40 | total_timesteps 3039.
Path 41 | total_timesteps 3099.
Path 42 | total_timesteps 3174.
Path 43 | total_timesteps 3219.
Path 44 | total_timesteps 3283.
Path 45 | total_timesteps 3422.
Path 46 | total_timesteps 3514.
Path 47 | total_timesteps 3563.
Path 48 | total_timesteps 3668.
Path 49 | total_timesteps 3727.
Path 50 | total_timesteps 3809.
Path 51 | total_timesteps 3863.
Path 52 | total_timesteps 3910.
Path 53 | total_timesteps 3973.
Path 54 | total_timesteps 4041.
Path 55 | total_timesteps 4118.
Path 56 | total_timesteps 4179.
Path 57 | total_timesteps 4202.
Path 58 | total_timesteps 4268.
Path 59 | total_timesteps 4312.
Path 60 | total_timesteps 4370.
Path 61 | total_timesteps 4424.
Path 62 | total_timesteps 4539.
Path 63 | total_timesteps 4599.
Path 64 | total_timesteps 4694.
Path 65 | total_timesteps 4822.
Path 66 | total_timesteps 4873.
Path 67 | total_timesteps 4937.
Path 68 | total_timesteps 5035.
Path 69 | total_timesteps 5083.
Path 70 | total_timesteps 5126.
Path 71 | total_timesteps 5268.
Path 72 | total_timesteps 5324.
Path 73 | total_timesteps 5351.
Path 74 | total_timesteps 5427.
Path 75 | total_timesteps 5594.
Path 76 | total_timesteps 5731.
Path 77 | total_timesteps 5811.
Path 78 | total_timesteps 5870.
Path 79 | total_timesteps 5900.
Path 80 | total_timesteps 5976.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.7    |
| Iteration     | 19       |
| MaximumReturn | 226      |
| MinimumReturn | -72.2    |
| TotalSamples  | 84545    |
----------------------------
itr #20 | 
Fitting dynamics.
Validation loss = 0.4485310912132263
Validation loss = 0.45835012197494507
Validation loss = 0.4598168432712555
Validation loss = 0.460951566696167
Validation loss = 0.45772188901901245
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 70.
Path 2 | total_timesteps 109.
Path 3 | total_timesteps 151.
Path 4 | total_timesteps 206.
Path 5 | total_timesteps 331.
Path 6 | total_timesteps 413.
Path 7 | total_timesteps 505.
Path 8 | total_timesteps 570.
Path 9 | total_timesteps 668.
Path 10 | total_timesteps 717.
Path 11 | total_timesteps 786.
Path 12 | total_timesteps 874.
Path 13 | total_timesteps 980.
Path 14 | total_timesteps 1067.
Path 15 | total_timesteps 1107.
Path 16 | total_timesteps 1249.
Path 17 | total_timesteps 1304.
Path 18 | total_timesteps 1495.
Path 19 | total_timesteps 1625.
Path 20 | total_timesteps 1674.
Path 21 | total_timesteps 1759.
Path 22 | total_timesteps 1815.
Path 23 | total_timesteps 1890.
Path 24 | total_timesteps 1951.
Path 25 | total_timesteps 2026.
Path 26 | total_timesteps 2073.
Path 27 | total_timesteps 2133.
Path 28 | total_timesteps 2215.
Path 29 | total_timesteps 2307.
Path 30 | total_timesteps 2335.
Path 31 | total_timesteps 2371.
Path 32 | total_timesteps 2411.
Path 33 | total_timesteps 2469.
Path 34 | total_timesteps 2563.
Path 35 | total_timesteps 2665.
Path 36 | total_timesteps 2742.
Path 37 | total_timesteps 2782.
Path 38 | total_timesteps 2829.
Path 39 | total_timesteps 2892.
Path 40 | total_timesteps 2928.
Path 41 | total_timesteps 2976.
Path 42 | total_timesteps 3016.
Path 43 | total_timesteps 3120.
Path 44 | total_timesteps 3187.
Path 45 | total_timesteps 3251.
Path 46 | total_timesteps 3392.
Path 47 | total_timesteps 3474.
Path 48 | total_timesteps 3542.
Path 49 | total_timesteps 3606.
Path 50 | total_timesteps 3688.
Path 51 | total_timesteps 3768.
Path 52 | total_timesteps 3830.
Path 53 | total_timesteps 3958.
Path 54 | total_timesteps 4009.
Path 55 | total_timesteps 4098.
Path 56 | total_timesteps 4181.
Path 57 | total_timesteps 4216.
Path 58 | total_timesteps 4323.
Path 59 | total_timesteps 4350.
Path 60 | total_timesteps 4417.
Path 61 | total_timesteps 4481.
Path 62 | total_timesteps 4544.
Path 63 | total_timesteps 4630.
Path 64 | total_timesteps 4739.
Path 65 | total_timesteps 4797.
Path 66 | total_timesteps 4842.
Path 67 | total_timesteps 4910.
Path 68 | total_timesteps 4992.
Path 69 | total_timesteps 5130.
Path 70 | total_timesteps 5158.
Path 71 | total_timesteps 5215.
Path 72 | total_timesteps 5255.
Path 73 | total_timesteps 5385.
Path 74 | total_timesteps 5467.
Path 75 | total_timesteps 5536.
Path 76 | total_timesteps 5576.
Path 77 | total_timesteps 5614.
Path 78 | total_timesteps 5677.
Path 79 | total_timesteps 5715.
Path 80 | total_timesteps 5765.
Path 81 | total_timesteps 5856.
Path 82 | total_timesteps 5932.
Path 83 | total_timesteps 5997.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -15.8    |
| Iteration     | 20       |
| MaximumReturn | 82.1     |
| MinimumReturn | -110     |
| TotalSamples  | 88593    |
----------------------------
itr #21 | 
Fitting dynamics.
Validation loss = 0.4521275460720062
Validation loss = 0.4549766480922699
Validation loss = 0.4585084915161133
Validation loss = 0.4620027542114258
Validation loss = 0.4602148234844208
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 25.
Path 2 | total_timesteps 59.
Path 3 | total_timesteps 92.
Path 4 | total_timesteps 214.
Path 5 | total_timesteps 264.
Path 6 | total_timesteps 323.
Path 7 | total_timesteps 404.
Path 8 | total_timesteps 443.
Path 9 | total_timesteps 484.
Path 10 | total_timesteps 616.
Path 11 | total_timesteps 801.
Path 12 | total_timesteps 826.
Path 13 | total_timesteps 882.
Path 14 | total_timesteps 938.
Path 15 | total_timesteps 1007.
Path 16 | total_timesteps 1052.
Path 17 | total_timesteps 1086.
Path 18 | total_timesteps 1106.
Path 19 | total_timesteps 1213.
Path 20 | total_timesteps 1281.
Path 21 | total_timesteps 1334.
Path 22 | total_timesteps 1381.
Path 23 | total_timesteps 1468.
Path 24 | total_timesteps 1537.
Path 25 | total_timesteps 1689.
Path 26 | total_timesteps 1782.
Path 27 | total_timesteps 1892.
Path 28 | total_timesteps 1935.
Path 29 | total_timesteps 1999.
Path 30 | total_timesteps 2050.
Path 31 | total_timesteps 2105.
Path 32 | total_timesteps 2129.
Path 33 | total_timesteps 2248.
Path 34 | total_timesteps 2291.
Path 35 | total_timesteps 2374.
Path 36 | total_timesteps 2427.
Path 37 | total_timesteps 2498.
Path 38 | total_timesteps 2520.
Path 39 | total_timesteps 2608.
Path 40 | total_timesteps 2664.
Path 41 | total_timesteps 2741.
Path 42 | total_timesteps 2793.
Path 43 | total_timesteps 2890.
Path 44 | total_timesteps 2963.
Path 45 | total_timesteps 3039.
Path 46 | total_timesteps 3109.
Path 47 | total_timesteps 3200.
Path 48 | total_timesteps 3300.
Path 49 | total_timesteps 3377.
Path 50 | total_timesteps 3413.
Path 51 | total_timesteps 3494.
Path 52 | total_timesteps 3575.
Path 53 | total_timesteps 3650.
Path 54 | total_timesteps 3738.
Path 55 | total_timesteps 3800.
Path 56 | total_timesteps 3847.
Path 57 | total_timesteps 3889.
Path 58 | total_timesteps 3952.
Path 59 | total_timesteps 4008.
Path 60 | total_timesteps 4043.
Path 61 | total_timesteps 4108.
Path 62 | total_timesteps 4185.
Path 63 | total_timesteps 4243.
Path 64 | total_timesteps 4298.
Path 65 | total_timesteps 4394.
Path 66 | total_timesteps 4549.
Path 67 | total_timesteps 4778.
Path 68 | total_timesteps 4868.
Path 69 | total_timesteps 4892.
Path 70 | total_timesteps 4943.
Path 71 | total_timesteps 5082.
Path 72 | total_timesteps 5129.
Path 73 | total_timesteps 5211.
Path 74 | total_timesteps 5405.
Path 75 | total_timesteps 5451.
Path 76 | total_timesteps 5543.
Path 77 | total_timesteps 5630.
Path 78 | total_timesteps 5728.
Path 79 | total_timesteps 5822.
Path 80 | total_timesteps 5878.
Path 81 | total_timesteps 5938.
Path 82 | total_timesteps 5990.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12      |
| Iteration     | 21       |
| MaximumReturn | 179      |
| MinimumReturn | -82.7    |
| TotalSamples  | 92649    |
----------------------------
itr #22 | 
Fitting dynamics.
Validation loss = 0.44709450006484985
Validation loss = 0.4538385272026062
Validation loss = 0.4581497311592102
Validation loss = 0.4621548354625702
Validation loss = 0.4602290093898773
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 62.
Path 2 | total_timesteps 167.
Path 3 | total_timesteps 203.
Path 4 | total_timesteps 286.
Path 5 | total_timesteps 330.
Path 6 | total_timesteps 387.
Path 7 | total_timesteps 456.
Path 8 | total_timesteps 564.
Path 9 | total_timesteps 625.
Path 10 | total_timesteps 658.
Path 11 | total_timesteps 746.
Path 12 | total_timesteps 860.
Path 13 | total_timesteps 983.
Path 14 | total_timesteps 1056.
Path 15 | total_timesteps 1145.
Path 16 | total_timesteps 1210.
Path 17 | total_timesteps 1262.
Path 18 | total_timesteps 1289.
Path 19 | total_timesteps 1344.
Path 20 | total_timesteps 1476.
Path 21 | total_timesteps 1545.
Path 22 | total_timesteps 1710.
Path 23 | total_timesteps 1781.
Path 24 | total_timesteps 1826.
Path 25 | total_timesteps 1910.
Path 26 | total_timesteps 1981.
Path 27 | total_timesteps 2057.
Path 28 | total_timesteps 2129.
Path 29 | total_timesteps 2216.
Path 30 | total_timesteps 2252.
Path 31 | total_timesteps 2302.
Path 32 | total_timesteps 2370.
Path 33 | total_timesteps 2433.
Path 34 | total_timesteps 2570.
Path 35 | total_timesteps 2594.
Path 36 | total_timesteps 2637.
Path 37 | total_timesteps 2714.
Path 38 | total_timesteps 2771.
Path 39 | total_timesteps 2792.
Path 40 | total_timesteps 2862.
Path 41 | total_timesteps 2916.
Path 42 | total_timesteps 2964.
Path 43 | total_timesteps 3003.
Path 44 | total_timesteps 3066.
Path 45 | total_timesteps 3112.
Path 46 | total_timesteps 3183.
Path 47 | total_timesteps 3230.
Path 48 | total_timesteps 3285.
Path 49 | total_timesteps 3344.
Path 50 | total_timesteps 3417.
Path 51 | total_timesteps 3494.
Path 52 | total_timesteps 3583.
Path 53 | total_timesteps 3654.
Path 54 | total_timesteps 3710.
Path 55 | total_timesteps 3798.
Path 56 | total_timesteps 3860.
Path 57 | total_timesteps 3945.
Path 58 | total_timesteps 4012.
Path 59 | total_timesteps 4087.
Path 60 | total_timesteps 4160.
Path 61 | total_timesteps 4249.
Path 62 | total_timesteps 4312.
Path 63 | total_timesteps 4370.
Path 64 | total_timesteps 4444.
Path 65 | total_timesteps 4523.
Path 66 | total_timesteps 4663.
Path 67 | total_timesteps 4721.
Path 68 | total_timesteps 4766.
Path 69 | total_timesteps 4815.
Path 70 | total_timesteps 4893.
Path 71 | total_timesteps 4961.
Path 72 | total_timesteps 5120.
Path 73 | total_timesteps 5185.
Path 74 | total_timesteps 5287.
Path 75 | total_timesteps 5348.
Path 76 | total_timesteps 5368.
Path 77 | total_timesteps 5448.
Path 78 | total_timesteps 5548.
Path 79 | total_timesteps 5630.
Path 80 | total_timesteps 5717.
Path 81 | total_timesteps 5827.
Path 82 | total_timesteps 5886.
Path 83 | total_timesteps 5973.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.06    |
| Iteration     | 22       |
| MaximumReturn | 203      |
| MinimumReturn | -82.7    |
| TotalSamples  | 96710    |
----------------------------
itr #23 | 
Fitting dynamics.
Validation loss = 0.44660139083862305
Validation loss = 0.455958753824234
Validation loss = 0.4576249122619629
Validation loss = 0.4587794840335846
Validation loss = 0.4604731500148773
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 125.
Path 2 | total_timesteps 249.
Path 3 | total_timesteps 353.
Path 4 | total_timesteps 398.
Path 5 | total_timesteps 471.
Path 6 | total_timesteps 557.
Path 7 | total_timesteps 668.
Path 8 | total_timesteps 774.
Path 9 | total_timesteps 862.
Path 10 | total_timesteps 998.
Path 11 | total_timesteps 1061.
Path 12 | total_timesteps 1152.
Path 13 | total_timesteps 1254.
Path 14 | total_timesteps 1324.
Path 15 | total_timesteps 1375.
Path 16 | total_timesteps 1458.
Path 17 | total_timesteps 1504.
Path 18 | total_timesteps 1560.
Path 19 | total_timesteps 1672.
Path 20 | total_timesteps 1749.
Path 21 | total_timesteps 1802.
Path 22 | total_timesteps 1884.
Path 23 | total_timesteps 1931.
Path 24 | total_timesteps 1993.
Path 25 | total_timesteps 2043.
Path 26 | total_timesteps 2133.
Path 27 | total_timesteps 2161.
Path 28 | total_timesteps 2273.
Path 29 | total_timesteps 2303.
Path 30 | total_timesteps 2399.
Path 31 | total_timesteps 2465.
Path 32 | total_timesteps 2527.
Path 33 | total_timesteps 2616.
Path 34 | total_timesteps 2688.
Path 35 | total_timesteps 2752.
Path 36 | total_timesteps 2875.
Path 37 | total_timesteps 2989.
Path 38 | total_timesteps 3083.
Path 39 | total_timesteps 3153.
Path 40 | total_timesteps 3286.
Path 41 | total_timesteps 3379.
Path 42 | total_timesteps 3470.
Path 43 | total_timesteps 3567.
Path 44 | total_timesteps 3661.
Path 45 | total_timesteps 3717.
Path 46 | total_timesteps 3823.
Path 47 | total_timesteps 3910.
Path 48 | total_timesteps 3974.
Path 49 | total_timesteps 4040.
Path 50 | total_timesteps 4134.
Path 51 | total_timesteps 4251.
Path 52 | total_timesteps 4305.
Path 53 | total_timesteps 4367.
Path 54 | total_timesteps 4436.
Path 55 | total_timesteps 4525.
Path 56 | total_timesteps 4665.
Path 57 | total_timesteps 4689.
Path 58 | total_timesteps 4796.
Path 59 | total_timesteps 4863.
Path 60 | total_timesteps 4903.
Path 61 | total_timesteps 4958.
Path 62 | total_timesteps 5046.
Path 63 | total_timesteps 5124.
Path 64 | total_timesteps 5198.
Path 65 | total_timesteps 5244.
Path 66 | total_timesteps 5358.
Path 67 | total_timesteps 5422.
Path 68 | total_timesteps 5504.
Path 69 | total_timesteps 5603.
Path 70 | total_timesteps 5677.
Path 71 | total_timesteps 5797.
Path 72 | total_timesteps 5854.
Path 73 | total_timesteps 5904.
Path 74 | total_timesteps 5943.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -21.3    |
| Iteration     | 23       |
| MaximumReturn | 48.5     |
| MinimumReturn | -83.8    |
| TotalSamples  | 100715   |
----------------------------
itr #24 | 
Fitting dynamics.
Validation loss = 0.4506882429122925
Validation loss = 0.4532318115234375
Validation loss = 0.46031898260116577
Validation loss = 0.45910099148750305
Validation loss = 0.4588497281074524
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 68.
Path 2 | total_timesteps 140.
Path 3 | total_timesteps 194.
Path 4 | total_timesteps 278.
Path 5 | total_timesteps 346.
Path 6 | total_timesteps 488.
Path 7 | total_timesteps 542.
Path 8 | total_timesteps 634.
Path 9 | total_timesteps 670.
Path 10 | total_timesteps 733.
Path 11 | total_timesteps 832.
Path 12 | total_timesteps 904.
Path 13 | total_timesteps 1007.
Path 14 | total_timesteps 1122.
Path 15 | total_timesteps 1271.
Path 16 | total_timesteps 1342.
Path 17 | total_timesteps 1387.
Path 18 | total_timesteps 1468.
Path 19 | total_timesteps 1567.
Path 20 | total_timesteps 1623.
Path 21 | total_timesteps 1661.
Path 22 | total_timesteps 1697.
Path 23 | total_timesteps 1771.
Path 24 | total_timesteps 1812.
Path 25 | total_timesteps 1893.
Path 26 | total_timesteps 1998.
Path 27 | total_timesteps 2087.
Path 28 | total_timesteps 2231.
Path 29 | total_timesteps 2317.
Path 30 | total_timesteps 2408.
Path 31 | total_timesteps 2465.
Path 32 | total_timesteps 2525.
Path 33 | total_timesteps 2610.
Path 34 | total_timesteps 2703.
Path 35 | total_timesteps 2767.
Path 36 | total_timesteps 2806.
Path 37 | total_timesteps 2842.
Path 38 | total_timesteps 2897.
Path 39 | total_timesteps 2957.
Path 40 | total_timesteps 3001.
Path 41 | total_timesteps 3077.
Path 42 | total_timesteps 3161.
Path 43 | total_timesteps 3198.
Path 44 | total_timesteps 3281.
Path 45 | total_timesteps 3455.
Path 46 | total_timesteps 3529.
Path 47 | total_timesteps 3551.
Path 48 | total_timesteps 3703.
Path 49 | total_timesteps 3761.
Path 50 | total_timesteps 3847.
Path 51 | total_timesteps 3932.
Path 52 | total_timesteps 4014.
Path 53 | total_timesteps 4069.
Path 54 | total_timesteps 4147.
Path 55 | total_timesteps 4201.
Path 56 | total_timesteps 4253.
Path 57 | total_timesteps 4319.
Path 58 | total_timesteps 4371.
Path 59 | total_timesteps 4439.
Path 60 | total_timesteps 4528.
Path 61 | total_timesteps 4592.
Path 62 | total_timesteps 4733.
Path 63 | total_timesteps 4817.
Path 64 | total_timesteps 4895.
Path 65 | total_timesteps 4940.
Path 66 | total_timesteps 4986.
Path 67 | total_timesteps 5036.
Path 68 | total_timesteps 5055.
Path 69 | total_timesteps 5144.
Path 70 | total_timesteps 5178.
Path 71 | total_timesteps 5241.
Path 72 | total_timesteps 5330.
Path 73 | total_timesteps 5379.
Path 74 | total_timesteps 5460.
Path 75 | total_timesteps 5523.
Path 76 | total_timesteps 5605.
Path 77 | total_timesteps 5697.
Path 78 | total_timesteps 5750.
Path 79 | total_timesteps 5783.
Path 80 | total_timesteps 5863.
Path 81 | total_timesteps 5913.
Path 82 | total_timesteps 5968.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -20.4    |
| Iteration     | 24       |
| MaximumReturn | 151      |
| MinimumReturn | -102     |
| TotalSamples  | 104740   |
----------------------------
itr #25 | 
Fitting dynamics.
Validation loss = 0.4498855173587799
Validation loss = 0.45726776123046875
Validation loss = 0.45811954140663147
Validation loss = 0.4598802328109741
Validation loss = 0.46036750078201294
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 51.
Path 2 | total_timesteps 173.
Path 3 | total_timesteps 260.
Path 4 | total_timesteps 330.
Path 5 | total_timesteps 485.
Path 6 | total_timesteps 565.
Path 7 | total_timesteps 618.
Path 8 | total_timesteps 667.
Path 9 | total_timesteps 758.
Path 10 | total_timesteps 845.
Path 11 | total_timesteps 951.
Path 12 | total_timesteps 994.
Path 13 | total_timesteps 1024.
Path 14 | total_timesteps 1105.
Path 15 | total_timesteps 1157.
Path 16 | total_timesteps 1238.
Path 17 | total_timesteps 1294.
Path 18 | total_timesteps 1380.
Path 19 | total_timesteps 1493.
Path 20 | total_timesteps 1605.
Path 21 | total_timesteps 1717.
Path 22 | total_timesteps 1840.
Path 23 | total_timesteps 1896.
Path 24 | total_timesteps 1955.
Path 25 | total_timesteps 2048.
Path 26 | total_timesteps 2134.
Path 27 | total_timesteps 2188.
Path 28 | total_timesteps 2302.
Path 29 | total_timesteps 2396.
Path 30 | total_timesteps 2478.
Path 31 | total_timesteps 2511.
Path 32 | total_timesteps 2594.
Path 33 | total_timesteps 2669.
Path 34 | total_timesteps 2766.
Path 35 | total_timesteps 2840.
Path 36 | total_timesteps 2905.
Path 37 | total_timesteps 2939.
Path 38 | total_timesteps 3013.
Path 39 | total_timesteps 3148.
Path 40 | total_timesteps 3240.
Path 41 | total_timesteps 3303.
Path 42 | total_timesteps 3371.
Path 43 | total_timesteps 3411.
Path 44 | total_timesteps 3470.
Path 45 | total_timesteps 3571.
Path 46 | total_timesteps 3685.
Path 47 | total_timesteps 3765.
Path 48 | total_timesteps 3859.
Path 49 | total_timesteps 3901.
Path 50 | total_timesteps 3993.
Path 51 | total_timesteps 4045.
Path 52 | total_timesteps 4116.
Path 53 | total_timesteps 4199.
Path 54 | total_timesteps 4339.
Path 55 | total_timesteps 4437.
Path 56 | total_timesteps 4555.
Path 57 | total_timesteps 4691.
Path 58 | total_timesteps 4764.
Path 59 | total_timesteps 4872.
Path 60 | total_timesteps 4950.
Path 61 | total_timesteps 4976.
Path 62 | total_timesteps 5045.
Path 63 | total_timesteps 5147.
Path 64 | total_timesteps 5231.
Path 65 | total_timesteps 5324.
Path 66 | total_timesteps 5410.
Path 67 | total_timesteps 5572.
Path 68 | total_timesteps 5672.
Path 69 | total_timesteps 5755.
Path 70 | total_timesteps 5825.
Path 71 | total_timesteps 5899.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -23.1    |
| Iteration     | 25       |
| MaximumReturn | 199      |
| MinimumReturn | -99.4    |
| TotalSamples  | 108773   |
----------------------------
itr #26 | 
Fitting dynamics.
Validation loss = 0.4527074694633484
Validation loss = 0.4548710286617279
Validation loss = 0.4580843150615692
Validation loss = 0.4565528929233551
Validation loss = 0.46199798583984375
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 149.
Path 2 | total_timesteps 250.
Path 3 | total_timesteps 367.
Path 4 | total_timesteps 467.
Path 5 | total_timesteps 530.
Path 6 | total_timesteps 622.
Path 7 | total_timesteps 708.
Path 8 | total_timesteps 764.
Path 9 | total_timesteps 839.
Path 10 | total_timesteps 916.
Path 11 | total_timesteps 987.
Path 12 | total_timesteps 1164.
Path 13 | total_timesteps 1237.
Path 14 | total_timesteps 1306.
Path 15 | total_timesteps 1397.
Path 16 | total_timesteps 1554.
Path 17 | total_timesteps 1633.
Path 18 | total_timesteps 1668.
Path 19 | total_timesteps 1767.
Path 20 | total_timesteps 1824.
Path 21 | total_timesteps 1934.
Path 22 | total_timesteps 2029.
Path 23 | total_timesteps 2143.
Path 24 | total_timesteps 2211.
Path 25 | total_timesteps 2275.
Path 26 | total_timesteps 2326.
Path 27 | total_timesteps 2373.
Path 28 | total_timesteps 2524.
Path 29 | total_timesteps 2612.
Path 30 | total_timesteps 2685.
Path 31 | total_timesteps 2775.
Path 32 | total_timesteps 2955.
Path 33 | total_timesteps 3043.
Path 34 | total_timesteps 3083.
Path 35 | total_timesteps 3190.
Path 36 | total_timesteps 3268.
Path 37 | total_timesteps 3365.
Path 38 | total_timesteps 3470.
Path 39 | total_timesteps 3617.
Path 40 | total_timesteps 3698.
Path 41 | total_timesteps 3779.
Path 42 | total_timesteps 3873.
Path 43 | total_timesteps 3971.
Path 44 | total_timesteps 4068.
Path 45 | total_timesteps 4206.
Path 46 | total_timesteps 4293.
Path 47 | total_timesteps 4373.
Path 48 | total_timesteps 4456.
Path 49 | total_timesteps 4532.
Path 50 | total_timesteps 4625.
Path 51 | total_timesteps 4728.
Path 52 | total_timesteps 4835.
Path 53 | total_timesteps 4927.
Path 54 | total_timesteps 5018.
Path 55 | total_timesteps 5321.
Path 56 | total_timesteps 5395.
Path 57 | total_timesteps 5460.
Path 58 | total_timesteps 5532.
Path 59 | total_timesteps 5639.
Path 60 | total_timesteps 5703.
Path 61 | total_timesteps 5813.
Path 62 | total_timesteps 5877.
Path 63 | total_timesteps 5973.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.8    |
| Iteration     | 26       |
| MaximumReturn | 330      |
| MinimumReturn | -104     |
| TotalSamples  | 112827   |
----------------------------
itr #27 | 
Fitting dynamics.
Validation loss = 0.4570901691913605
Validation loss = 0.45613986253738403
Validation loss = 0.45804357528686523
Validation loss = 0.46059077978134155
Validation loss = 0.45957979559898376
Validation loss = 0.4581327736377716
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 52.
Path 2 | total_timesteps 125.
Path 3 | total_timesteps 160.
Path 4 | total_timesteps 210.
Path 5 | total_timesteps 326.
Path 6 | total_timesteps 405.
Path 7 | total_timesteps 461.
Path 8 | total_timesteps 571.
Path 9 | total_timesteps 638.
Path 10 | total_timesteps 694.
Path 11 | total_timesteps 817.
Path 12 | total_timesteps 925.
Path 13 | total_timesteps 980.
Path 14 | total_timesteps 1043.
Path 15 | total_timesteps 1103.
Path 16 | total_timesteps 1199.
Path 17 | total_timesteps 1268.
Path 18 | total_timesteps 1351.
Path 19 | total_timesteps 1450.
Path 20 | total_timesteps 1641.
Path 21 | total_timesteps 1718.
Path 22 | total_timesteps 1782.
Path 23 | total_timesteps 1850.
Path 24 | total_timesteps 2001.
Path 25 | total_timesteps 2076.
Path 26 | total_timesteps 2223.
Path 27 | total_timesteps 2346.
Path 28 | total_timesteps 2418.
Path 29 | total_timesteps 2524.
Path 30 | total_timesteps 2672.
Path 31 | total_timesteps 2747.
Path 32 | total_timesteps 2834.
Path 33 | total_timesteps 2934.
Path 34 | total_timesteps 3008.
Path 35 | total_timesteps 3117.
Path 36 | total_timesteps 3393.
Path 37 | total_timesteps 3466.
Path 38 | total_timesteps 3575.
Path 39 | total_timesteps 3695.
Path 40 | total_timesteps 3752.
Path 41 | total_timesteps 3836.
Path 42 | total_timesteps 4045.
Path 43 | total_timesteps 4134.
Path 44 | total_timesteps 4229.
Path 45 | total_timesteps 4271.
Path 46 | total_timesteps 4356.
Path 47 | total_timesteps 4408.
Path 48 | total_timesteps 4533.
Path 49 | total_timesteps 4585.
Path 50 | total_timesteps 4679.
Path 51 | total_timesteps 4815.
Path 52 | total_timesteps 4886.
Path 53 | total_timesteps 4989.
Path 54 | total_timesteps 5063.
Path 55 | total_timesteps 5152.
Path 56 | total_timesteps 5212.
Path 57 | total_timesteps 5285.
Path 58 | total_timesteps 5326.
Path 59 | total_timesteps 5399.
Path 60 | total_timesteps 5474.
Path 61 | total_timesteps 5503.
Path 62 | total_timesteps 5726.
Path 63 | total_timesteps 5803.
Path 64 | total_timesteps 5881.
Path 65 | total_timesteps 5950.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -13.8    |
| Iteration     | 27       |
| MaximumReturn | 174      |
| MinimumReturn | -99.1    |
| TotalSamples  | 116895   |
----------------------------
itr #28 | 
Fitting dynamics.
Validation loss = 0.45016196370124817
Validation loss = 0.4562011957168579
Validation loss = 0.45902135968208313
Validation loss = 0.4602324366569519
Validation loss = 0.46133047342300415
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 26.
Path 2 | total_timesteps 86.
Path 3 | total_timesteps 170.
Path 4 | total_timesteps 218.
Path 5 | total_timesteps 289.
Path 6 | total_timesteps 355.
Path 7 | total_timesteps 403.
Path 8 | total_timesteps 527.
Path 9 | total_timesteps 599.
Path 10 | total_timesteps 732.
Path 11 | total_timesteps 776.
Path 12 | total_timesteps 879.
Path 13 | total_timesteps 977.
Path 14 | total_timesteps 1053.
Path 15 | total_timesteps 1129.
Path 16 | total_timesteps 1243.
Path 17 | total_timesteps 1306.
Path 18 | total_timesteps 1351.
Path 19 | total_timesteps 1420.
Path 20 | total_timesteps 1579.
Path 21 | total_timesteps 1621.
Path 22 | total_timesteps 1688.
Path 23 | total_timesteps 1781.
Path 24 | total_timesteps 1828.
Path 25 | total_timesteps 1895.
Path 26 | total_timesteps 1962.
Path 27 | total_timesteps 2038.
Path 28 | total_timesteps 2123.
Path 29 | total_timesteps 2222.
Path 30 | total_timesteps 2297.
Path 31 | total_timesteps 2370.
Path 32 | total_timesteps 2415.
Path 33 | total_timesteps 2507.
Path 34 | total_timesteps 2552.
Path 35 | total_timesteps 2645.
Path 36 | total_timesteps 2792.
Path 37 | total_timesteps 2840.
Path 38 | total_timesteps 2937.
Path 39 | total_timesteps 2978.
Path 40 | total_timesteps 3046.
Path 41 | total_timesteps 3196.
Path 42 | total_timesteps 3287.
Path 43 | total_timesteps 3353.
Path 44 | total_timesteps 3477.
Path 45 | total_timesteps 3706.
Path 46 | total_timesteps 3732.
Path 47 | total_timesteps 3836.
Path 48 | total_timesteps 3903.
Path 49 | total_timesteps 3940.
Path 50 | total_timesteps 4034.
Path 51 | total_timesteps 4071.
Path 52 | total_timesteps 4189.
Path 53 | total_timesteps 4276.
Path 54 | total_timesteps 4373.
Path 55 | total_timesteps 4429.
Path 56 | total_timesteps 4497.
Path 57 | total_timesteps 4572.
Path 58 | total_timesteps 4631.
Path 59 | total_timesteps 4766.
Path 60 | total_timesteps 4856.
Path 61 | total_timesteps 4946.
Path 62 | total_timesteps 4995.
Path 63 | total_timesteps 5142.
Path 64 | total_timesteps 5242.
Path 65 | total_timesteps 5331.
Path 66 | total_timesteps 5392.
Path 67 | total_timesteps 5554.
Path 68 | total_timesteps 5605.
Path 69 | total_timesteps 5713.
Path 70 | total_timesteps 5775.
Path 71 | total_timesteps 5843.
Path 72 | total_timesteps 5906.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -21.9    |
| Iteration     | 28       |
| MaximumReturn | 42.9     |
| MinimumReturn | -77.1    |
| TotalSamples  | 120897   |
----------------------------
itr #29 | 
Fitting dynamics.
Validation loss = 0.4548214375972748
Validation loss = 0.4558475613594055
Validation loss = 0.4629329741001129
Validation loss = 0.46180155873298645
Validation loss = 0.46126365661621094
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 59.
Path 2 | total_timesteps 113.
Path 3 | total_timesteps 175.
Path 4 | total_timesteps 264.
Path 5 | total_timesteps 359.
Path 6 | total_timesteps 478.
Path 7 | total_timesteps 541.
Path 8 | total_timesteps 637.
Path 9 | total_timesteps 705.
Path 10 | total_timesteps 782.
Path 11 | total_timesteps 885.
Path 12 | total_timesteps 941.
Path 13 | total_timesteps 1040.
Path 14 | total_timesteps 1140.
Path 15 | total_timesteps 1187.
Path 16 | total_timesteps 1274.
Path 17 | total_timesteps 1400.
Path 18 | total_timesteps 1544.
Path 19 | total_timesteps 1645.
Path 20 | total_timesteps 1710.
Path 21 | total_timesteps 1891.
Path 22 | total_timesteps 1951.
Path 23 | total_timesteps 2045.
Path 24 | total_timesteps 2078.
Path 25 | total_timesteps 2233.
Path 26 | total_timesteps 2309.
Path 27 | total_timesteps 2396.
Path 28 | total_timesteps 2522.
Path 29 | total_timesteps 2609.
Path 30 | total_timesteps 2666.
Path 31 | total_timesteps 2766.
Path 32 | total_timesteps 2923.
Path 33 | total_timesteps 3033.
Path 34 | total_timesteps 3135.
Path 35 | total_timesteps 3272.
Path 36 | total_timesteps 3379.
Path 37 | total_timesteps 3468.
Path 38 | total_timesteps 3551.
Path 39 | total_timesteps 3626.
Path 40 | total_timesteps 3725.
Path 41 | total_timesteps 3833.
Path 42 | total_timesteps 3877.
Path 43 | total_timesteps 3953.
Path 44 | total_timesteps 4053.
Path 45 | total_timesteps 4138.
Path 46 | total_timesteps 4223.
Path 47 | total_timesteps 4304.
Path 48 | total_timesteps 4390.
Path 49 | total_timesteps 4500.
Path 50 | total_timesteps 4587.
Path 51 | total_timesteps 4691.
Path 52 | total_timesteps 4773.
Path 53 | total_timesteps 4808.
Path 54 | total_timesteps 4858.
Path 55 | total_timesteps 5054.
Path 56 | total_timesteps 5118.
Path 57 | total_timesteps 5196.
Path 58 | total_timesteps 5275.
Path 59 | total_timesteps 5333.
Path 60 | total_timesteps 5412.
Path 61 | total_timesteps 5502.
Path 62 | total_timesteps 5616.
Path 63 | total_timesteps 5689.
Path 64 | total_timesteps 5769.
Path 65 | total_timesteps 5837.
Path 66 | total_timesteps 5917.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -27.9    |
| Iteration     | 29       |
| MaximumReturn | 132      |
| MinimumReturn | -102     |
| TotalSamples  | 124897   |
----------------------------
itr #30 | 
Fitting dynamics.
Validation loss = 0.4535166621208191
Validation loss = 0.4598374664783478
Validation loss = 0.46263155341148376
Validation loss = 0.4599720239639282
Validation loss = 0.46148768067359924
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 79.
Path 2 | total_timesteps 149.
Path 3 | total_timesteps 232.
Path 4 | total_timesteps 359.
Path 5 | total_timesteps 459.
Path 6 | total_timesteps 525.
Path 7 | total_timesteps 597.
Path 8 | total_timesteps 737.
Path 9 | total_timesteps 806.
Path 10 | total_timesteps 951.
Path 11 | total_timesteps 1053.
Path 12 | total_timesteps 1176.
Path 13 | total_timesteps 1283.
Path 14 | total_timesteps 1375.
Path 15 | total_timesteps 1505.
Path 16 | total_timesteps 1556.
Path 17 | total_timesteps 1655.
Path 18 | total_timesteps 1731.
Path 19 | total_timesteps 1904.
Path 20 | total_timesteps 1986.
Path 21 | total_timesteps 2055.
Path 22 | total_timesteps 2102.
Path 23 | total_timesteps 2149.
Path 24 | total_timesteps 2245.
Path 25 | total_timesteps 2304.
Path 26 | total_timesteps 2358.
Path 27 | total_timesteps 2434.
Path 28 | total_timesteps 2563.
Path 29 | total_timesteps 2616.
Path 30 | total_timesteps 2682.
Path 31 | total_timesteps 2730.
Path 32 | total_timesteps 2856.
Path 33 | total_timesteps 2880.
Path 34 | total_timesteps 3007.
Path 35 | total_timesteps 3150.
Path 36 | total_timesteps 3225.
Path 37 | total_timesteps 3296.
Path 38 | total_timesteps 3354.
Path 39 | total_timesteps 3508.
Path 40 | total_timesteps 3588.
Path 41 | total_timesteps 3633.
Path 42 | total_timesteps 3679.
Path 43 | total_timesteps 3743.
Path 44 | total_timesteps 3828.
Path 45 | total_timesteps 3865.
Path 46 | total_timesteps 3911.
Path 47 | total_timesteps 4000.
Path 48 | total_timesteps 4070.
Path 49 | total_timesteps 4116.
Path 50 | total_timesteps 4289.
Path 51 | total_timesteps 4449.
Path 52 | total_timesteps 4528.
Path 53 | total_timesteps 4596.
Path 54 | total_timesteps 4684.
Path 55 | total_timesteps 4806.
Path 56 | total_timesteps 4926.
Path 57 | total_timesteps 5047.
Path 58 | total_timesteps 5138.
Path 59 | total_timesteps 5226.
Path 60 | total_timesteps 5319.
Path 61 | total_timesteps 5386.
Path 62 | total_timesteps 5510.
Path 63 | total_timesteps 5600.
Path 64 | total_timesteps 5714.
Path 65 | total_timesteps 5818.
Path 66 | total_timesteps 5874.
Path 67 | total_timesteps 5979.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -18.5    |
| Iteration     | 30       |
| MaximumReturn | 162      |
| MinimumReturn | -115     |
| TotalSamples  | 128909   |
----------------------------
itr #31 | 
Fitting dynamics.
Validation loss = 0.4574890732765198
Validation loss = 0.45841607451438904
Validation loss = 0.46058812737464905
Validation loss = 0.46137115359306335
Validation loss = 0.4608619213104248
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 146.
Path 2 | total_timesteps 227.
Path 3 | total_timesteps 359.
Path 4 | total_timesteps 443.
Path 5 | total_timesteps 512.
Path 6 | total_timesteps 556.
Path 7 | total_timesteps 631.
Path 8 | total_timesteps 758.
Path 9 | total_timesteps 819.
Path 10 | total_timesteps 919.
Path 11 | total_timesteps 998.
Path 12 | total_timesteps 1073.
Path 13 | total_timesteps 1151.
Path 14 | total_timesteps 1396.
Path 15 | total_timesteps 1460.
Path 16 | total_timesteps 1670.
Path 17 | total_timesteps 1835.
Path 18 | total_timesteps 1868.
Path 19 | total_timesteps 2006.
Path 20 | total_timesteps 2132.
Path 21 | total_timesteps 2194.
Path 22 | total_timesteps 2264.
Path 23 | total_timesteps 2375.
Path 24 | total_timesteps 2506.
Path 25 | total_timesteps 2605.
Path 26 | total_timesteps 2668.
Path 27 | total_timesteps 2742.
Path 28 | total_timesteps 2840.
Path 29 | total_timesteps 2921.
Path 30 | total_timesteps 3036.
Path 31 | total_timesteps 3263.
Path 32 | total_timesteps 3336.
Path 33 | total_timesteps 3417.
Path 34 | total_timesteps 3469.
Path 35 | total_timesteps 3546.
Path 36 | total_timesteps 3620.
Path 37 | total_timesteps 3676.
Path 38 | total_timesteps 3738.
Path 39 | total_timesteps 3817.
Path 40 | total_timesteps 3906.
Path 41 | total_timesteps 4057.
Path 42 | total_timesteps 4115.
Path 43 | total_timesteps 4288.
Path 44 | total_timesteps 4380.
Path 45 | total_timesteps 4528.
Path 46 | total_timesteps 4580.
Path 47 | total_timesteps 4612.
Path 48 | total_timesteps 4658.
Path 49 | total_timesteps 4823.
Path 50 | total_timesteps 4885.
Path 51 | total_timesteps 4970.
Path 52 | total_timesteps 5117.
Path 53 | total_timesteps 5209.
Path 54 | total_timesteps 5279.
Path 55 | total_timesteps 5375.
Path 56 | total_timesteps 5489.
Path 57 | total_timesteps 5569.
Path 58 | total_timesteps 5807.
Path 59 | total_timesteps 5886.
Path 60 | total_timesteps 5957.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.32    |
| Iteration     | 31       |
| MaximumReturn | 242      |
| MinimumReturn | -76.2    |
| TotalSamples  | 132935   |
----------------------------
itr #32 | 
Fitting dynamics.
Validation loss = 0.4556775689125061
Validation loss = 0.4598830044269562
Validation loss = 0.4603342115879059
Validation loss = 0.46303534507751465
Validation loss = 0.46091392636299133
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 107.
Path 2 | total_timesteps 164.
Path 3 | total_timesteps 234.
Path 4 | total_timesteps 345.
Path 5 | total_timesteps 463.
Path 6 | total_timesteps 529.
Path 7 | total_timesteps 607.
Path 8 | total_timesteps 676.
Path 9 | total_timesteps 786.
Path 10 | total_timesteps 872.
Path 11 | total_timesteps 988.
Path 12 | total_timesteps 1089.
Path 13 | total_timesteps 1179.
Path 14 | total_timesteps 1261.
Path 15 | total_timesteps 1375.
Path 16 | total_timesteps 1477.
Path 17 | total_timesteps 1726.
Path 18 | total_timesteps 1950.
Path 19 | total_timesteps 2015.
Path 20 | total_timesteps 2091.
Path 21 | total_timesteps 2159.
Path 22 | total_timesteps 2233.
Path 23 | total_timesteps 2246.
Path 24 | total_timesteps 2350.
Path 25 | total_timesteps 2447.
Path 26 | total_timesteps 2507.
Path 27 | total_timesteps 2672.
Path 28 | total_timesteps 2747.
Path 29 | total_timesteps 2827.
Path 30 | total_timesteps 2885.
Path 31 | total_timesteps 3000.
Path 32 | total_timesteps 3070.
Path 33 | total_timesteps 3172.
Path 34 | total_timesteps 3205.
Path 35 | total_timesteps 3228.
Path 36 | total_timesteps 3288.
Path 37 | total_timesteps 3372.
Path 38 | total_timesteps 3528.
Path 39 | total_timesteps 3597.
Path 40 | total_timesteps 3694.
Path 41 | total_timesteps 3808.
Path 42 | total_timesteps 3903.
Path 43 | total_timesteps 3989.
Path 44 | total_timesteps 4071.
Path 45 | total_timesteps 4139.
Path 46 | total_timesteps 4198.
Path 47 | total_timesteps 4309.
Path 48 | total_timesteps 4480.
Path 49 | total_timesteps 4596.
Path 50 | total_timesteps 4669.
Path 51 | total_timesteps 4780.
Path 52 | total_timesteps 4848.
Path 53 | total_timesteps 4967.
Path 54 | total_timesteps 5030.
Path 55 | total_timesteps 5134.
Path 56 | total_timesteps 5211.
Path 57 | total_timesteps 5317.
Path 58 | total_timesteps 5387.
Path 59 | total_timesteps 5418.
Path 60 | total_timesteps 5551.
Path 61 | total_timesteps 5635.
Path 62 | total_timesteps 5724.
Path 63 | total_timesteps 5814.
Path 64 | total_timesteps 5881.
Path 65 | total_timesteps 5961.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -29.6    |
| Iteration     | 32       |
| MaximumReturn | 162      |
| MinimumReturn | -93.3    |
| TotalSamples  | 136974   |
----------------------------
