Logging to experiments/invertedPendulum/test-exp-dir/test-exp_seed2531
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7354192733764648
Validation loss = 0.4011092483997345
Validation loss = 0.3547738492488861
Validation loss = 0.3431258201599121
Validation loss = 0.3187326490879059
Validation loss = 0.29893434047698975
Validation loss = 0.2647978663444519
Validation loss = 0.2514089345932007
Validation loss = 0.23385687172412872
Validation loss = 0.23840893805027008
Validation loss = 0.20776689052581787
Validation loss = 0.21409380435943604
Validation loss = 0.19667059183120728
Validation loss = 0.18455320596694946
Validation loss = 0.18490009009838104
Validation loss = 0.17593182623386383
Validation loss = 0.16129526495933533
Validation loss = 0.15736769139766693
Validation loss = 0.1542341411113739
Validation loss = 0.1536927968263626
Validation loss = 0.13780410587787628
Validation loss = 0.13331882655620575
Validation loss = 0.12988649308681488
Validation loss = 0.1262880116701126
Validation loss = 0.13244283199310303
Validation loss = 0.1262972056865692
Validation loss = 0.11637067049741745
Validation loss = 0.1156914234161377
Validation loss = 0.11389747262001038
Validation loss = 0.11058168113231659
Validation loss = 0.10491634160280228
Validation loss = 0.09911014139652252
Validation loss = 0.10699596256017685
Validation loss = 0.09122619777917862
Validation loss = 0.09369773417711258
Validation loss = 0.09167194366455078
Validation loss = 0.10424363613128662
Validation loss = 0.10006649047136307
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7410763502120972
Validation loss = 0.39567360281944275
Validation loss = 0.3691502809524536
Validation loss = 0.3356762230396271
Validation loss = 0.3097071349620819
Validation loss = 0.28456786274909973
Validation loss = 0.2589012086391449
Validation loss = 0.23924848437309265
Validation loss = 0.22318965196609497
Validation loss = 0.2105075865983963
Validation loss = 0.20913240313529968
Validation loss = 0.18812333047389984
Validation loss = 0.18589061498641968
Validation loss = 0.17680926620960236
Validation loss = 0.1628873348236084
Validation loss = 0.15575408935546875
Validation loss = 0.1511097401380539
Validation loss = 0.16362610459327698
Validation loss = 0.1566123366355896
Validation loss = 0.14535078406333923
Validation loss = 0.13450197875499725
Validation loss = 0.12132131308317184
Validation loss = 0.136162668466568
Validation loss = 0.13468818366527557
Validation loss = 0.12229638546705246
Validation loss = 0.11967634409666061
Validation loss = 0.11000370234251022
Validation loss = 0.11179761588573456
Validation loss = 0.1032894179224968
Validation loss = 0.11417456716299057
Validation loss = 0.11552884429693222
Validation loss = 0.11708278954029083
Validation loss = 0.11730723083019257
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7290141582489014
Validation loss = 0.4000377357006073
Validation loss = 0.34837037324905396
Validation loss = 0.33486178517341614
Validation loss = 0.31335270404815674
Validation loss = 0.28267890214920044
Validation loss = 0.25988373160362244
Validation loss = 0.2411024272441864
Validation loss = 0.22444459795951843
Validation loss = 0.21036623418331146
Validation loss = 0.1929750293493271
Validation loss = 0.20272402465343475
Validation loss = 0.18985846638679504
Validation loss = 0.17336593568325043
Validation loss = 0.16825394332408905
Validation loss = 0.16505704820156097
Validation loss = 0.15266230702400208
Validation loss = 0.14205415546894073
Validation loss = 0.1370099037885666
Validation loss = 0.14845672249794006
Validation loss = 0.13482150435447693
Validation loss = 0.13229802250862122
Validation loss = 0.12349563837051392
Validation loss = 0.12454497069120407
Validation loss = 0.12098484486341476
Validation loss = 0.11618278920650482
Validation loss = 0.10372383892536163
Validation loss = 0.11357903480529785
Validation loss = 0.10321289300918579
Validation loss = 0.10809438675642014
Validation loss = 0.11425665766000748
Validation loss = 0.11092089116573334
Validation loss = 0.10237661004066467
Validation loss = 0.10284500569105148
Validation loss = 0.10911757498979568
Validation loss = 0.09905003011226654
Validation loss = 0.1047799289226532
Validation loss = 0.10364777594804764
Validation loss = 0.10339568555355072
Validation loss = 0.10547754913568497
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7317677140235901
Validation loss = 0.4119753837585449
Validation loss = 0.3524809181690216
Validation loss = 0.34399735927581787
Validation loss = 0.31989479064941406
Validation loss = 0.2960970103740692
Validation loss = 0.2678491473197937
Validation loss = 0.2470134049654007
Validation loss = 0.22631821036338806
Validation loss = 0.21230466663837433
Validation loss = 0.20014157891273499
Validation loss = 0.1957373321056366
Validation loss = 0.1820366531610489
Validation loss = 0.1772851049900055
Validation loss = 0.16902874410152435
Validation loss = 0.16381162405014038
Validation loss = 0.15810304880142212
Validation loss = 0.15170133113861084
Validation loss = 0.15059895813465118
Validation loss = 0.14951592683792114
Validation loss = 0.14547835290431976
Validation loss = 0.14836396276950836
Validation loss = 0.12282586097717285
Validation loss = 0.10724259912967682
Validation loss = 0.11020273715257645
Validation loss = 0.1280377060174942
Validation loss = 0.11014065891504288
Validation loss = 0.1118326410651207
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7570399641990662
Validation loss = 0.36272290349006653
Validation loss = 0.385492205619812
Validation loss = 0.35149043798446655
Validation loss = 0.3264836072921753
Validation loss = 0.3045538067817688
Validation loss = 0.28805431723594666
Validation loss = 0.2554020583629608
Validation loss = 0.2396690994501114
Validation loss = 0.23472316563129425
Validation loss = 0.21654854714870453
Validation loss = 0.21734485030174255
Validation loss = 0.20515941083431244
Validation loss = 0.1851045936346054
Validation loss = 0.18513096868991852
Validation loss = 0.17045779526233673
Validation loss = 0.1612052619457245
Validation loss = 0.15168362855911255
Validation loss = 0.1594959944486618
Validation loss = 0.16034263372421265
Validation loss = 0.1459711790084839
Validation loss = 0.13802006840705872
Validation loss = 0.13661490380764008
Validation loss = 0.12350451201200485
Validation loss = 0.11943982541561127
Validation loss = 0.11605492234230042
Validation loss = 0.11959606409072876
Validation loss = 0.1203690767288208
Validation loss = 0.11623772233724594
Validation loss = 0.11067859083414078
Validation loss = 0.1111605167388916
Validation loss = 0.1053573489189148
Validation loss = 0.11608408391475677
Validation loss = 0.1026478111743927
Validation loss = 0.10798801481723785
Validation loss = 0.10970164835453033
Validation loss = 0.10666388273239136
Validation loss = 0.09706096351146698
Validation loss = 0.10667066276073456
Validation loss = 0.10343402624130249
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.34    |
| Iteration     | 0        |
| MaximumReturn | -0.035   |
| MinimumReturn | -28.1    |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3196001648902893
Validation loss = 0.18239185214042664
Validation loss = 0.15904444456100464
Validation loss = 0.14687839150428772
Validation loss = 0.1263732612133026
Validation loss = 0.1124911904335022
Validation loss = 0.10108616948127747
Validation loss = 0.08770982921123505
Validation loss = 0.09083140641450882
Validation loss = 0.09092867374420166
Validation loss = 0.07393425703048706
Validation loss = 0.07606726139783859
Validation loss = 0.06804084032773972
Validation loss = 0.06218378245830536
Validation loss = 0.05856062471866608
Validation loss = 0.06481878459453583
Validation loss = 0.05762476846575737
Validation loss = 0.05404948443174362
Validation loss = 0.05257253348827362
Validation loss = 0.061839114874601364
Validation loss = 0.04886658117175102
Validation loss = 0.05030621960759163
Validation loss = 0.0483739860355854
Validation loss = 0.04872799664735794
Validation loss = 0.04806876555085182
Validation loss = 0.05275267735123634
Validation loss = 0.045584384351968765
Validation loss = 0.04483424872159958
Validation loss = 0.05123576521873474
Validation loss = 0.044701527804136276
Validation loss = 0.04297318682074547
Validation loss = 0.04280809685587883
Validation loss = 0.044804394245147705
Validation loss = 0.058621909469366074
Validation loss = 0.07156454026699066
Validation loss = 0.04232248291373253
Validation loss = 0.05549698695540428
Validation loss = 0.04583358019590378
Validation loss = 0.047917336225509644
Validation loss = 0.04515683650970459
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3131868839263916
Validation loss = 0.19247083365917206
Validation loss = 0.16667449474334717
Validation loss = 0.14806127548217773
Validation loss = 0.12699849903583527
Validation loss = 0.1165243610739708
Validation loss = 0.1099141389131546
Validation loss = 0.09936307370662689
Validation loss = 0.09262841939926147
Validation loss = 0.07782094925642014
Validation loss = 0.07993648946285248
Validation loss = 0.0826449766755104
Validation loss = 0.08226197957992554
Validation loss = 0.06712596118450165
Validation loss = 0.06748594343662262
Validation loss = 0.062479712069034576
Validation loss = 0.053392164409160614
Validation loss = 0.059977613389492035
Validation loss = 0.0549573078751564
Validation loss = 0.052122801542282104
Validation loss = 0.05061572417616844
Validation loss = 0.049108218401670456
Validation loss = 0.0587114617228508
Validation loss = 0.04710876941680908
Validation loss = 0.053744181990623474
Validation loss = 0.05601324141025543
Validation loss = 0.057742007076740265
Validation loss = 0.05223755165934563
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.35506367683410645
Validation loss = 0.18523111939430237
Validation loss = 0.16193804144859314
Validation loss = 0.13939116895198822
Validation loss = 0.12012135237455368
Validation loss = 0.11925822496414185
Validation loss = 0.10154134780168533
Validation loss = 0.08415726572275162
Validation loss = 0.09454945474863052
Validation loss = 0.08009800314903259
Validation loss = 0.06987352669239044
Validation loss = 0.06726361066102982
Validation loss = 0.05424385145306587
Validation loss = 0.055955398827791214
Validation loss = 0.06254478543996811
Validation loss = 0.0568326935172081
Validation loss = 0.04836219921708107
Validation loss = 0.048782989382743835
Validation loss = 0.046105168759822845
Validation loss = 0.04791202396154404
Validation loss = 0.04965068772435188
Validation loss = 0.04064558073878288
Validation loss = 0.04331035912036896
Validation loss = 0.04661267250776291
Validation loss = 0.042735058814287186
Validation loss = 0.05131226032972336
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.31853434443473816
Validation loss = 0.1790592074394226
Validation loss = 0.16601011157035828
Validation loss = 0.14240539073944092
Validation loss = 0.12577271461486816
Validation loss = 0.11673104763031006
Validation loss = 0.11434997618198395
Validation loss = 0.09471916407346725
Validation loss = 0.09494569897651672
Validation loss = 0.08454988896846771
Validation loss = 0.08369074761867523
Validation loss = 0.09377798438072205
Validation loss = 0.07407424598932266
Validation loss = 0.06582459807395935
Validation loss = 0.059998542070388794
Validation loss = 0.06662283092737198
Validation loss = 0.05641069635748863
Validation loss = 0.07104052603244781
Validation loss = 0.05837800353765488
Validation loss = 0.061317216604948044
Validation loss = 0.057471007108688354
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.30356743931770325
Validation loss = 0.17228436470031738
Validation loss = 0.1550203114748001
Validation loss = 0.13579311966896057
Validation loss = 0.11896325647830963
Validation loss = 0.10107732564210892
Validation loss = 0.0938338115811348
Validation loss = 0.091313436627388
Validation loss = 0.08197755366563797
Validation loss = 0.0661541074514389
Validation loss = 0.0729459896683693
Validation loss = 0.08006798475980759
Validation loss = 0.0645577535033226
Validation loss = 0.058875031769275665
Validation loss = 0.06136482208967209
Validation loss = 0.06216849014163017
Validation loss = 0.05779596418142319
Validation loss = 0.05427569895982742
Validation loss = 0.05563487112522125
Validation loss = 0.05948561802506447
Validation loss = 0.05686699226498604
Validation loss = 0.048979032784700394
Validation loss = 0.0474429652094841
Validation loss = 0.05493781715631485
Validation loss = 0.04826367273926735
Validation loss = 0.04762304574251175
Validation loss = 0.05539432913064957
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.2    |
| Iteration     | 1        |
| MaximumReturn | -0.0917  |
| MinimumReturn | -55.8    |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16126248240470886
Validation loss = 0.06950194388628006
Validation loss = 0.05115896090865135
Validation loss = 0.03610964119434357
Validation loss = 0.03145309537649155
Validation loss = 0.02815442718565464
Validation loss = 0.027042493224143982
Validation loss = 0.026854926720261574
Validation loss = 0.02827826887369156
Validation loss = 0.02392018400132656
Validation loss = 0.025340981781482697
Validation loss = 0.024994386360049248
Validation loss = 0.023028327152132988
Validation loss = 0.0264874380081892
Validation loss = 0.019472938030958176
Validation loss = 0.027387181296944618
Validation loss = 0.02225027047097683
Validation loss = 0.02364979311823845
Validation loss = 0.019992854446172714
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15788975358009338
Validation loss = 0.06410229206085205
Validation loss = 0.0416618175804615
Validation loss = 0.035876475274562836
Validation loss = 0.04425430670380592
Validation loss = 0.030819717794656754
Validation loss = 0.031962692737579346
Validation loss = 0.03255264088511467
Validation loss = 0.032259948551654816
Validation loss = 0.028579527512192726
Validation loss = 0.027204662561416626
Validation loss = 0.025847554206848145
Validation loss = 0.028405100107192993
Validation loss = 0.03001982904970646
Validation loss = 0.02697903662919998
Validation loss = 0.02544851042330265
Validation loss = 0.026782969012856483
Validation loss = 0.024521563202142715
Validation loss = 0.024635041132569313
Validation loss = 0.023321354761719704
Validation loss = 0.02368738315999508
Validation loss = 0.025938477367162704
Validation loss = 0.028445791453123093
Validation loss = 0.02477436326444149
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.18012207746505737
Validation loss = 0.06764468550682068
Validation loss = 0.048790767788887024
Validation loss = 0.03624315187335014
Validation loss = 0.03614436462521553
Validation loss = 0.03992462158203125
Validation loss = 0.02866652049124241
Validation loss = 0.02917972207069397
Validation loss = 0.031804606318473816
Validation loss = 0.027241628617048264
Validation loss = 0.024652574211359024
Validation loss = 0.031967587769031525
Validation loss = 0.028886444866657257
Validation loss = 0.02763380855321884
Validation loss = 0.023199724033474922
Validation loss = 0.024009715765714645
Validation loss = 0.024224234744906425
Validation loss = 0.02617831528186798
Validation loss = 0.023218419402837753
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.14697414636611938
Validation loss = 0.06523376703262329
Validation loss = 0.06131616234779358
Validation loss = 0.04349219426512718
Validation loss = 0.04621012508869171
Validation loss = 0.037985458970069885
Validation loss = 0.03340180590748787
Validation loss = 0.029282087460160255
Validation loss = 0.03563982993364334
Validation loss = 0.02931150048971176
Validation loss = 0.026525499299168587
Validation loss = 0.026514437049627304
Validation loss = 0.02617054618895054
Validation loss = 0.02633826620876789
Validation loss = 0.03651370853185654
Validation loss = 0.02819700352847576
Validation loss = 0.0259141456335783
Validation loss = 0.029948560521006584
Validation loss = 0.02899234928190708
Validation loss = 0.02939474955201149
Validation loss = 0.037121936678886414
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.18210582435131073
Validation loss = 0.07311991602182388
Validation loss = 0.044366758316755295
Validation loss = 0.03693859279155731
Validation loss = 0.042711593210697174
Validation loss = 0.028554854914546013
Validation loss = 0.02699676714837551
Validation loss = 0.03009379655122757
Validation loss = 0.03089871071279049
Validation loss = 0.026586301624774933
Validation loss = 0.027253631502389908
Validation loss = 0.027028562501072884
Validation loss = 0.02750968188047409
Validation loss = 0.02944251522421837
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0814  |
| Iteration     | 2        |
| MaximumReturn | -0.0451  |
| MinimumReturn | -0.124   |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06334920972585678
Validation loss = 0.03739584609866142
Validation loss = 0.029499372467398643
Validation loss = 0.02507583051919937
Validation loss = 0.02263973094522953
Validation loss = 0.02218407206237316
Validation loss = 0.018669920042157173
Validation loss = 0.016536589711904526
Validation loss = 0.014497396536171436
Validation loss = 0.015579256229102612
Validation loss = 0.013703994452953339
Validation loss = 0.015666009858250618
Validation loss = 0.01798294112086296
Validation loss = 0.01912958361208439
Validation loss = 0.01646440103650093
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07054369151592255
Validation loss = 0.044039249420166016
Validation loss = 0.03452857956290245
Validation loss = 0.03122319094836712
Validation loss = 0.024060489609837532
Validation loss = 0.022513320669531822
Validation loss = 0.020651258528232574
Validation loss = 0.02725082077085972
Validation loss = 0.020137205719947815
Validation loss = 0.017566364258527756
Validation loss = 0.017172308638691902
Validation loss = 0.018830610439181328
Validation loss = 0.015480313450098038
Validation loss = 0.017189977690577507
Validation loss = 0.015998857095837593
Validation loss = 0.016840605065226555
Validation loss = 0.027020657435059547
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06823913753032684
Validation loss = 0.0365912988781929
Validation loss = 0.028825193643569946
Validation loss = 0.02823673002421856
Validation loss = 0.028070660308003426
Validation loss = 0.029115891084074974
Validation loss = 0.02324904501438141
Validation loss = 0.016837073490023613
Validation loss = 0.019014757126569748
Validation loss = 0.020457981154322624
Validation loss = 0.01517513394355774
Validation loss = 0.016000082716345787
Validation loss = 0.014580209739506245
Validation loss = 0.020600011572241783
Validation loss = 0.022538533434271812
Validation loss = 0.022288622334599495
Validation loss = 0.017720526084303856
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08304227888584137
Validation loss = 0.039546575397253036
Validation loss = 0.03201116621494293
Validation loss = 0.033143050968647
Validation loss = 0.028375402092933655
Validation loss = 0.025480711832642555
Validation loss = 0.02079792320728302
Validation loss = 0.020313570275902748
Validation loss = 0.01624021679162979
Validation loss = 0.017302261665463448
Validation loss = 0.018798625096678734
Validation loss = 0.020469367504119873
Validation loss = 0.015925167128443718
Validation loss = 0.018107417970895767
Validation loss = 0.019033828750252724
Validation loss = 0.0171137023717165
Validation loss = 0.021520329639315605
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0643649771809578
Validation loss = 0.03198664262890816
Validation loss = 0.04201417788863182
Validation loss = 0.030901579186320305
Validation loss = 0.022395605221390724
Validation loss = 0.025585899129509926
Validation loss = 0.022878015413880348
Validation loss = 0.019425030797719955
Validation loss = 0.0192823838442564
Validation loss = 0.01822221837937832
Validation loss = 0.017432792112231255
Validation loss = 0.017849691212177277
Validation loss = 0.019090091809630394
Validation loss = 0.024459095671772957
Validation loss = 0.020346995443105698
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -76.2    |
| Iteration     | 3        |
| MaximumReturn | -62.8    |
| MinimumReturn | -88      |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0899200439453125
Validation loss = 0.024364201352000237
Validation loss = 0.021468929946422577
Validation loss = 0.018591508269309998
Validation loss = 0.022480256855487823
Validation loss = 0.015696920454502106
Validation loss = 0.01102030836045742
Validation loss = 0.014862103387713432
Validation loss = 0.013591127470135689
Validation loss = 0.019677449017763138
Validation loss = 0.016205592080950737
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09212661534547806
Validation loss = 0.02642536349594593
Validation loss = 0.023979730904102325
Validation loss = 0.02297602966427803
Validation loss = 0.014960170723497868
Validation loss = 0.01686708815395832
Validation loss = 0.016428213566541672
Validation loss = 0.01769864745438099
Validation loss = 0.015460369177162647
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08529465645551682
Validation loss = 0.02723466232419014
Validation loss = 0.02747703157365322
Validation loss = 0.013203947804868221
Validation loss = 0.014597665518522263
Validation loss = 0.013981476426124573
Validation loss = 0.013518230058252811
Validation loss = 0.017296073958277702
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07615295797586441
Validation loss = 0.023533780127763748
Validation loss = 0.018188226968050003
Validation loss = 0.015071138739585876
Validation loss = 0.01490864809602499
Validation loss = 0.015722107142210007
Validation loss = 0.01999499276280403
Validation loss = 0.01241169311106205
Validation loss = 0.01592734269797802
Validation loss = 0.021382713690400124
Validation loss = 0.013876433484256268
Validation loss = 0.017511066049337387
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07577648758888245
Validation loss = 0.0386873334646225
Validation loss = 0.022709853947162628
Validation loss = 0.02736923284828663
Validation loss = 0.021225981414318085
Validation loss = 0.017874039709568024
Validation loss = 0.015061480924487114
Validation loss = 0.013393071480095387
Validation loss = 0.012690374627709389
Validation loss = 0.013845507055521011
Validation loss = 0.014937951229512691
Validation loss = 0.013866003602743149
Validation loss = 0.01758863590657711
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0367  |
| Iteration     | 4        |
| MaximumReturn | -0.0218  |
| MinimumReturn | -0.0612  |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024140072986483574
Validation loss = 0.011951036751270294
Validation loss = 0.01081294845789671
Validation loss = 0.012297664768993855
Validation loss = 0.016375713050365448
Validation loss = 0.011685755103826523
Validation loss = 0.013308966532349586
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03713160380721092
Validation loss = 0.017580505460500717
Validation loss = 0.015687312930822372
Validation loss = 0.014517394825816154
Validation loss = 0.01269491296261549
Validation loss = 0.012679001316428185
Validation loss = 0.01663237437605858
Validation loss = 0.00996195524930954
Validation loss = 0.009433645755052567
Validation loss = 0.008771074004471302
Validation loss = 0.013529345393180847
Validation loss = 0.011403164826333523
Validation loss = 0.011552847921848297
Validation loss = 0.010737048462033272
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03840230777859688
Validation loss = 0.01690341904759407
Validation loss = 0.013436229899525642
Validation loss = 0.01227917056530714
Validation loss = 0.013058009557425976
Validation loss = 0.011202670633792877
Validation loss = 0.016083741560578346
Validation loss = 0.010300537571310997
Validation loss = 0.01097569614648819
Validation loss = 0.01361885853111744
Validation loss = 0.011961622163653374
Validation loss = 0.013673779554665089
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04876638203859329
Validation loss = 0.01456435490399599
Validation loss = 0.010090845637023449
Validation loss = 0.010182792320847511
Validation loss = 0.014890564605593681
Validation loss = 0.01991647481918335
Validation loss = 0.010729256086051464
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025109341368079185
Validation loss = 0.014174522832036018
Validation loss = 0.015138551592826843
Validation loss = 0.009740939363837242
Validation loss = 0.01666450873017311
Validation loss = 0.015274430625140667
Validation loss = 0.009666371159255505
Validation loss = 0.01415208913385868
Validation loss = 0.00892996322363615
Validation loss = 0.012305567972362041
Validation loss = 0.009704618714749813
Validation loss = 0.010568075813353062
Validation loss = 0.011201759800314903
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00246 |
| Iteration     | 5        |
| MaximumReturn | -0.00184 |
| MinimumReturn | -0.00334 |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023151131346821785
Validation loss = 0.020477643236517906
Validation loss = 0.012849141843616962
Validation loss = 0.015380516648292542
Validation loss = 0.014255678281188011
Validation loss = 0.0111786387860775
Validation loss = 0.016708459705114365
Validation loss = 0.020107755437493324
Validation loss = 0.010655263438820839
Validation loss = 0.017231421545147896
Validation loss = 0.010676143690943718
Validation loss = 0.0101807015016675
Validation loss = 0.011841995641589165
Validation loss = 0.012739025056362152
Validation loss = 0.013171682134270668
Validation loss = 0.01795823499560356
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03692792356014252
Validation loss = 0.02149587869644165
Validation loss = 0.01311108935624361
Validation loss = 0.011087555438280106
Validation loss = 0.010400711558759212
Validation loss = 0.010135508142411709
Validation loss = 0.012141683138906956
Validation loss = 0.009874512441456318
Validation loss = 0.013193096034228802
Validation loss = 0.019074197858572006
Validation loss = 0.013800305314362049
Validation loss = 0.011994448490440845
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.028238123282790184
Validation loss = 0.014538397081196308
Validation loss = 0.01743246428668499
Validation loss = 0.011798867955803871
Validation loss = 0.013566868379712105
Validation loss = 0.011391092091798782
Validation loss = 0.011074969545006752
Validation loss = 0.012391520664095879
Validation loss = 0.012581129558384418
Validation loss = 0.012400883249938488
Validation loss = 0.01248880010098219
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028255388140678406
Validation loss = 0.015149945393204689
Validation loss = 0.016073204576969147
Validation loss = 0.019883273169398308
Validation loss = 0.020546264946460724
Validation loss = 0.016154546290636063
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024098778143525124
Validation loss = 0.013626250438392162
Validation loss = 0.02295023202896118
Validation loss = 0.01461813785135746
Validation loss = 0.012501435354351997
Validation loss = 0.012936802580952644
Validation loss = 0.01346365176141262
Validation loss = 0.009638144634664059
Validation loss = 0.014677206985652447
Validation loss = 0.014855829067528248
Validation loss = 0.015865983441472054
Validation loss = 0.009439641609787941
Validation loss = 0.010962235741317272
Validation loss = 0.01178382895886898
Validation loss = 0.013715294189751148
Validation loss = 0.02416529878973961
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.3    |
| Iteration     | 6        |
| MaximumReturn | -2.04    |
| MinimumReturn | -38.7    |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.056284308433532715
Validation loss = 0.017334679141640663
Validation loss = 0.011921174824237823
Validation loss = 0.010909738950431347
Validation loss = 0.007435890380293131
Validation loss = 0.008702341467142105
Validation loss = 0.009377294220030308
Validation loss = 0.007489137351512909
Validation loss = 0.008731216192245483
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024321356788277626
Validation loss = 0.016094056889414787
Validation loss = 0.012557592242956161
Validation loss = 0.01572064310312271
Validation loss = 0.012381009757518768
Validation loss = 0.009242676198482513
Validation loss = 0.013143260031938553
Validation loss = 0.0071991183795034885
Validation loss = 0.007986619137227535
Validation loss = 0.008659693412482738
Validation loss = 0.008680055849254131
Validation loss = 0.007804907392710447
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02779044397175312
Validation loss = 0.017713451758027077
Validation loss = 0.010068429633975029
Validation loss = 0.010382880456745625
Validation loss = 0.011139101348817348
Validation loss = 0.010536734014749527
Validation loss = 0.009366304613649845
Validation loss = 0.007757498417049646
Validation loss = 0.008016909472644329
Validation loss = 0.010050185956060886
Validation loss = 0.010104346089065075
Validation loss = 0.007826360873878002
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02923324704170227
Validation loss = 0.011322869919240475
Validation loss = 0.013028805144131184
Validation loss = 0.009110629558563232
Validation loss = 0.009527484886348248
Validation loss = 0.013598773628473282
Validation loss = 0.006584984716027975
Validation loss = 0.008610849268734455
Validation loss = 0.010435335338115692
Validation loss = 0.012706263922154903
Validation loss = 0.010317075066268444
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02931736409664154
Validation loss = 0.011399536393582821
Validation loss = 0.0114860525354743
Validation loss = 0.008447226136922836
Validation loss = 0.007089992519468069
Validation loss = 0.01052702497690916
Validation loss = 0.008168712258338928
Validation loss = 0.01177956908941269
Validation loss = 0.009476670064032078
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00473 |
| Iteration     | 7        |
| MaximumReturn | -0.00331 |
| MinimumReturn | -0.00648 |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0139860725030303
Validation loss = 0.00939472671598196
Validation loss = 0.011596680618822575
Validation loss = 0.00655388506129384
Validation loss = 0.011967959813773632
Validation loss = 0.007924952544271946
Validation loss = 0.01023781392723322
Validation loss = 0.006495047360658646
Validation loss = 0.006029009353369474
Validation loss = 0.005976065061986446
Validation loss = 0.005392218939960003
Validation loss = 0.006876833736896515
Validation loss = 0.015365121886134148
Validation loss = 0.009306615218520164
Validation loss = 0.00781445112079382
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016665143892169
Validation loss = 0.010097245685756207
Validation loss = 0.012769794091582298
Validation loss = 0.008820191957056522
Validation loss = 0.008687066845595837
Validation loss = 0.006690367124974728
Validation loss = 0.007729717530310154
Validation loss = 0.0076240720227360725
Validation loss = 0.008641734719276428
Validation loss = 0.006836907472461462
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018712850287556648
Validation loss = 0.00754379341378808
Validation loss = 0.0068921176716685295
Validation loss = 0.006761101074516773
Validation loss = 0.009014873765408993
Validation loss = 0.007092664483934641
Validation loss = 0.010335521772503853
Validation loss = 0.005647672805935144
Validation loss = 0.00956790428608656
Validation loss = 0.007284361403435469
Validation loss = 0.0051902830600738525
Validation loss = 0.005922759883105755
Validation loss = 0.0074869319796562195
Validation loss = 0.008746304549276829
Validation loss = 0.007032269146293402
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02054317109286785
Validation loss = 0.012440972961485386
Validation loss = 0.009251922369003296
Validation loss = 0.008837701752781868
Validation loss = 0.008427472785115242
Validation loss = 0.009008790366351604
Validation loss = 0.007805164437741041
Validation loss = 0.008037121035158634
Validation loss = 0.008068260736763477
Validation loss = 0.007044653873890638
Validation loss = 0.011343784630298615
Validation loss = 0.009220014326274395
Validation loss = 0.007788870949298143
Validation loss = 0.007077690679579973
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016874659806489944
Validation loss = 0.007418492343276739
Validation loss = 0.006657599471509457
Validation loss = 0.0059066638350486755
Validation loss = 0.007026395760476589
Validation loss = 0.008582348935306072
Validation loss = 0.009369009174406528
Validation loss = 0.006572443060576916
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000722 |
| Iteration     | 8         |
| MaximumReturn | -0.000533 |
| MinimumReturn | -0.000921 |
| TotalSamples  | 16660     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011873632669448853
Validation loss = 0.010830249637365341
Validation loss = 0.007430045399814844
Validation loss = 0.007602979429066181
Validation loss = 0.007962986826896667
Validation loss = 0.007088212296366692
Validation loss = 0.007679162546992302
Validation loss = 0.011652540415525436
Validation loss = 0.007365613244473934
Validation loss = 0.00768271554261446
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010639376938343048
Validation loss = 0.015251779928803444
Validation loss = 0.005820105783641338
Validation loss = 0.007971878163516521
Validation loss = 0.006799686700105667
Validation loss = 0.00991920568048954
Validation loss = 0.006027162075042725
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03212282061576843
Validation loss = 0.007888924330472946
Validation loss = 0.009812446311116219
Validation loss = 0.007341704796999693
Validation loss = 0.008327394723892212
Validation loss = 0.01306264940649271
Validation loss = 0.00915761012583971
Validation loss = 0.007671534549444914
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022246623411774635
Validation loss = 0.011798348277807236
Validation loss = 0.008648766204714775
Validation loss = 0.008327843621373177
Validation loss = 0.006745520047843456
Validation loss = 0.00892209354788065
Validation loss = 0.009616578929126263
Validation loss = 0.006083841901272535
Validation loss = 0.008077248930931091
Validation loss = 0.009432127699255943
Validation loss = 0.00702630914747715
Validation loss = 0.007866347208619118
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021382136270403862
Validation loss = 0.008377531543374062
Validation loss = 0.010265139862895012
Validation loss = 0.006933433935046196
Validation loss = 0.010248521342873573
Validation loss = 0.005676347762346268
Validation loss = 0.006402713246643543
Validation loss = 0.0073548867367208
Validation loss = 0.007283326238393784
Validation loss = 0.01112744864076376
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -28.7    |
| Iteration     | 9        |
| MaximumReturn | -0.98    |
| MinimumReturn | -53.9    |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011731564998626709
Validation loss = 0.0034605716355144978
Validation loss = 0.00593513622879982
Validation loss = 0.005066880024969578
Validation loss = 0.006410462316125631
Validation loss = 0.004246935248374939
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01807226985692978
Validation loss = 0.005250274203717709
Validation loss = 0.007046968676149845
Validation loss = 0.005535167176276445
Validation loss = 0.004167587496340275
Validation loss = 0.004926503635942936
Validation loss = 0.0034117004834115505
Validation loss = 0.0032138812821358442
Validation loss = 0.003991070669144392
Validation loss = 0.0037434047553688288
Validation loss = 0.004742107354104519
Validation loss = 0.0032606939785182476
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02613859623670578
Validation loss = 0.006162818055599928
Validation loss = 0.004072623793035746
Validation loss = 0.004786377307027578
Validation loss = 0.0066079613752663136
Validation loss = 0.007014537695795298
Validation loss = 0.004888677038252354
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010850254446268082
Validation loss = 0.006463344674557447
Validation loss = 0.009649019688367844
Validation loss = 0.003813035786151886
Validation loss = 0.003907834645360708
Validation loss = 0.003804995445534587
Validation loss = 0.00578464288264513
Validation loss = 0.003898545168340206
Validation loss = 0.004984197206795216
Validation loss = 0.004064752720296383
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01485578902065754
Validation loss = 0.0046100798062980175
Validation loss = 0.006115390919148922
Validation loss = 0.006332757882773876
Validation loss = 0.006811255589127541
Validation loss = 0.007419290021061897
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.71    |
| Iteration     | 10       |
| MaximumReturn | -0.0193  |
| MinimumReturn | -29.4    |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010613670572638512
Validation loss = 0.003567104460671544
Validation loss = 0.0032036255579441786
Validation loss = 0.003401621710509062
Validation loss = 0.003473010379821062
Validation loss = 0.0030222616624087095
Validation loss = 0.004992201924324036
Validation loss = 0.0025424621999263763
Validation loss = 0.004861524794250727
Validation loss = 0.004058650694787502
Validation loss = 0.007003744598478079
Validation loss = 0.004273091908544302
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007751059718430042
Validation loss = 0.003162394743412733
Validation loss = 0.0035636108368635178
Validation loss = 0.003579074051231146
Validation loss = 0.003322314005345106
Validation loss = 0.003234614385291934
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007687520235776901
Validation loss = 0.0035886429250240326
Validation loss = 0.0034524723887443542
Validation loss = 0.0032672262750566006
Validation loss = 0.00556552316993475
Validation loss = 0.003933933563530445
Validation loss = 0.0030641118064522743
Validation loss = 0.003083871677517891
Validation loss = 0.005341990850865841
Validation loss = 0.006535469088703394
Validation loss = 0.004491758998483419
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0061481427401304245
Validation loss = 0.0038160749245435
Validation loss = 0.00585005572065711
Validation loss = 0.005935365334153175
Validation loss = 0.0034761473070830107
Validation loss = 0.003781113773584366
Validation loss = 0.0037092375569045544
Validation loss = 0.00835917703807354
Validation loss = 0.0040648444555699825
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007593845017254353
Validation loss = 0.004292258061468601
Validation loss = 0.003474454628303647
Validation loss = 0.005585546139627695
Validation loss = 0.0054869940504431725
Validation loss = 0.0036948651541024446
Validation loss = 0.003430598182603717
Validation loss = 0.003353603882715106
Validation loss = 0.003421356435865164
Validation loss = 0.004974684212356806
Validation loss = 0.003663555486127734
Validation loss = 0.006199546158313751
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00283 |
| Iteration     | 11       |
| MaximumReturn | -0.002   |
| MinimumReturn | -0.00486 |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007698418106883764
Validation loss = 0.005987465847283602
Validation loss = 0.005624721292406321
Validation loss = 0.004015828482806683
Validation loss = 0.004752777516841888
Validation loss = 0.0032116312067955732
Validation loss = 0.0035895654000341892
Validation loss = 0.007076153997331858
Validation loss = 0.004225163720548153
Validation loss = 0.00458918884396553
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009244272485375404
Validation loss = 0.003256030846387148
Validation loss = 0.0038118832744657993
Validation loss = 0.0028841241728514433
Validation loss = 0.004650302231311798
Validation loss = 0.005219220649451017
Validation loss = 0.007446193136274815
Validation loss = 0.0036976109258830547
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005790782161056995
Validation loss = 0.0034969530533999205
Validation loss = 0.005346641410142183
Validation loss = 0.003998822532594204
Validation loss = 0.003520125988870859
Validation loss = 0.007234188728034496
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010009357705712318
Validation loss = 0.0036572597455233335
Validation loss = 0.00469982111826539
Validation loss = 0.004844091832637787
Validation loss = 0.005095750093460083
Validation loss = 0.005579597316682339
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010478511452674866
Validation loss = 0.004883986432105303
Validation loss = 0.00400902796536684
Validation loss = 0.004286034498363733
Validation loss = 0.0038305274210870266
Validation loss = 0.006736348383128643
Validation loss = 0.004587390925735235
Validation loss = 0.005304201506078243
Validation loss = 0.0036251551937311888
Validation loss = 0.006019934080541134
Validation loss = 0.003296139882877469
Validation loss = 0.004189531784504652
Validation loss = 0.004510910250246525
Validation loss = 0.004636307246983051
Validation loss = 0.003865036414936185
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0179  |
| Iteration     | 12       |
| MaximumReturn | -0.0101  |
| MinimumReturn | -0.0296  |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004530167207121849
Validation loss = 0.00408687861636281
Validation loss = 0.003009019885212183
Validation loss = 0.004749544430524111
Validation loss = 0.006328259129077196
Validation loss = 0.003950745332986116
Validation loss = 0.003748505376279354
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0064134420827031136
Validation loss = 0.002958411816507578
Validation loss = 0.005528313107788563
Validation loss = 0.004387043882161379
Validation loss = 0.003208663547411561
Validation loss = 0.004964753519743681
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005736845545470715
Validation loss = 0.004359630402177572
Validation loss = 0.007560029625892639
Validation loss = 0.008081731386482716
Validation loss = 0.007984962314367294
Validation loss = 0.00600913492962718
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004450891632586718
Validation loss = 0.00324263796210289
Validation loss = 0.0029173665679991245
Validation loss = 0.003127182135358453
Validation loss = 0.007184701040387154
Validation loss = 0.003212942276149988
Validation loss = 0.004198212176561356
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004701714497059584
Validation loss = 0.00648722006008029
Validation loss = 0.00472286157310009
Validation loss = 0.010608485899865627
Validation loss = 0.0031764814630150795
Validation loss = 0.0036315356846898794
Validation loss = 0.003192583564668894
Validation loss = 0.007551382761448622
Validation loss = 0.002993809524923563
Validation loss = 0.004403413273394108
Validation loss = 0.00804919097572565
Validation loss = 0.004619956016540527
Validation loss = 0.0032249726355075836
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00331 |
| Iteration     | 13       |
| MaximumReturn | -0.00234 |
| MinimumReturn | -0.00557 |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005389917176216841
Validation loss = 0.004573782440274954
Validation loss = 0.0031437447760254145
Validation loss = 0.0024146249052137136
Validation loss = 0.004646574147045612
Validation loss = 0.0036155099514871836
Validation loss = 0.009808706119656563
Validation loss = 0.003683290211483836
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008173325099050999
Validation loss = 0.004802441690117121
Validation loss = 0.003815705655142665
Validation loss = 0.003684118390083313
Validation loss = 0.0036297759506851435
Validation loss = 0.00495810667052865
Validation loss = 0.006577875930815935
Validation loss = 0.006429274100810289
Validation loss = 0.002712944522500038
Validation loss = 0.006027962546795607
Validation loss = 0.0035467890556901693
Validation loss = 0.0052221002988517284
Validation loss = 0.009637298062443733
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004461116157472134
Validation loss = 0.003090566722676158
Validation loss = 0.003030800959095359
Validation loss = 0.0036173963453620672
Validation loss = 0.004009503871202469
Validation loss = 0.0040118093602359295
Validation loss = 0.00351294525898993
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00481355469673872
Validation loss = 0.004338912200182676
Validation loss = 0.005695665255188942
Validation loss = 0.004671140108257532
Validation loss = 0.004177786875516176
Validation loss = 0.004632629919797182
Validation loss = 0.0050332932732999325
Validation loss = 0.003115357831120491
Validation loss = 0.004495636094361544
Validation loss = 0.0046549285762012005
Validation loss = 0.004066911526024342
Validation loss = 0.0027838547248393297
Validation loss = 0.003889671294018626
Validation loss = 0.004762334283441305
Validation loss = 0.003711917670443654
Validation loss = 0.0052475705742836
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0056404550559818745
Validation loss = 0.004187431652098894
Validation loss = 0.0027642592322081327
Validation loss = 0.003931507933884859
Validation loss = 0.004035170655697584
Validation loss = 0.003324012504890561
Validation loss = 0.004214734770357609
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -82.6    |
| Iteration     | 14       |
| MaximumReturn | -9.6     |
| MinimumReturn | -114     |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009108582511544228
Validation loss = 0.0025112556759268045
Validation loss = 0.0019353247480466962
Validation loss = 0.0031569520942866802
Validation loss = 0.003138603875413537
Validation loss = 0.0025913226418197155
Validation loss = 0.001986816758289933
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010347447358071804
Validation loss = 0.002212081104516983
Validation loss = 0.002875406062230468
Validation loss = 0.002966596744954586
Validation loss = 0.002440524520352483
Validation loss = 0.0022732997313141823
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0077146608382463455
Validation loss = 0.0033600127790123224
Validation loss = 0.0027132919058203697
Validation loss = 0.0038262854795902967
Validation loss = 0.003278185846284032
Validation loss = 0.0026946833822876215
Validation loss = 0.0024687822442501783
Validation loss = 0.0020028133876621723
Validation loss = 0.004331703297793865
Validation loss = 0.0030484634917229414
Validation loss = 0.003085640724748373
Validation loss = 0.002749477978795767
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010041046887636185
Validation loss = 0.001947593642398715
Validation loss = 0.0027316112536937
Validation loss = 0.0018404535949230194
Validation loss = 0.002505742944777012
Validation loss = 0.002026290399953723
Validation loss = 0.0027265266980975866
Validation loss = 0.0034252130426466465
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008085480891168118
Validation loss = 0.002924617612734437
Validation loss = 0.0023646694608032703
Validation loss = 0.001808545901440084
Validation loss = 0.0027029605116695166
Validation loss = 0.002385735046118498
Validation loss = 0.0033440531697124243
Validation loss = 0.0032793721184134483
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14      |
| Iteration     | 15       |
| MaximumReturn | -0.054   |
| MinimumReturn | -55.7    |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0038068571593612432
Validation loss = 0.00371379847638309
Validation loss = 0.002087407046929002
Validation loss = 0.00301849702373147
Validation loss = 0.0035898892674595118
Validation loss = 0.002808116842061281
Validation loss = 0.0021533886902034283
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0022295736707746983
Validation loss = 0.0016846404178068042
Validation loss = 0.0024453350342810154
Validation loss = 0.003688346827402711
Validation loss = 0.0022519712802022696
Validation loss = 0.002612466225400567
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002707216888666153
Validation loss = 0.001805519568733871
Validation loss = 0.0034518451429903507
Validation loss = 0.0026961322873830795
Validation loss = 0.003055667970329523
Validation loss = 0.0030780662782490253
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004257539752870798
Validation loss = 0.002563073066994548
Validation loss = 0.002093955408781767
Validation loss = 0.003001820994541049
Validation loss = 0.002870469819754362
Validation loss = 0.0015283901011571288
Validation loss = 0.0026902048848569393
Validation loss = 0.0018452147487550974
Validation loss = 0.0029889887664467096
Validation loss = 0.0017112174537032843
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004064266569912434
Validation loss = 0.001970407785847783
Validation loss = 0.0023787703830748796
Validation loss = 0.0020709438249468803
Validation loss = 0.0025956020690500736
Validation loss = 0.00267407251521945
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0194  |
| Iteration     | 16       |
| MaximumReturn | -0.00894 |
| MinimumReturn | -0.041   |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0035712874960154295
Validation loss = 0.0018948515644297004
Validation loss = 0.00198556762188673
Validation loss = 0.0018402714049443603
Validation loss = 0.0019064840162172914
Validation loss = 0.012585056945681572
Validation loss = 0.0032344877254217863
Validation loss = 0.0024130409583449364
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0027827159501612186
Validation loss = 0.003003828227519989
Validation loss = 0.0026054726913571358
Validation loss = 0.002117191907018423
Validation loss = 0.002160470699891448
Validation loss = 0.004138392396271229
Validation loss = 0.0018550916574895382
Validation loss = 0.0028105631936341524
Validation loss = 0.001935182954184711
Validation loss = 0.002920405473560095
Validation loss = 0.0024608962703496218
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004170806147158146
Validation loss = 0.0019124724203720689
Validation loss = 0.002024234039708972
Validation loss = 0.002032494405284524
Validation loss = 0.0050277807749807835
Validation loss = 0.0020170065108686686
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002251502126455307
Validation loss = 0.003913619089871645
Validation loss = 0.0026954850181937218
Validation loss = 0.0017218353459611535
Validation loss = 0.0025176024064421654
Validation loss = 0.0019505020463839173
Validation loss = 0.002198499161750078
Validation loss = 0.0038889399729669094
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002906359964981675
Validation loss = 0.001570066437125206
Validation loss = 0.0025869095697999
Validation loss = 0.0033665322698652744
Validation loss = 0.0025759779382497072
Validation loss = 0.002194944303482771
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.806   |
| Iteration     | 17       |
| MaximumReturn | -0.0223  |
| MinimumReturn | -7.54    |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0029837351758033037
Validation loss = 0.001938355970196426
Validation loss = 0.003163054818287492
Validation loss = 0.0026738557498902082
Validation loss = 0.001988823525607586
Validation loss = 0.006151353474706411
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0033807852305471897
Validation loss = 0.0024220754858106375
Validation loss = 0.004190200008451939
Validation loss = 0.00258838664740324
Validation loss = 0.0035524743143469095
Validation loss = 0.0037434992846101522
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0019971695728600025
Validation loss = 0.0029757069423794746
Validation loss = 0.004416218027472496
Validation loss = 0.00205942802131176
Validation loss = 0.002619836013764143
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0033694154117256403
Validation loss = 0.0040871454402804375
Validation loss = 0.0019317188998684287
Validation loss = 0.0029714240226894617
Validation loss = 0.003076293971389532
Validation loss = 0.002470016712322831
Validation loss = 0.0013148096622899175
Validation loss = 0.001735846046358347
Validation loss = 0.0014662990579381585
Validation loss = 0.0018430794589221478
Validation loss = 0.00314074638299644
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003260668832808733
Validation loss = 0.0019836395513266325
Validation loss = 0.0027138637378811836
Validation loss = 0.0018662206130102277
Validation loss = 0.0033247924875468016
Validation loss = 0.0021144081838428974
Validation loss = 0.003627715865150094
Validation loss = 0.002487245248630643
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0634  |
| Iteration     | 18       |
| MaximumReturn | -0.00407 |
| MinimumReturn | -0.947   |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0041744038462638855
Validation loss = 0.0018983716145157814
Validation loss = 0.0020835299510508776
Validation loss = 0.0025348137132823467
Validation loss = 0.0023144641891121864
Validation loss = 0.0015863070730119944
Validation loss = 0.0028944213408976793
Validation loss = 0.0034931087866425514
Validation loss = 0.0017280944157391787
Validation loss = 0.004194996785372496
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0035375449806451797
Validation loss = 0.002205963945016265
Validation loss = 0.0017395804170519114
Validation loss = 0.0019405537750571966
Validation loss = 0.0028080132324248552
Validation loss = 0.0023351162672042847
Validation loss = 0.0023399973288178444
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0036314779426902533
Validation loss = 0.0022443393245339394
Validation loss = 0.0017147557809948921
Validation loss = 0.0020280766766518354
Validation loss = 0.0011750708799809217
Validation loss = 0.0027691780123859644
Validation loss = 0.0040298388339579105
Validation loss = 0.00502663291990757
Validation loss = 0.002038887469097972
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0032862285152077675
Validation loss = 0.0032635703682899475
Validation loss = 0.0018832746427506208
Validation loss = 0.0014100695261731744
Validation loss = 0.0017014897894114256
Validation loss = 0.004755857866257429
Validation loss = 0.002552267862483859
Validation loss = 0.0032557002268731594
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004462930373847485
Validation loss = 0.0023803808726370335
Validation loss = 0.004457451403141022
Validation loss = 0.004173834342509508
Validation loss = 0.0033070880454033613
Validation loss = 0.0036375275813043118
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0882  |
| Iteration     | 19       |
| MaximumReturn | -0.0198  |
| MinimumReturn | -1.31    |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00246712239459157
Validation loss = 0.001953314058482647
Validation loss = 0.0039017656818032265
Validation loss = 0.002577031496912241
Validation loss = 0.002292400924488902
Validation loss = 0.0013677654787898064
Validation loss = 0.0018215639283880591
Validation loss = 0.002441845368593931
Validation loss = 0.0016658591339364648
Validation loss = 0.00188163120765239
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00263953092508018
Validation loss = 0.002317726844921708
Validation loss = 0.0024485569447278976
Validation loss = 0.0027580824680626392
Validation loss = 0.00460772542282939
Validation loss = 0.002483814023435116
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0029695837292820215
Validation loss = 0.0030027779284864664
Validation loss = 0.0019315980607643723
Validation loss = 0.002649488393217325
Validation loss = 0.001732773263938725
Validation loss = 0.0019431771943345666
Validation loss = 0.0016451184637844563
Validation loss = 0.002933402080088854
Validation loss = 0.0029715795535594225
Validation loss = 0.0017287124646827579
Validation loss = 0.0033670475240796804
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0025352593511343002
Validation loss = 0.001645621843636036
Validation loss = 0.002695550210773945
Validation loss = 0.006219177506864071
Validation loss = 0.0015523137990385294
Validation loss = 0.0022407725919038057
Validation loss = 0.0022943969815969467
Validation loss = 0.0023945888970047235
Validation loss = 0.0016368830110877752
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002020331099629402
Validation loss = 0.002763622673228383
Validation loss = 0.0031606238335371017
Validation loss = 0.0026547955349087715
Validation loss = 0.0014146799221634865
Validation loss = 0.0026716459542512894
Validation loss = 0.002788420533761382
Validation loss = 0.0017710954416543245
Validation loss = 0.0014098382089287043
Validation loss = 0.002905998844653368
Validation loss = 0.0017346119275316596
Validation loss = 0.0016444966895505786
Validation loss = 0.002374600153416395
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.84    |
| Iteration     | 20       |
| MaximumReturn | -0.0755  |
| MinimumReturn | -38.5    |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0020269856322556734
Validation loss = 0.001596492133103311
Validation loss = 0.0020606289617717266
Validation loss = 0.0030292728915810585
Validation loss = 0.001457266160286963
Validation loss = 0.0017866366542875767
Validation loss = 0.0027704662643373013
Validation loss = 0.0024466931354254484
Validation loss = 0.0022394051775336266
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002468116581439972
Validation loss = 0.0019321292638778687
Validation loss = 0.0018526888452470303
Validation loss = 0.0020066718570888042
Validation loss = 0.0028765671886503696
Validation loss = 0.003093923209235072
Validation loss = 0.001643426832742989
Validation loss = 0.002348060254007578
Validation loss = 0.002046925015747547
Validation loss = 0.003033215180039406
Validation loss = 0.002839927328750491
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0030578994192183018
Validation loss = 0.001528374501504004
Validation loss = 0.001726214773952961
Validation loss = 0.0021464796736836433
Validation loss = 0.002075039781630039
Validation loss = 0.0016218144446611404
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0023316973820328712
Validation loss = 0.0020023260731250048
Validation loss = 0.003139035776257515
Validation loss = 0.0017086046282202005
Validation loss = 0.002991591114550829
Validation loss = 0.0019361848244443536
Validation loss = 0.0016091861762106419
Validation loss = 0.0017808324191719294
Validation loss = 0.0011090997140854597
Validation loss = 0.0022300772834569216
Validation loss = 0.007161793764680624
Validation loss = 0.0012273925822228193
Validation loss = 0.0019943295046687126
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023290158715099096
Validation loss = 0.001948328921571374
Validation loss = 0.0013076006434857845
Validation loss = 0.0020926145371049643
Validation loss = 0.0013405110221356153
Validation loss = 0.0020159585401415825
Validation loss = 0.001381235895678401
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -44      |
| Iteration     | 21       |
| MaximumReturn | -2.12    |
| MinimumReturn | -80.4    |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004297053907066584
Validation loss = 0.00141143798828125
Validation loss = 0.0012812449131160975
Validation loss = 0.002197437919676304
Validation loss = 0.0018197810277342796
Validation loss = 0.0022319199051707983
Validation loss = 0.002073204144835472
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0031492062844336033
Validation loss = 0.002272356068715453
Validation loss = 0.0018729782896116376
Validation loss = 0.004060511477291584
Validation loss = 0.0025493463035672903
Validation loss = 0.001483907806687057
Validation loss = 0.001745767891407013
Validation loss = 0.0013675327645614743
Validation loss = 0.0018228943226858974
Validation loss = 0.003750766161829233
Validation loss = 0.001242943573743105
Validation loss = 0.0015645246021449566
Validation loss = 0.0016374366823583841
Validation loss = 0.003554743714630604
Validation loss = 0.0017610725481063128
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002443802310153842
Validation loss = 0.0016512991860508919
Validation loss = 0.002195026259869337
Validation loss = 0.0016794375842437148
Validation loss = 0.0021160817705094814
Validation loss = 0.0017104996368288994
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0035072248429059982
Validation loss = 0.0021292208693921566
Validation loss = 0.0017376028699800372
Validation loss = 0.0016994207398965955
Validation loss = 0.0013891144189983606
Validation loss = 0.0022742673754692078
Validation loss = 0.0015056321863085032
Validation loss = 0.002232466358691454
Validation loss = 0.0014784808736294508
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003762255422770977
Validation loss = 0.0020823227241635323
Validation loss = 0.001497651101090014
Validation loss = 0.0017371824942529202
Validation loss = 0.0013223121641203761
Validation loss = 0.001400483539327979
Validation loss = 0.0027623027563095093
Validation loss = 0.002192482352256775
Validation loss = 0.0014580670977011323
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.901   |
| Iteration     | 22       |
| MaximumReturn | -0.0759  |
| MinimumReturn | -12.2    |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0019415861461311579
Validation loss = 0.0023994380608201027
Validation loss = 0.0026609189808368683
Validation loss = 0.0013954408932477236
Validation loss = 0.0013392267283052206
Validation loss = 0.0023334999568760395
Validation loss = 0.0017143754521384835
Validation loss = 0.002881254767999053
Validation loss = 0.0014357820618897676
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002072276547551155
Validation loss = 0.0017474530031904578
Validation loss = 0.0015652645379304886
Validation loss = 0.0015946499770507216
Validation loss = 0.001383841154165566
Validation loss = 0.0016149478033185005
Validation loss = 0.0016856479924172163
Validation loss = 0.0015879396814852953
Validation loss = 0.0013587885769084096
Validation loss = 0.001467130845412612
Validation loss = 0.003002955112606287
Validation loss = 0.0020115943625569344
Validation loss = 0.0021075347904115915
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021192359272390604
Validation loss = 0.001992898527532816
Validation loss = 0.0026331953704357147
Validation loss = 0.0037234225310385227
Validation loss = 0.002407156163826585
Validation loss = 0.00212140753865242
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017844156827777624
Validation loss = 0.0018314262852072716
Validation loss = 0.0017762720817700028
Validation loss = 0.0013385294005274773
Validation loss = 0.0009803564753383398
Validation loss = 0.0011196888517588377
Validation loss = 0.0024263157974928617
Validation loss = 0.0019105648389086127
Validation loss = 0.0011521835112944245
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002254158491268754
Validation loss = 0.0018477892735973
Validation loss = 0.0016851283144205809
Validation loss = 0.0014075611252337694
Validation loss = 0.0013676004018634558
Validation loss = 0.0024412393104285
Validation loss = 0.001347618061117828
Validation loss = 0.002426828257739544
Validation loss = 0.0017519363900646567
Validation loss = 0.0019908309914171696
Validation loss = 0.0014649064978584647
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.289   |
| Iteration     | 23       |
| MaximumReturn | -0.143   |
| MinimumReturn | -0.672   |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0017382174264639616
Validation loss = 0.001476697507314384
Validation loss = 0.0009982987539842725
Validation loss = 0.0012546597281470895
Validation loss = 0.0022833410184830427
Validation loss = 0.0013707445468753576
Validation loss = 0.0021761064417660236
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017373114824295044
Validation loss = 0.0015632843133062124
Validation loss = 0.0012229004641994834
Validation loss = 0.0015853883232921362
Validation loss = 0.002289061900228262
Validation loss = 0.0018365809228271246
Validation loss = 0.001040664385072887
Validation loss = 0.0028771706856787205
Validation loss = 0.0012474447721615434
Validation loss = 0.0026466739363968372
Validation loss = 0.0028695210348814726
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0020861371885985136
Validation loss = 0.0015895485412329435
Validation loss = 0.0016025335062295198
Validation loss = 0.0021364851854741573
Validation loss = 0.0021798457019031048
Validation loss = 0.001373570179566741
Validation loss = 0.0016853008419275284
Validation loss = 0.0012844030279666185
Validation loss = 0.0020622722804546356
Validation loss = 0.0014779067132622004
Validation loss = 0.0016488612163811922
Validation loss = 0.0014117027167230844
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011000527301803231
Validation loss = 0.001081524882465601
Validation loss = 0.0013071111170575023
Validation loss = 0.0009579510660842061
Validation loss = 0.0015425982419401407
Validation loss = 0.0009463407332077622
Validation loss = 0.0018188661197200418
Validation loss = 0.0021654486190527678
Validation loss = 0.0011557562975212932
Validation loss = 0.001450832118280232
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0033833489287644625
Validation loss = 0.0016666362062096596
Validation loss = 0.0019770849030464888
Validation loss = 0.0015804379945620894
Validation loss = 0.0015432641375809908
Validation loss = 0.002060051541775465
Validation loss = 0.001644658506847918
Validation loss = 0.0016003288328647614
Validation loss = 0.001884880824945867
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0239  |
| Iteration     | 24       |
| MaximumReturn | -0.0142  |
| MinimumReturn | -0.0377  |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018030216451734304
Validation loss = 0.0015030939830467105
Validation loss = 0.0018218948971480131
Validation loss = 0.0019019473111256957
Validation loss = 0.0012332635233178735
Validation loss = 0.001247410662472248
Validation loss = 0.0010063847294077277
Validation loss = 0.0014801848446950316
Validation loss = 0.0019641926046460867
Validation loss = 0.0018300213851034641
Validation loss = 0.0022064591757953167
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003862790297716856
Validation loss = 0.0011322339996695518
Validation loss = 0.0014340821653604507
Validation loss = 0.0023628671187907457
Validation loss = 0.0018996649887412786
Validation loss = 0.001520665129646659
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0026828486006706953
Validation loss = 0.0021576788276433945
Validation loss = 0.001342690084129572
Validation loss = 0.0017460394883528352
Validation loss = 0.003522921819239855
Validation loss = 0.0013255687663331628
Validation loss = 0.0014429745497182012
Validation loss = 0.0015562687767669559
Validation loss = 0.0020998925901949406
Validation loss = 0.0022325394675135612
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011937218951061368
Validation loss = 0.0012536576250568032
Validation loss = 0.0015743629774078727
Validation loss = 0.0013597827637568116
Validation loss = 0.001290090731345117
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001974875805899501
Validation loss = 0.0019069758709520102
Validation loss = 0.0014608430210500956
Validation loss = 0.0015827458119019866
Validation loss = 0.0017809572163969278
Validation loss = 0.0013597614597529173
Validation loss = 0.0018125353381037712
Validation loss = 0.0014876665081828833
Validation loss = 0.0017960096010938287
Validation loss = 0.001312685664743185
Validation loss = 0.0019827261567115784
Validation loss = 0.0016145601402968168
Validation loss = 0.0012383724097162485
Validation loss = 0.0014434000477194786
Validation loss = 0.001438880106434226
Validation loss = 0.002223936142399907
Validation loss = 0.0010954596800729632
Validation loss = 0.0014405656838789582
Validation loss = 0.0017102566780522466
Validation loss = 0.002085836371406913
Validation loss = 0.000982152996584773
Validation loss = 0.0020709712989628315
Validation loss = 0.0024922098964452744
Validation loss = 0.002179262461140752
Validation loss = 0.0012752619804814458
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.317   |
| Iteration     | 25       |
| MaximumReturn | -0.126   |
| MinimumReturn | -0.645   |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0017929759342223406
Validation loss = 0.0017891778843477368
Validation loss = 0.0011526449816301465
Validation loss = 0.002001506509259343
Validation loss = 0.001767593901604414
Validation loss = 0.0014137934194877744
Validation loss = 0.0016912904102355242
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013583581894636154
Validation loss = 0.0018441301072016358
Validation loss = 0.0018182304920628667
Validation loss = 0.002061178209260106
Validation loss = 0.0014476417563855648
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012711967574432492
Validation loss = 0.0018000808777287602
Validation loss = 0.0017589029157534242
Validation loss = 0.0014566052705049515
Validation loss = 0.001866822363808751
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001652927021495998
Validation loss = 0.001649786252528429
Validation loss = 0.0016889440594241023
Validation loss = 0.0011136968387290835
Validation loss = 0.0032662951853126287
Validation loss = 0.0010638912208378315
Validation loss = 0.0017125679878517985
Validation loss = 0.0016067373799160123
Validation loss = 0.0011025251587852836
Validation loss = 0.000919428828638047
Validation loss = 0.0010092036100104451
Validation loss = 0.0009859820129349828
Validation loss = 0.001516328426077962
Validation loss = 0.0016979563515633345
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002411868888884783
Validation loss = 0.0010876728920266032
Validation loss = 0.0021454000379890203
Validation loss = 0.0013463783543556929
Validation loss = 0.00229791272431612
Validation loss = 0.001521818689070642
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0407  |
| Iteration     | 26       |
| MaximumReturn | -0.0158  |
| MinimumReturn | -0.091   |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003699013963341713
Validation loss = 0.0015702889068052173
Validation loss = 0.0019013079581782222
Validation loss = 0.001385088311508298
Validation loss = 0.0022079020272940397
Validation loss = 0.0015963511541485786
Validation loss = 0.0018510643858462572
Validation loss = 0.0008269691024906933
Validation loss = 0.0013315678806975484
Validation loss = 0.0033547028433531523
Validation loss = 0.0011233907425776124
Validation loss = 0.00211516534909606
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012139190221205354
Validation loss = 0.00361223379150033
Validation loss = 0.0024096013512462378
Validation loss = 0.0013722111470997334
Validation loss = 0.001483777305111289
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001603608368895948
Validation loss = 0.001570753287523985
Validation loss = 0.002382600214332342
Validation loss = 0.000968913605902344
Validation loss = 0.0027788891457021236
Validation loss = 0.002759701805189252
Validation loss = 0.0018790139583870769
Validation loss = 0.0011081240372732282
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014022609684616327
Validation loss = 0.0014650881057605147
Validation loss = 0.0011999454582110047
Validation loss = 0.0011850737500935793
Validation loss = 0.0011664002668112516
Validation loss = 0.0016172457253560424
Validation loss = 0.0009909747168421745
Validation loss = 0.0012562043266370893
Validation loss = 0.001562655670568347
Validation loss = 0.000885068264324218
Validation loss = 0.0012540319003164768
Validation loss = 0.0008663823246024549
Validation loss = 0.0012676588958129287
Validation loss = 0.002052751136943698
Validation loss = 0.0008593830280005932
Validation loss = 0.0010531189618632197
Validation loss = 0.0010837512090802193
Validation loss = 0.002091651316732168
Validation loss = 0.0012995823053643107
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001764533924870193
Validation loss = 0.001320426817983389
Validation loss = 0.0014993863878771663
Validation loss = 0.0015001222491264343
Validation loss = 0.0012643042718991637
Validation loss = 0.0016134291654452682
Validation loss = 0.0010111801093444228
Validation loss = 0.0013296472607180476
Validation loss = 0.0008754853042773902
Validation loss = 0.0011704169446602464
Validation loss = 0.0011187136406078935
Validation loss = 0.0014973031356930733
Validation loss = 0.001292600529268384
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.143   |
| Iteration     | 27       |
| MaximumReturn | -0.0359  |
| MinimumReturn | -1.02    |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001348368008621037
Validation loss = 0.0014312028652057052
Validation loss = 0.0011431557359173894
Validation loss = 0.002456977730616927
Validation loss = 0.0009050414082594216
Validation loss = 0.0012654875172302127
Validation loss = 0.0023773368448019028
Validation loss = 0.0015369681641459465
Validation loss = 0.0013693008804693818
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011735656298696995
Validation loss = 0.002387370215728879
Validation loss = 0.0019207134610041976
Validation loss = 0.001280111144296825
Validation loss = 0.0011464775307103992
Validation loss = 0.0023445526603609324
Validation loss = 0.0012368393363431096
Validation loss = 0.0009347295272164047
Validation loss = 0.001651638769544661
Validation loss = 0.0011890005553141236
Validation loss = 0.0012649568961933255
Validation loss = 0.0020664543844759464
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011779991909861565
Validation loss = 0.002950468799099326
Validation loss = 0.0018945374758914113
Validation loss = 0.00127748295199126
Validation loss = 0.0012338720262050629
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009742454276420176
Validation loss = 0.001370276790112257
Validation loss = 0.002428592648357153
Validation loss = 0.0009822874562814832
Validation loss = 0.0008759568445384502
Validation loss = 0.001203665160574019
Validation loss = 0.0009445471805520356
Validation loss = 0.001357154338620603
Validation loss = 0.0015689498977735639
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013258028775453568
Validation loss = 0.002831509569659829
Validation loss = 0.001546347513794899
Validation loss = 0.0011859881924465299
Validation loss = 0.0011599623830989003
Validation loss = 0.0022327115293592215
Validation loss = 0.0009705987758934498
Validation loss = 0.0019691125489771366
Validation loss = 0.001230792491696775
Validation loss = 0.0015760953538119793
Validation loss = 0.001044652541168034
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00376 |
| Iteration     | 28       |
| MaximumReturn | -0.00299 |
| MinimumReturn | -0.00522 |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0024303290992975235
Validation loss = 0.0011002233950421214
Validation loss = 0.0010511658620089293
Validation loss = 0.0019706846214830875
Validation loss = 0.0011655627749860287
Validation loss = 0.0015311938477680087
Validation loss = 0.001455953810364008
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014115897938609123
Validation loss = 0.001051805797033012
Validation loss = 0.0015792576596140862
Validation loss = 0.0012852140935137868
Validation loss = 0.0008664557244628668
Validation loss = 0.0013791151577606797
Validation loss = 0.0011443307157605886
Validation loss = 0.000989937805570662
Validation loss = 0.0013592340983450413
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0025799190625548363
Validation loss = 0.0008875076309777796
Validation loss = 0.002946656895801425
Validation loss = 0.001374037005007267
Validation loss = 0.001493939314968884
Validation loss = 0.0014833727618679404
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017157860565930605
Validation loss = 0.0015451163053512573
Validation loss = 0.0011238172883167863
Validation loss = 0.0013389737578108907
Validation loss = 0.0010067250113934278
Validation loss = 0.0009371773921884596
Validation loss = 0.0012228690320625901
Validation loss = 0.0007407641387544572
Validation loss = 0.001324302051216364
Validation loss = 0.0017596541438251734
Validation loss = 0.0006853414815850556
Validation loss = 0.0023708385415375233
Validation loss = 0.001177049009129405
Validation loss = 0.0008391052251681685
Validation loss = 0.0015528983203694224
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015714550390839577
Validation loss = 0.0012289484729990363
Validation loss = 0.001403341768309474
Validation loss = 0.0024673601146787405
Validation loss = 0.000980603275820613
Validation loss = 0.001450023497454822
Validation loss = 0.0007859154138714075
Validation loss = 0.0013329742942005396
Validation loss = 0.00143786845728755
Validation loss = 0.000996051705442369
Validation loss = 0.0010246637975797057
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00121  |
| Iteration     | 29        |
| MaximumReturn | -0.000718 |
| MinimumReturn | -0.00691  |
| TotalSamples  | 51646     |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002392415888607502
Validation loss = 0.0014143942389637232
Validation loss = 0.0010394533164799213
Validation loss = 0.0011466371361166239
Validation loss = 0.000860353116877377
Validation loss = 0.0019408538937568665
Validation loss = 0.0009996830485761166
Validation loss = 0.0010768831707537174
Validation loss = 0.0012189733097329736
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008150757057592273
Validation loss = 0.0014900233363732696
Validation loss = 0.0024509031791239977
Validation loss = 0.0011306909145787358
Validation loss = 0.002038839040324092
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003762178122997284
Validation loss = 0.001703623216599226
Validation loss = 0.0023871567100286484
Validation loss = 0.0009263979736715555
Validation loss = 0.0010477411560714245
Validation loss = 0.0011883233673870564
Validation loss = 0.0020727820228785276
Validation loss = 0.0015360888792201877
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0021081038285046816
Validation loss = 0.0010428600944578648
Validation loss = 0.0012612341670319438
Validation loss = 0.0006945130880922079
Validation loss = 0.0011395174078643322
Validation loss = 0.0016481262864544988
Validation loss = 0.0008669146336615086
Validation loss = 0.0008164417813532054
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010157156502828002
Validation loss = 0.0017641864251345396
Validation loss = 0.0014735044678673148
Validation loss = 0.001162121188826859
Validation loss = 0.0011959070106968284
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00736 |
| Iteration     | 30       |
| MaximumReturn | -0.00276 |
| MinimumReturn | -0.00952 |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013900845078751445
Validation loss = 0.0018579422030597925
Validation loss = 0.001531675225123763
Validation loss = 0.000842265144456178
Validation loss = 0.0008965615415945649
Validation loss = 0.002203448675572872
Validation loss = 0.0009295030613429844
Validation loss = 0.0010464348597452044
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012580787297338247
Validation loss = 0.0017900264356285334
Validation loss = 0.001063794712536037
Validation loss = 0.0009451875812374055
Validation loss = 0.0010339017026126385
Validation loss = 0.0016535528702661395
Validation loss = 0.001105839153751731
Validation loss = 0.0018884639721363783
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011549950577318668
Validation loss = 0.001474931021220982
Validation loss = 0.001285759499296546
Validation loss = 0.0016047382960096002
Validation loss = 0.0014895717613399029
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013838517479598522
Validation loss = 0.0007935492321848869
Validation loss = 0.00249696196988225
Validation loss = 0.0011486998992040753
Validation loss = 0.001135259517468512
Validation loss = 0.0017886706627905369
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009347465238533914
Validation loss = 0.0016498561017215252
Validation loss = 0.0009002826409414411
Validation loss = 0.0011746177915483713
Validation loss = 0.0011932819616049528
Validation loss = 0.0023605641908943653
Validation loss = 0.0009367046295665205
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00408  |
| Iteration     | 31        |
| MaximumReturn | -0.000831 |
| MinimumReturn | -0.00825  |
| TotalSamples  | 54978     |
-----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013321287697181106
Validation loss = 0.0009615293820388615
Validation loss = 0.0016106406692415476
Validation loss = 0.0011836501071229577
Validation loss = 0.0010514460736885667
Validation loss = 0.0020177329424768686
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0024028271436691284
Validation loss = 0.0011532630305737257
Validation loss = 0.001593568827956915
Validation loss = 0.0014002172974869609
Validation loss = 0.0009149563848040998
Validation loss = 0.000882237683981657
Validation loss = 0.001273354166187346
Validation loss = 0.0015039980644360185
Validation loss = 0.0015974664129316807
Validation loss = 0.0009406004101037979
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012926600174978375
Validation loss = 0.0015124889323487878
Validation loss = 0.0021562245674431324
Validation loss = 0.0010979385115206242
Validation loss = 0.0008918647072277963
Validation loss = 0.001533052301965654
Validation loss = 0.001676767598837614
Validation loss = 0.0013651014305651188
Validation loss = 0.0010856277076527476
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014307735254988074
Validation loss = 0.0007879247423261404
Validation loss = 0.0011377958580851555
Validation loss = 0.0011399054201319814
Validation loss = 0.0012829011538997293
Validation loss = 0.0012342645786702633
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001136350678279996
Validation loss = 0.0013873465359210968
Validation loss = 0.0013175002532079816
Validation loss = 0.0017167634796351194
Validation loss = 0.0015185800148174167
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.127   |
| Iteration     | 32       |
| MaximumReturn | -0.0713  |
| MinimumReturn | -0.188   |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012692600721493363
Validation loss = 0.0006128636887297034
Validation loss = 0.0007446125382557511
Validation loss = 0.0009649856365285814
Validation loss = 0.0010504195233806968
Validation loss = 0.0007424372597597539
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0022244739811867476
Validation loss = 0.000821868481580168
Validation loss = 0.0008539563277736306
Validation loss = 0.000663326121866703
Validation loss = 0.0009215098689310253
Validation loss = 0.0011845260160043836
Validation loss = 0.0009991198312491179
Validation loss = 0.0007341658929362893
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008114503580145538
Validation loss = 0.0011866854038089514
Validation loss = 0.0017022468382492661
Validation loss = 0.0013234170619398355
Validation loss = 0.0016296675894409418
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017083241837099195
Validation loss = 0.000852948403917253
Validation loss = 0.0008346796385012567
Validation loss = 0.0016257838578894734
Validation loss = 0.0007523083477281034
Validation loss = 0.0009888808708637953
Validation loss = 0.0008655294077470899
Validation loss = 0.000812901183962822
Validation loss = 0.001213490730151534
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010560936061665416
Validation loss = 0.0010652345372363925
Validation loss = 0.0007887864485383034
Validation loss = 0.0011267125373706222
Validation loss = 0.0011151497019454837
Validation loss = 0.0008554807864129543
Validation loss = 0.0007442008354701102
Validation loss = 0.0010976145276799798
Validation loss = 0.0008552289800718427
Validation loss = 0.0008324541850015521
Validation loss = 0.0008260532631538808
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.151   |
| Iteration     | 33       |
| MaximumReturn | -0.0989  |
| MinimumReturn | -0.191   |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009652309818193316
Validation loss = 0.0007104822434484959
Validation loss = 0.0009343789424747229
Validation loss = 0.0016107220435515046
Validation loss = 0.0006468544015660882
Validation loss = 0.0017743628704920411
Validation loss = 0.0006978788878768682
Validation loss = 0.0011966811725869775
Validation loss = 0.0008199815056286752
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006835348904132843
Validation loss = 0.0011752016143873334
Validation loss = 0.0007438275497406721
Validation loss = 0.0007777294958941638
Validation loss = 0.0006567570380866528
Validation loss = 0.0010267088655382395
Validation loss = 0.0009176565799862146
Validation loss = 0.0009405964519828558
Validation loss = 0.0014490701723843813
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008596499101258814
Validation loss = 0.0006819895934313536
Validation loss = 0.0013238921528682113
Validation loss = 0.0012775121722370386
Validation loss = 0.0012202527141198516
Validation loss = 0.001281846547499299
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010872919810935855
Validation loss = 0.0007672639912925661
Validation loss = 0.001426534028723836
Validation loss = 0.00074602389940992
Validation loss = 0.0007639696705155075
Validation loss = 0.0007393129635602236
Validation loss = 0.0007384142372757196
Validation loss = 0.0007295070099644363
Validation loss = 0.0008787879487499595
Validation loss = 0.0010575195774435997
Validation loss = 0.0007250946364365518
Validation loss = 0.0006326040020212531
Validation loss = 0.0008535856031812727
Validation loss = 0.0005457844235934317
Validation loss = 0.0009595392039045691
Validation loss = 0.001026655314490199
Validation loss = 0.0011762931244447827
Validation loss = 0.0014754594303667545
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001177289872430265
Validation loss = 0.0007173342164605856
Validation loss = 0.0005892038461752236
Validation loss = 0.0011206390336155891
Validation loss = 0.000753078144043684
Validation loss = 0.0008033584454096854
Validation loss = 0.0008176745031960309
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.104   |
| Iteration     | 34       |
| MaximumReturn | -0.0486  |
| MinimumReturn | -0.16    |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014853718457743526
Validation loss = 0.0009928684448823333
Validation loss = 0.0005875735660083592
Validation loss = 0.0007264312589541078
Validation loss = 0.0009503188775852323
Validation loss = 0.0008447289001196623
Validation loss = 0.0012407338945195079
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008663510670885444
Validation loss = 0.0006026802584528923
Validation loss = 0.001013644621707499
Validation loss = 0.0009464049362577498
Validation loss = 0.0008499805117025971
Validation loss = 0.0011879573576152325
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010619900422170758
Validation loss = 0.0007922666845843196
Validation loss = 0.0006818481488153338
Validation loss = 0.000836062477901578
Validation loss = 0.0007477378821931779
Validation loss = 0.0008790312567725778
Validation loss = 0.001471302704885602
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00088943523587659
Validation loss = 0.0006069998489692807
Validation loss = 0.000771852326579392
Validation loss = 0.0011110937921330333
Validation loss = 0.0008722228812985122
Validation loss = 0.0008227796060964465
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012950386153534055
Validation loss = 0.0008689246024005115
Validation loss = 0.0006914802361279726
Validation loss = 0.0006415309035219252
Validation loss = 0.0007266029133461416
Validation loss = 0.000792822043877095
Validation loss = 0.0006283812690526247
Validation loss = 0.0009930972009897232
Validation loss = 0.0008833074825815856
Validation loss = 0.0012511864770203829
Validation loss = 0.000894576427526772
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0925  |
| Iteration     | 35       |
| MaximumReturn | -0.0365  |
| MinimumReturn | -0.128   |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001344747724942863
Validation loss = 0.001144101144745946
Validation loss = 0.0006643873057328165
Validation loss = 0.0005867904401384294
Validation loss = 0.0007613303023390472
Validation loss = 0.0007618989911861718
Validation loss = 0.0006750124739482999
Validation loss = 0.0012761912075802684
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014133756048977375
Validation loss = 0.0016110774595290422
Validation loss = 0.0005961827700957656
Validation loss = 0.0013380394084379077
Validation loss = 0.0008377096382901073
Validation loss = 0.0007310842629522085
Validation loss = 0.0012947130016982555
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014410715084522963
Validation loss = 0.0008038273663260043
Validation loss = 0.0008601778536103666
Validation loss = 0.001229607965797186
Validation loss = 0.0005893101915717125
Validation loss = 0.0005813217139802873
Validation loss = 0.0007601659162901342
Validation loss = 0.0014949539909139276
Validation loss = 0.0009443411254324019
Validation loss = 0.0008765282691456378
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008717335294932127
Validation loss = 0.0006368097383528948
Validation loss = 0.000698789197485894
Validation loss = 0.0007933223969303071
Validation loss = 0.0008305276278406382
Validation loss = 0.0006696266354992986
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009760881075635552
Validation loss = 0.0008869351004250348
Validation loss = 0.0013180141104385257
Validation loss = 0.0008194539695978165
Validation loss = 0.0010403324849903584
Validation loss = 0.0007103620446287096
Validation loss = 0.0006992517155595124
Validation loss = 0.0012159360339865088
Validation loss = 0.0009409908088855445
Validation loss = 0.0006230091094039381
Validation loss = 0.0007113305618986487
Validation loss = 0.0007170084863901138
Validation loss = 0.0005350136780180037
Validation loss = 0.0012727243592962623
Validation loss = 0.0005960658309049904
Validation loss = 0.0007345833000726998
Validation loss = 0.0008171960362233222
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0115   |
| Iteration     | 36        |
| MaximumReturn | -0.000897 |
| MinimumReturn | -0.0544   |
| TotalSamples  | 63308     |
-----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008580541471019387
Validation loss = 0.0008305389201268554
Validation loss = 0.0009555580327287316
Validation loss = 0.0007913611480034888
Validation loss = 0.0008844879921525717
Validation loss = 0.0008418556535616517
Validation loss = 0.0007147815776988864
Validation loss = 0.000912985997274518
Validation loss = 0.0019850668031722307
Validation loss = 0.0005648510414175689
Validation loss = 0.0006529719103127718
Validation loss = 0.0007198185194283724
Validation loss = 0.0007686502067372203
Validation loss = 0.0008672768599353731
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010112525196745992
Validation loss = 0.0014768581604585052
Validation loss = 0.0014903582632541656
Validation loss = 0.0013174478663131595
Validation loss = 0.000951204972807318
Validation loss = 0.0009901104494929314
Validation loss = 0.0014224150218069553
Validation loss = 0.0023657185956835747
Validation loss = 0.0007212080527096987
Validation loss = 0.001770788338035345
Validation loss = 0.0010015199659392238
Validation loss = 0.0007292137597687542
Validation loss = 0.0007283897139132023
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008925418369472027
Validation loss = 0.000731516454834491
Validation loss = 0.0008418049546889961
Validation loss = 0.0008830223232507706
Validation loss = 0.0006856484105810523
Validation loss = 0.0014046391006559134
Validation loss = 0.0009015886462293565
Validation loss = 0.0007378347218036652
Validation loss = 0.0012153356801718473
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006377625977620482
Validation loss = 0.001160956802777946
Validation loss = 0.0008967363392002881
Validation loss = 0.0012641800567507744
Validation loss = 0.0014379373751580715
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007752998499199748
Validation loss = 0.0006449161446653306
Validation loss = 0.0009347181185148656
Validation loss = 0.000987995881587267
Validation loss = 0.0010738250566646457
Validation loss = 0.0007734590326435864
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -54.4    |
| Iteration     | 37       |
| MaximumReturn | -23      |
| MinimumReturn | -87.8    |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002044599037617445
Validation loss = 0.0007997021311894059
Validation loss = 0.0005992695805616677
Validation loss = 0.0008027829462662339
Validation loss = 0.0006082684267312288
Validation loss = 0.0008543577278032899
Validation loss = 0.0011462790425866842
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016994329635053873
Validation loss = 0.0009745272691361606
Validation loss = 0.000716000737156719
Validation loss = 0.0008582088630646467
Validation loss = 0.0006660725921392441
Validation loss = 0.0008691503899171948
Validation loss = 0.0009329813183285296
Validation loss = 0.0009323363774456084
Validation loss = 0.0009116561850532889
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002188547747209668
Validation loss = 0.0007579048979096115
Validation loss = 0.0013254755176603794
Validation loss = 0.0011212907265871763
Validation loss = 0.000983891193754971
Validation loss = 0.00101478211581707
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0019439817406237125
Validation loss = 0.0010239050025120378
Validation loss = 0.0008486708393320441
Validation loss = 0.0008171872468665242
Validation loss = 0.0010965754045173526
Validation loss = 0.0006868719356134534
Validation loss = 0.0011706022778525949
Validation loss = 0.0008926877053454518
Validation loss = 0.0008432335453107953
Validation loss = 0.0009375330992043018
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010186333674937487
Validation loss = 0.0006618264596909285
Validation loss = 0.0007050405256450176
Validation loss = 0.0006493567489087582
Validation loss = 0.0006279546651057899
Validation loss = 0.0008811539737507701
Validation loss = 0.0011540442937985063
Validation loss = 0.0007973561296239495
Validation loss = 0.0007236440433189273
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -40.7    |
| Iteration     | 38       |
| MaximumReturn | -0.853   |
| MinimumReturn | -71.5    |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011199226137250662
Validation loss = 0.0006654569879174232
Validation loss = 0.0007755674887448549
Validation loss = 0.0008712545386515558
Validation loss = 0.0006838704575784504
Validation loss = 0.0007074701134115458
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010761450976133347
Validation loss = 0.0006797578535042703
Validation loss = 0.0006584606599062681
Validation loss = 0.0010933529119938612
Validation loss = 0.001121227745898068
Validation loss = 0.001341801369562745
Validation loss = 0.0010042029898613691
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010583193507045507
Validation loss = 0.0012975624995306134
Validation loss = 0.00125803891569376
Validation loss = 0.002211939776316285
Validation loss = 0.0008253964479081333
Validation loss = 0.0010178684024140239
Validation loss = 0.0009110560640692711
Validation loss = 0.0008857042994350195
Validation loss = 0.000915020820684731
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010615716455504298
Validation loss = 0.0007668487378396094
Validation loss = 0.0008330395212396979
Validation loss = 0.000938612618483603
Validation loss = 0.0015990304527804255
Validation loss = 0.0008554362575523555
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008994139498099685
Validation loss = 0.000636503507848829
Validation loss = 0.000805049086920917
Validation loss = 0.0009965334320440888
Validation loss = 0.0008510518819093704
Validation loss = 0.0010516189504414797
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -36.8    |
| Iteration     | 39       |
| MaximumReturn | -0.422   |
| MinimumReturn | -67.2    |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001223072875291109
Validation loss = 0.001450656563974917
Validation loss = 0.000943162536714226
Validation loss = 0.0008293915307149291
Validation loss = 0.0006989193498156965
Validation loss = 0.001641675946302712
Validation loss = 0.0006584080401808023
Validation loss = 0.000633701216429472
Validation loss = 0.0008215110865421593
Validation loss = 0.0008523412980139256
Validation loss = 0.0007838974124751985
Validation loss = 0.0006490959203802049
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011270818067714572
Validation loss = 0.000822440255433321
Validation loss = 0.0008897036896087229
Validation loss = 0.0010374869452789426
Validation loss = 0.0007445280207321048
Validation loss = 0.0006220376817509532
Validation loss = 0.0015787885058671236
Validation loss = 0.0012110605603083968
Validation loss = 0.00162097392603755
Validation loss = 0.001192345516756177
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0020056930370628834
Validation loss = 0.002186801051720977
Validation loss = 0.0012852440122514963
Validation loss = 0.0006812080391682684
Validation loss = 0.0006960817845538259
Validation loss = 0.0009609780972823501
Validation loss = 0.0016972786979749799
Validation loss = 0.0012444977182894945
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012516474816948175
Validation loss = 0.0012474479153752327
Validation loss = 0.00059420958859846
Validation loss = 0.000745788449421525
Validation loss = 0.0008889876189641654
Validation loss = 0.0012030070647597313
Validation loss = 0.0007873713038861752
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013498211046680808
Validation loss = 0.0007678450201638043
Validation loss = 0.0007384564960375428
Validation loss = 0.0011266234796494246
Validation loss = 0.0013552374439314008
Validation loss = 0.0008368775015696883
Validation loss = 0.0005844893166795373
Validation loss = 0.0006578119355253875
Validation loss = 0.0010139712831005454
Validation loss = 0.000783416151534766
Validation loss = 0.0007082169759087265
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.374   |
| Iteration     | 40       |
| MaximumReturn | -0.204   |
| MinimumReturn | -0.656   |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008215171401388943
Validation loss = 0.0007071785512380302
Validation loss = 0.0011081520933657885
Validation loss = 0.0009594115545041859
Validation loss = 0.0006140772602520883
Validation loss = 0.0008384494576603174
Validation loss = 0.0008245590142905712
Validation loss = 0.0008430765592493117
Validation loss = 0.0010713445954024792
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006296514184214175
Validation loss = 0.0007236414239741862
Validation loss = 0.0008891213801689446
Validation loss = 0.0010942212538793683
Validation loss = 0.0009531938703730702
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009317089570686221
Validation loss = 0.0009983398485928774
Validation loss = 0.0009852821240201592
Validation loss = 0.0009492980316281319
Validation loss = 0.0007570974994450808
Validation loss = 0.0009587176027707756
Validation loss = 0.001234919996932149
Validation loss = 0.001042925170622766
Validation loss = 0.0008353986777365208
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000674807874020189
Validation loss = 0.0007187669398263097
Validation loss = 0.0004868491378147155
Validation loss = 0.0008051152108237147
Validation loss = 0.0006205677636899054
Validation loss = 0.0006793670472688973
Validation loss = 0.0007810392999090254
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006557431188412011
Validation loss = 0.000646181171759963
Validation loss = 0.0007433441933244467
Validation loss = 0.0005897571099922061
Validation loss = 0.0007005865918472409
Validation loss = 0.000566472124774009
Validation loss = 0.0007060485077090561
Validation loss = 0.0019005786161869764
Validation loss = 0.0008489664760418236
Validation loss = 0.0006236493354663253
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.642   |
| Iteration     | 41       |
| MaximumReturn | -0.343   |
| MinimumReturn | -0.988   |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008534437511116266
Validation loss = 0.0007385789067484438
Validation loss = 0.0008836397901177406
Validation loss = 0.0008111849310807884
Validation loss = 0.000841860834043473
Validation loss = 0.0008995328680612147
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006574756698682904
Validation loss = 0.0005334709421731532
Validation loss = 0.0007067089318297803
Validation loss = 0.0009051183587871492
Validation loss = 0.0031823506578803062
Validation loss = 0.0005505391745828092
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006369163165800273
Validation loss = 0.0007382229086942971
Validation loss = 0.0011668044608086348
Validation loss = 0.0009480329463258386
Validation loss = 0.0007518182974308729
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0004598744853865355
Validation loss = 0.0007403381168842316
Validation loss = 0.0006524825002998114
Validation loss = 0.000623667030595243
Validation loss = 0.0009647654369473457
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007378066657111049
Validation loss = 0.0009196270839311182
Validation loss = 0.0006213316228240728
Validation loss = 0.0005727997631765902
Validation loss = 0.0004437617026269436
Validation loss = 0.0005575486575253308
Validation loss = 0.0007393130799755454
Validation loss = 0.0008415758493356407
Validation loss = 0.0008549969643354416
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -90.4    |
| Iteration     | 42       |
| MaximumReturn | -58.9    |
| MinimumReturn | -113     |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008452659822069108
Validation loss = 0.000522830116096884
Validation loss = 0.0004289976495783776
Validation loss = 0.0004104720428586006
Validation loss = 0.0006030119257047772
Validation loss = 0.0004352130927145481
Validation loss = 0.0008211752283386886
Validation loss = 0.0014030436286702752
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008783453376963735
Validation loss = 0.0005896173533983529
Validation loss = 0.0005629706429317594
Validation loss = 0.0005186849739402533
Validation loss = 0.0007032493012957275
Validation loss = 0.0006455753464251757
Validation loss = 0.0006716258940286934
Validation loss = 0.0005125626339577138
Validation loss = 0.0008644746267236769
Validation loss = 0.0005991877987980843
Validation loss = 0.0006796935340389609
Validation loss = 0.0007971387240104377
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015079532749950886
Validation loss = 0.0007349595543928444
Validation loss = 0.0004911404103040695
Validation loss = 0.0013753481907770038
Validation loss = 0.0007327875355258584
Validation loss = 0.0005048808525316417
Validation loss = 0.0006681275554001331
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0025362803135067225
Validation loss = 0.0005481439875438809
Validation loss = 0.0006058462313376367
Validation loss = 0.0005901003023609519
Validation loss = 0.0005035069189034402
Validation loss = 0.0008028718875721097
Validation loss = 0.0006141767371445894
Validation loss = 0.0007808989612385631
Validation loss = 0.0005290830158628523
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001978399930521846
Validation loss = 0.00045630050590261817
Validation loss = 0.0005803643725812435
Validation loss = 0.0004788273945450783
Validation loss = 0.0007750866352580488
Validation loss = 0.0006620686035603285
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -96.6    |
| Iteration     | 43       |
| MaximumReturn | -0.87    |
| MinimumReturn | -149     |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005613723769783974
Validation loss = 0.0010037135798484087
Validation loss = 0.0004749922372866422
Validation loss = 0.0006403263541869819
Validation loss = 0.0008030487224459648
Validation loss = 0.0008663897751830518
Validation loss = 0.0005389901925809681
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007417962769977748
Validation loss = 0.0006168566760607064
Validation loss = 0.0005414868937805295
Validation loss = 0.0005213725962676108
Validation loss = 0.0005455242935568094
Validation loss = 0.0007468940457329154
Validation loss = 0.000695402966812253
Validation loss = 0.0006138791795819998
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011302824132144451
Validation loss = 0.0006699378718622029
Validation loss = 0.0005989098572172225
Validation loss = 0.00047875838936306536
Validation loss = 0.0006104357307776809
Validation loss = 0.0004581220564432442
Validation loss = 0.0005016198847442865
Validation loss = 0.0007445321534760296
Validation loss = 0.0008238599402830005
Validation loss = 0.00048531309585087
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000674752052873373
Validation loss = 0.0007453104481101036
Validation loss = 0.0012385364389047027
Validation loss = 0.0007794644916430116
Validation loss = 0.0005672391853295267
Validation loss = 0.0006231105071492493
Validation loss = 0.000784262316301465
Validation loss = 0.0006897069397382438
Validation loss = 0.0006245669210329652
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007717168773524463
Validation loss = 0.0006180485361255705
Validation loss = 0.0008433812763541937
Validation loss = 0.0009020962752401829
Validation loss = 0.0007202604901976883
Validation loss = 0.00068109534913674
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -41.5    |
| Iteration     | 44       |
| MaximumReturn | -0.246   |
| MinimumReturn | -126     |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005059834802523255
Validation loss = 0.0005431471508927643
Validation loss = 0.0007117085624486208
Validation loss = 0.0007117153727449477
Validation loss = 0.0004986297572031617
Validation loss = 0.000964763923548162
Validation loss = 0.0004932113224640489
Validation loss = 0.0007383145857602358
Validation loss = 0.0006818224210292101
Validation loss = 0.000525121227838099
Validation loss = 0.00048685906222090125
Validation loss = 0.0005444544367492199
Validation loss = 0.0005406034761108458
Validation loss = 0.0007929589482955635
Validation loss = 0.0005464234272949398
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009658565977588296
Validation loss = 0.0005622453754767776
Validation loss = 0.0006326718139462173
Validation loss = 0.0004623560234904289
Validation loss = 0.0006132455309852958
Validation loss = 0.000613165320828557
Validation loss = 0.0008460686076432467
Validation loss = 0.0004521986411418766
Validation loss = 0.0004979821387678385
Validation loss = 0.0005846981657668948
Validation loss = 0.00043051151442341506
Validation loss = 0.0004805041244253516
Validation loss = 0.0005445617134682834
Validation loss = 0.0004924502572976053
Validation loss = 0.00049121881602332
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005234864074736834
Validation loss = 0.0005614944384433329
Validation loss = 0.0005881597171537578
Validation loss = 0.0008137908298522234
Validation loss = 0.0006469389190897346
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005139870336279273
Validation loss = 0.0005428569857031107
Validation loss = 0.0007407961529679596
Validation loss = 0.0004799675371032208
Validation loss = 0.00046297977678477764
Validation loss = 0.0004238687688484788
Validation loss = 0.0005468768067657948
Validation loss = 0.000544095819350332
Validation loss = 0.0005147203337401152
Validation loss = 0.0010964501416310668
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004921377403661609
Validation loss = 0.0004354733391664922
Validation loss = 0.0004882655630353838
Validation loss = 0.0008588191121816635
Validation loss = 0.0008045283611863852
Validation loss = 0.0005433731712400913
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.78    |
| Iteration     | 45       |
| MaximumReturn | -0.084   |
| MinimumReturn | -31.9    |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005163986352272332
Validation loss = 0.0004962453385815024
Validation loss = 0.0009048706269823015
Validation loss = 0.000387550302548334
Validation loss = 0.0007145283161662519
Validation loss = 0.0011227718787267804
Validation loss = 0.0007255871314555407
Validation loss = 0.0005142666050232947
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000643399718683213
Validation loss = 0.0006113107083365321
Validation loss = 0.0005005272105336189
Validation loss = 0.000417286908486858
Validation loss = 0.00046899073640815914
Validation loss = 0.0005428080330602825
Validation loss = 0.0005110668716952205
Validation loss = 0.0004167795996181667
Validation loss = 0.0005467241280712187
Validation loss = 0.0005043921992182732
Validation loss = 0.00045800110092386603
Validation loss = 0.00047207585885189474
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006204918609000742
Validation loss = 0.0011124738957732916
Validation loss = 0.0004164642305113375
Validation loss = 0.0007694513769820333
Validation loss = 0.0005566393374465406
Validation loss = 0.0004832430277019739
Validation loss = 0.0008876268984749913
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006012119702063501
Validation loss = 0.00043843104504048824
Validation loss = 0.00040980614721775055
Validation loss = 0.0005574849201366305
Validation loss = 0.0010300440480932593
Validation loss = 0.0005309212720021605
Validation loss = 0.001098972512409091
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000453723972896114
Validation loss = 0.0005288125248625875
Validation loss = 0.0008291779668070376
Validation loss = 0.0006758337840437889
Validation loss = 0.0006220974028110504
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -28.7    |
| Iteration     | 46       |
| MaximumReturn | -0.175   |
| MinimumReturn | -73.1    |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004380089230835438
Validation loss = 0.0004752159002237022
Validation loss = 0.000414146576076746
Validation loss = 0.00046159932389855385
Validation loss = 0.0008809965802356601
Validation loss = 0.0006865632021799684
Validation loss = 0.0005542772123590112
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000623370346147567
Validation loss = 0.0008744571241550148
Validation loss = 0.0006529652746394277
Validation loss = 0.0005353152519091964
Validation loss = 0.0006031013908796012
Validation loss = 0.0005599671858362854
Validation loss = 0.0004756959096994251
Validation loss = 0.000497336674015969
Validation loss = 0.0006105826469138265
Validation loss = 0.0005417987122200429
Validation loss = 0.0006722084945067763
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009166860254481435
Validation loss = 0.0006709502777084708
Validation loss = 0.0004912042059004307
Validation loss = 0.0007703849114477634
Validation loss = 0.0006144336657598615
Validation loss = 0.0007598110823892057
Validation loss = 0.0005440145614556968
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00067986297653988
Validation loss = 0.0011588517809286714
Validation loss = 0.0006456480477936566
Validation loss = 0.0004884818335995078
Validation loss = 0.0006648172857239842
Validation loss = 0.00040025898488238454
Validation loss = 0.0005911142798140645
Validation loss = 0.0005257299053482711
Validation loss = 0.0004019383923150599
Validation loss = 0.0005574367824010551
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00047138985246419907
Validation loss = 0.000711451459210366
Validation loss = 0.0006076734280213714
Validation loss = 0.0005211302195675671
Validation loss = 0.0004851967969443649
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.3    |
| Iteration     | 47       |
| MaximumReturn | -0.0549  |
| MinimumReturn | -51.9    |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004994895425625145
Validation loss = 0.0004893008735962212
Validation loss = 0.0004888892872259021
Validation loss = 0.0005074542132206261
Validation loss = 0.0005606827908195555
Validation loss = 0.0006206000107340515
Validation loss = 0.00044129291200079024
Validation loss = 0.0007748663192614913
Validation loss = 0.0005573037196882069
Validation loss = 0.00047925589024089277
Validation loss = 0.0005278741591610014
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005754924495704472
Validation loss = 0.0005167897907085717
Validation loss = 0.0005461776163429022
Validation loss = 0.0004811004619114101
Validation loss = 0.0004128820146434009
Validation loss = 0.0004996445495635271
Validation loss = 0.0004793886619154364
Validation loss = 0.0003802633727900684
Validation loss = 0.0006781767588108778
Validation loss = 0.0005601041484624147
Validation loss = 0.00042988284258171916
Validation loss = 0.0005223909392952919
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013246926246210933
Validation loss = 0.0004507020930759609
Validation loss = 0.0005644324119202793
Validation loss = 0.00042257289169356227
Validation loss = 0.0007282827864401042
Validation loss = 0.0007983160903677344
Validation loss = 0.000660286343190819
Validation loss = 0.00040099077159538865
Validation loss = 0.0012275573099032044
Validation loss = 0.00038420018972828984
Validation loss = 0.0005276540759950876
Validation loss = 0.00043523096246644855
Validation loss = 0.000421225733589381
Validation loss = 0.0006931955576874316
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00041505444096401334
Validation loss = 0.0006506489589810371
Validation loss = 0.00046994761214591563
Validation loss = 0.0007273938972502947
Validation loss = 0.0005388272693380713
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004562379908747971
Validation loss = 0.000701731420122087
Validation loss = 0.0004047960974276066
Validation loss = 0.0005896305665373802
Validation loss = 0.00044650203199125826
Validation loss = 0.0004174550122115761
Validation loss = 0.0004935564356856048
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -69.3    |
| Iteration     | 48       |
| MaximumReturn | -0.394   |
| MinimumReturn | -203     |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0026750515680760145
Validation loss = 0.001685270108282566
Validation loss = 0.0007716387044638395
Validation loss = 0.0012850194470956922
Validation loss = 0.001586551545187831
Validation loss = 0.0007228885660879314
Validation loss = 0.001054465537890792
Validation loss = 0.000654694507829845
Validation loss = 0.0009187501855194569
Validation loss = 0.0006233142339624465
Validation loss = 0.0008266668883152306
Validation loss = 0.0014846056001260877
Validation loss = 0.0005288189277052879
Validation loss = 0.0010522960219532251
Validation loss = 0.0009746687719598413
Validation loss = 0.0007588480948470533
Validation loss = 0.0008972782525233924
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00834701955318451
Validation loss = 0.001823205384425819
Validation loss = 0.0010715840617194772
Validation loss = 0.0011053638299927115
Validation loss = 0.0007980948430486023
Validation loss = 0.0015015531098470092
Validation loss = 0.0007377683650702238
Validation loss = 0.000740814721211791
Validation loss = 0.0007790483068674803
Validation loss = 0.0009061174350790679
Validation loss = 0.0015782876871526241
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004717575386166573
Validation loss = 0.0013468714896589518
Validation loss = 0.001167928334325552
Validation loss = 0.0012071029050275683
Validation loss = 0.0013675675727427006
Validation loss = 0.0013994358014315367
Validation loss = 0.0009505762136541307
Validation loss = 0.0010011750273406506
Validation loss = 0.0007374705164693296
Validation loss = 0.0010634575737640262
Validation loss = 0.000713629531674087
Validation loss = 0.0012268943246454
Validation loss = 0.0011681867763400078
Validation loss = 0.0007684503216296434
Validation loss = 0.0009809252806007862
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0033875368535518646
Validation loss = 0.0012749385787174106
Validation loss = 0.001379754743538797
Validation loss = 0.0013188852462917566
Validation loss = 0.000798977620434016
Validation loss = 0.001135099446401
Validation loss = 0.0010780312586575747
Validation loss = 0.0007662405841983855
Validation loss = 0.0008854460902512074
Validation loss = 0.0006165632512420416
Validation loss = 0.0012243855744600296
Validation loss = 0.000744030112400651
Validation loss = 0.0010076223406940699
Validation loss = 0.0008346521062776446
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005947529338300228
Validation loss = 0.002007855800911784
Validation loss = 0.0016956549370661378
Validation loss = 0.001156817888841033
Validation loss = 0.001034118002280593
Validation loss = 0.0014739928301423788
Validation loss = 0.0007370475213974714
Validation loss = 0.0019569823052734137
Validation loss = 0.0009085735655389726
Validation loss = 0.00098705873824656
Validation loss = 0.0011983358999714255
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.634   |
| Iteration     | 49       |
| MaximumReturn | -0.203   |
| MinimumReturn | -1.65    |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0028271046467125416
Validation loss = 0.0037422843743115664
Validation loss = 0.002366353292018175
Validation loss = 0.0028304182924330235
Validation loss = 0.002599347848445177
Validation loss = 0.003204024862498045
Validation loss = 0.003327395999804139
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002871245611459017
Validation loss = 0.0029223228339105844
Validation loss = 0.003184159519150853
Validation loss = 0.0032748477533459663
Validation loss = 0.005342524498701096
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00248710997402668
Validation loss = 0.0025710705667734146
Validation loss = 0.0022643122356384993
Validation loss = 0.003855859860777855
Validation loss = 0.0023438194766640663
Validation loss = 0.0018853178480640054
Validation loss = 0.002429867861792445
Validation loss = 0.002312796888872981
Validation loss = 0.0020753727294504642
Validation loss = 0.002344514476135373
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0026954556815326214
Validation loss = 0.0018878785194829106
Validation loss = 0.004415248986333609
Validation loss = 0.0019376097479835153
Validation loss = 0.0022771197836846113
Validation loss = 0.003072806866839528
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0025990339927375317
Validation loss = 0.0027169615495949984
Validation loss = 0.0017930598696693778
Validation loss = 0.0023603406734764576
Validation loss = 0.002720006974413991
Validation loss = 0.0031346757896244526
Validation loss = 0.002593734534457326
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.07    |
| Iteration     | 50       |
| MaximumReturn | -0.251   |
| MinimumReturn | -62.1    |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0022301182616502047
Validation loss = 0.0031047966331243515
Validation loss = 0.0025284087751060724
Validation loss = 0.002192441141232848
Validation loss = 0.002773757092654705
Validation loss = 0.0026554346550256014
Validation loss = 0.002437315648421645
Validation loss = 0.0023671411909163
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002649417147040367
Validation loss = 0.0025921277701854706
Validation loss = 0.003448496339842677
Validation loss = 0.003175457939505577
Validation loss = 0.0024254810996353626
Validation loss = 0.002661915495991707
Validation loss = 0.0029005324468016624
Validation loss = 0.0030107772909104824
Validation loss = 0.003389278193935752
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021715275943279266
Validation loss = 0.001598574686795473
Validation loss = 0.0017040966777130961
Validation loss = 0.001729748328216374
Validation loss = 0.003020241856575012
Validation loss = 0.0017591820796951652
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002340958220884204
Validation loss = 0.0018096030689775944
Validation loss = 0.001967762131243944
Validation loss = 0.0020208368077874184
Validation loss = 0.003937730100005865
Validation loss = 0.0030518444254994392
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0036509528290480375
Validation loss = 0.0036588218063116074
Validation loss = 0.0031785417813807726
Validation loss = 0.0025379410944879055
Validation loss = 0.0034329877234995365
Validation loss = 0.0028279602993279696
Validation loss = 0.0020341561175882816
Validation loss = 0.0021121730096638203
Validation loss = 0.003447542432695627
Validation loss = 0.0035253914538770914
Validation loss = 0.003806882072240114
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0104   |
| Iteration     | 51        |
| MaximumReturn | -0.000827 |
| MinimumReturn | -0.134    |
| TotalSamples  | 88298     |
-----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0027155866846442223
Validation loss = 0.004016246180981398
Validation loss = 0.0027837345842272043
Validation loss = 0.002139950869604945
Validation loss = 0.0025940665509551764
Validation loss = 0.0028368572238832712
Validation loss = 0.0028050390537828207
Validation loss = 0.0025391890667378902
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003535336349159479
Validation loss = 0.003033457091078162
Validation loss = 0.003147322917357087
Validation loss = 0.002445654710754752
Validation loss = 0.0031902638729661703
Validation loss = 0.0027403386775404215
Validation loss = 0.0029478082433342934
Validation loss = 0.002770816208794713
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0029373725410550833
Validation loss = 0.0021116468124091625
Validation loss = 0.0019799410365521908
Validation loss = 0.0025891063269227743
Validation loss = 0.0032116465736180544
Validation loss = 0.0018946899799630046
Validation loss = 0.00164138269610703
Validation loss = 0.0016949850833043456
Validation loss = 0.002765424083918333
Validation loss = 0.0019109859131276608
Validation loss = 0.0015278581995517015
Validation loss = 0.0020616864785552025
Validation loss = 0.004537857603281736
Validation loss = 0.0014950450276955962
Validation loss = 0.0013170907041057944
Validation loss = 0.0018815604271367192
Validation loss = 0.001855142298154533
Validation loss = 0.0017671276582404971
Validation loss = 0.0016427583759650588
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002199104754254222
Validation loss = 0.0017170850187540054
Validation loss = 0.00160607835277915
Validation loss = 0.0027415454387664795
Validation loss = 0.003354146843776107
Validation loss = 0.006271979305893183
Validation loss = 0.003928130026906729
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0032587340101599693
Validation loss = 0.00211505638435483
Validation loss = 0.0019098129123449326
Validation loss = 0.002643545623868704
Validation loss = 0.0019935104064643383
Validation loss = 0.0021399399265646935
Validation loss = 0.002966774394735694
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.69    |
| Iteration     | 52       |
| MaximumReturn | -0.0832  |
| MinimumReturn | -26.2    |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002683864440768957
Validation loss = 0.00267576752230525
Validation loss = 0.002513379789888859
Validation loss = 0.0023653407115489244
Validation loss = 0.0023600689601153135
Validation loss = 0.002623003674671054
Validation loss = 0.002932127332314849
Validation loss = 0.0020163641311228275
Validation loss = 0.0036405418068170547
Validation loss = 0.0024268398992717266
Validation loss = 0.0036776934284716845
Validation loss = 0.003016336355358362
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002978125587105751
Validation loss = 0.007824365980923176
Validation loss = 0.0036030805204063654
Validation loss = 0.0032610823400318623
Validation loss = 0.00314140971750021
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016403296031057835
Validation loss = 0.0029881810769438744
Validation loss = 0.0016955395694822073
Validation loss = 0.0012474096147343516
Validation loss = 0.0022406396456062794
Validation loss = 0.001376922708004713
Validation loss = 0.0025913226418197155
Validation loss = 0.0024370509199798107
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002146113198250532
Validation loss = 0.0023600789718329906
Validation loss = 0.001552371308207512
Validation loss = 0.002176891313865781
Validation loss = 0.0015248379204422235
Validation loss = 0.0015341119142249227
Validation loss = 0.001829190761782229
Validation loss = 0.0020906066056340933
Validation loss = 0.003439441090449691
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01110780332237482
Validation loss = 0.0023332422133535147
Validation loss = 0.0024161103647202253
Validation loss = 0.002091952133923769
Validation loss = 0.0026385777164250612
Validation loss = 0.002544407034292817
Validation loss = 0.002595355035737157
Validation loss = 0.0018243856029585004
Validation loss = 0.001956725027412176
Validation loss = 0.0017674032133072615
Validation loss = 0.0017635795520618558
Validation loss = 0.0032788494136184454
Validation loss = 0.0023496944922953844
Validation loss = 0.002065141685307026
Validation loss = 0.0025129131972789764
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -185     |
| Iteration     | 53       |
| MaximumReturn | -119     |
| MinimumReturn | -224     |
| TotalSamples  | 91630    |
----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0036994540132582188
Validation loss = 0.0019584386609494686
Validation loss = 0.0015165163204073906
Validation loss = 0.0017245584167540073
Validation loss = 0.0020510107278823853
Validation loss = 0.0018868984188884497
Validation loss = 0.0014527416788041592
Validation loss = 0.001727327355183661
Validation loss = 0.00154966046102345
Validation loss = 0.0023232209496200085
Validation loss = 0.0020933232735842466
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005555626470595598
Validation loss = 0.0018767087021842599
Validation loss = 0.0019669062457978725
Validation loss = 0.0021607535891234875
Validation loss = 0.001677572843618691
Validation loss = 0.0017235968261957169
Validation loss = 0.0015896920813247561
Validation loss = 0.0018897437257692218
Validation loss = 0.001577842514961958
Validation loss = 0.0026232178788632154
Validation loss = 0.001557112205773592
Validation loss = 0.001560768811032176
Validation loss = 0.0019829333759844303
Validation loss = 0.0015300073428079486
Validation loss = 0.0029250564984977245
Validation loss = 0.0016897170571610332
Validation loss = 0.0012551271356642246
Validation loss = 0.001635198132134974
Validation loss = 0.0013559461804106832
Validation loss = 0.001748793525621295
Validation loss = 0.001652487670071423
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00211262796074152
Validation loss = 0.0015008709160611033
Validation loss = 0.001719920663163066
Validation loss = 0.0025319631677120924
Validation loss = 0.0011953318025916815
Validation loss = 0.0013404528144747019
Validation loss = 0.0011341155041009188
Validation loss = 0.0011163171147927642
Validation loss = 0.001148245413787663
Validation loss = 0.0019158752402290702
Validation loss = 0.0019057976314797997
Validation loss = 0.0012664796086028218
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0035987074952572584
Validation loss = 0.0021145702339708805
Validation loss = 0.0015530659584328532
Validation loss = 0.0016460686456412077
Validation loss = 0.0010991089511662722
Validation loss = 0.0013735508546233177
Validation loss = 0.0016482468927279115
Validation loss = 0.0017779221525415778
Validation loss = 0.0015783364651724696
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0026696091517806053
Validation loss = 0.0014730339171364903
Validation loss = 0.001411440665833652
Validation loss = 0.001156183541752398
Validation loss = 0.0013483129441738129
Validation loss = 0.0013637621887028217
Validation loss = 0.0016642322298139334
Validation loss = 0.0014847071142867208
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -218     |
| Iteration     | 54       |
| MaximumReturn | -167     |
| MinimumReturn | -229     |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0026006565894931555
Validation loss = 0.0024662530049681664
Validation loss = 0.0021568669471889734
Validation loss = 0.0016532030422240496
Validation loss = 0.0016075901221483946
Validation loss = 0.001827456522732973
Validation loss = 0.0022239245008677244
Validation loss = 0.002035224810242653
Validation loss = 0.00202794186770916
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007058656774461269
Validation loss = 0.001404131529852748
Validation loss = 0.0020418409258127213
Validation loss = 0.0015001349383965135
Validation loss = 0.0015302953543141484
Validation loss = 0.0015341044636443257
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0030090089421719313
Validation loss = 0.0016324847238138318
Validation loss = 0.0017306071240454912
Validation loss = 0.0013772310921922326
Validation loss = 0.001911491621285677
Validation loss = 0.0012747679138556123
Validation loss = 0.001009396743029356
Validation loss = 0.0012915425468236208
Validation loss = 0.0013075750321149826
Validation loss = 0.0012867614859715104
Validation loss = 0.0017370808636769652
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003006624523550272
Validation loss = 0.0014847408747300506
Validation loss = 0.0010834586573764682
Validation loss = 0.0013505471870303154
Validation loss = 0.0009933055844157934
Validation loss = 0.0010985360713675618
Validation loss = 0.0013291557552292943
Validation loss = 0.001248830696567893
Validation loss = 0.0010114011820405722
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020073342602699995
Validation loss = 0.0017187061021104455
Validation loss = 0.0017573439981788397
Validation loss = 0.005532745271921158
Validation loss = 0.001617620000615716
Validation loss = 0.0012977983569726348
Validation loss = 0.001692462246865034
Validation loss = 0.0019670636393129826
Validation loss = 0.0018971215467900038
Validation loss = 0.0011301747290417552
Validation loss = 0.0014033541083335876
Validation loss = 0.0021564760245382786
Validation loss = 0.0017430142033845186
Validation loss = 0.0020741939079016447
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -208     |
| Iteration     | 55       |
| MaximumReturn | -179     |
| MinimumReturn | -230     |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002581424545496702
Validation loss = 0.0015255712205544114
Validation loss = 0.0015012816293165088
Validation loss = 0.0009541849722154438
Validation loss = 0.0012073215330019593
Validation loss = 0.00875062681734562
Validation loss = 0.0009598765172995627
Validation loss = 0.0017992333741858602
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008456690236926079
Validation loss = 0.0013359548756852746
Validation loss = 0.0011373987654224038
Validation loss = 0.0017118636751547456
Validation loss = 0.0010896577732637525
Validation loss = 0.001291069551371038
Validation loss = 0.0012027393095195293
Validation loss = 0.0012277005007490516
Validation loss = 0.002163404831662774
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021614981815218925
Validation loss = 0.0014262826880440116
Validation loss = 0.0014865380944684148
Validation loss = 0.0014076890656724572
Validation loss = 0.0009183106012642384
Validation loss = 0.0009024712489917874
Validation loss = 0.001463022897951305
Validation loss = 0.0013450577389448881
Validation loss = 0.0009373322827741504
Validation loss = 0.0011310026748105884
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002442791126668453
Validation loss = 0.0015971713000908494
Validation loss = 0.001462202169932425
Validation loss = 0.001041247509419918
Validation loss = 0.0011780644999817014
Validation loss = 0.001160105224698782
Validation loss = 0.0014370852150022984
Validation loss = 0.0017667784122750163
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006387799978256226
Validation loss = 0.0011993658263236284
Validation loss = 0.0015573729760944843
Validation loss = 0.0008470508037135005
Validation loss = 0.001541141769848764
Validation loss = 0.0011780657805502415
Validation loss = 0.0013259899569675326
Validation loss = 0.0011746949749067426
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -97.8    |
| Iteration     | 56       |
| MaximumReturn | -0.232   |
| MinimumReturn | -198     |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015905611217021942
Validation loss = 0.0018611756386235356
Validation loss = 0.0015582875348627567
Validation loss = 0.0010971653973683715
Validation loss = 0.0013195070205256343
Validation loss = 0.0009340688702650368
Validation loss = 0.0008918342064134777
Validation loss = 0.0009629682754166424
Validation loss = 0.0010663861175999045
Validation loss = 0.003359974129125476
Validation loss = 0.0008701197220943868
Validation loss = 0.0010690940544009209
Validation loss = 0.0010197615483775735
Validation loss = 0.0009275240008719265
Validation loss = 0.0012264302931725979
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016602501273155212
Validation loss = 0.001066769240424037
Validation loss = 0.0010637721279636025
Validation loss = 0.00104337593074888
Validation loss = 0.0011705831857398152
Validation loss = 0.001034510787576437
Validation loss = 0.001062889234162867
Validation loss = 0.0011954379733651876
Validation loss = 0.0010134606854990125
Validation loss = 0.0013046711683273315
Validation loss = 0.002097752643749118
Validation loss = 0.0010701128048822284
Validation loss = 0.0012130228569731116
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010494292946532369
Validation loss = 0.0012312802718952298
Validation loss = 0.0012198361800983548
Validation loss = 0.002374981762841344
Validation loss = 0.0008178713615052402
Validation loss = 0.001793770119547844
Validation loss = 0.00106780172791332
Validation loss = 0.0016519436612725258
Validation loss = 0.0008532237843610346
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013610656606033444
Validation loss = 0.0010390444658696651
Validation loss = 0.0019150165608152747
Validation loss = 0.001082497532479465
Validation loss = 0.0012466643238440156
Validation loss = 0.001454546581953764
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011019072262570262
Validation loss = 0.0017875797348096967
Validation loss = 0.0011763410875573754
Validation loss = 0.0009495759732089937
Validation loss = 0.001163099310360849
Validation loss = 0.0011743954382836819
Validation loss = 0.0013561956584453583
Validation loss = 0.0009496685233898461
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -51.4    |
| Iteration     | 57       |
| MaximumReturn | -0.147   |
| MinimumReturn | -179     |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013186231954023242
Validation loss = 0.0017341143684461713
Validation loss = 0.0019676662050187588
Validation loss = 0.0012630568817257881
Validation loss = 0.001059866277500987
Validation loss = 0.0011841306695714593
Validation loss = 0.0013247764436528087
Validation loss = 0.001466836896724999
Validation loss = 0.0011320755584165454
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001848671236075461
Validation loss = 0.0015087912324815989
Validation loss = 0.0014249616069719195
Validation loss = 0.000991109642200172
Validation loss = 0.001233454211615026
Validation loss = 0.0014092478668317199
Validation loss = 0.0016950330464169383
Validation loss = 0.0012377450475469232
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014545947778970003
Validation loss = 0.0014738410245627165
Validation loss = 0.0020162267610430717
Validation loss = 0.0009484069305472076
Validation loss = 0.0013579835649579763
Validation loss = 0.0008903351263143122
Validation loss = 0.0008669921080581844
Validation loss = 0.0016159421065822244
Validation loss = 0.0013050754787400365
Validation loss = 0.0009756507934071124
Validation loss = 0.001092398539185524
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015780135290697217
Validation loss = 0.0013293155934661627
Validation loss = 0.0012765835272148252
Validation loss = 0.001336337300017476
Validation loss = 0.001068859943188727
Validation loss = 0.0016803870676085353
Validation loss = 0.0010148941073566675
Validation loss = 0.001285627018660307
Validation loss = 0.0015498424181714654
Validation loss = 0.0013758782297372818
Validation loss = 0.001317389658652246
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023688373621553183
Validation loss = 0.0010538077913224697
Validation loss = 0.0017819070490077138
Validation loss = 0.001163147040642798
Validation loss = 0.00117971608415246
Validation loss = 0.0014231353998184204
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -39.6    |
| Iteration     | 58       |
| MaximumReturn | -0.109   |
| MinimumReturn | -205     |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001286422018893063
Validation loss = 0.001778805977664888
Validation loss = 0.0014741842169314623
Validation loss = 0.0012518103467300534
Validation loss = 0.0011260758619755507
Validation loss = 0.0012446962064132094
Validation loss = 0.00098127918317914
Validation loss = 0.0010993058094754815
Validation loss = 0.001219212543219328
Validation loss = 0.0016062157228589058
Validation loss = 0.0009801959386095405
Validation loss = 0.0010586022399365902
Validation loss = 0.0019239343237131834
Validation loss = 0.0010401069885119796
Validation loss = 0.0015415481757372618
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008941818960011005
Validation loss = 0.001276416820473969
Validation loss = 0.0010815153364092112
Validation loss = 0.0015626096865162253
Validation loss = 0.0017981450073421001
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017387280240654945
Validation loss = 0.000871152151376009
Validation loss = 0.001192130846902728
Validation loss = 0.0009611062123440206
Validation loss = 0.0009826960740610957
Validation loss = 0.0011817405465990305
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00105762155726552
Validation loss = 0.0012683435343205929
Validation loss = 0.0017371561843901873
Validation loss = 0.0012642090441659093
Validation loss = 0.0015193404396995902
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011260954197496176
Validation loss = 0.001319098169915378
Validation loss = 0.0010259884875267744
Validation loss = 0.0014160205610096455
Validation loss = 0.0028343633748590946
Validation loss = 0.0014538838295266032
Validation loss = 0.0020166272297501564
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -95.2    |
| Iteration     | 59       |
| MaximumReturn | -0.632   |
| MinimumReturn | -214     |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010423630010336637
Validation loss = 0.001142757711932063
Validation loss = 0.001232597162015736
Validation loss = 0.001050497405230999
Validation loss = 0.0013317539123818278
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000949390756431967
Validation loss = 0.0010825040517374873
Validation loss = 0.0009179794578813016
Validation loss = 0.000985040096566081
Validation loss = 0.0009423890733160079
Validation loss = 0.003207414411008358
Validation loss = 0.0010031398851424456
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017829060088843107
Validation loss = 0.0008748543914407492
Validation loss = 0.0008613417157903314
Validation loss = 0.0008030831231735647
Validation loss = 0.0009905763436108828
Validation loss = 0.0009341463446617126
Validation loss = 0.0007721528527326882
Validation loss = 0.0012544128112494946
Validation loss = 0.0010239226976409554
Validation loss = 0.0015913763782009482
Validation loss = 0.0011442310642451048
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014956035884097219
Validation loss = 0.0010328228818252683
Validation loss = 0.0014475692296400666
Validation loss = 0.0012978343293070793
Validation loss = 0.0013128062710165977
Validation loss = 0.0009704993572086096
Validation loss = 0.001241676276549697
Validation loss = 0.001398551743477583
Validation loss = 0.0010164175182580948
Validation loss = 0.0015719117363914847
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001496999990195036
Validation loss = 0.0011261095060035586
Validation loss = 0.0014797188341617584
Validation loss = 0.002560517517849803
Validation loss = 0.0009363400749862194
Validation loss = 0.0012770083267241716
Validation loss = 0.0012614454608410597
Validation loss = 0.0012346584117040038
Validation loss = 0.0010698611149564385
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -73.8    |
| Iteration     | 60       |
| MaximumReturn | -0.274   |
| MinimumReturn | -204     |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018756792414933443
Validation loss = 0.0013868320966139436
Validation loss = 0.0018617138266563416
Validation loss = 0.0014817473711445928
Validation loss = 0.0011507583549246192
Validation loss = 0.0013591033639386296
Validation loss = 0.001452482189051807
Validation loss = 0.001403955859132111
Validation loss = 0.0013217192608863115
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011600182624533772
Validation loss = 0.0012182072969153523
Validation loss = 0.0014284999342635274
Validation loss = 0.002178432885557413
Validation loss = 0.001218516263179481
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00207294849678874
Validation loss = 0.0015804871218279004
Validation loss = 0.0018635970773175359
Validation loss = 0.0012388840550556779
Validation loss = 0.0011129125487059355
Validation loss = 0.0010904509108513594
Validation loss = 0.0018395351944491267
Validation loss = 0.0013991703744977713
Validation loss = 0.0011930593755096197
Validation loss = 0.0011431262828409672
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010576220229268074
Validation loss = 0.0015369970351457596
Validation loss = 0.0018069923389703035
Validation loss = 0.0015056229894980788
Validation loss = 0.0014035128988325596
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010501006618142128
Validation loss = 0.0012866343604400754
Validation loss = 0.0016882545314729214
Validation loss = 0.0014154053060337901
Validation loss = 0.0026935283094644547
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -56.3    |
| Iteration     | 61       |
| MaximumReturn | -0.749   |
| MinimumReturn | -207     |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014445534907281399
Validation loss = 0.0012527572689577937
Validation loss = 0.00155043532140553
Validation loss = 0.0015038012061268091
Validation loss = 0.0011405425611883402
Validation loss = 0.0013036256423220038
Validation loss = 0.002173426328226924
Validation loss = 0.001953149912878871
Validation loss = 0.0016469658585265279
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013336065458133817
Validation loss = 0.001274254871532321
Validation loss = 0.0014482890255749226
Validation loss = 0.0020448858849704266
Validation loss = 0.0012918355641886592
Validation loss = 0.0013762933667749166
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013223005225881934
Validation loss = 0.0012217244366183877
Validation loss = 0.001082276226952672
Validation loss = 0.0018753407057374716
Validation loss = 0.0012929468648508191
Validation loss = 0.00198411219753325
Validation loss = 0.0011459322413429618
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016545859398320317
Validation loss = 0.0018351810285821557
Validation loss = 0.0015066936612129211
Validation loss = 0.0014469113666564226
Validation loss = 0.0013664255384355783
Validation loss = 0.0014398630009964108
Validation loss = 0.001236623851582408
Validation loss = 0.0014823430683463812
Validation loss = 0.001436439692042768
Validation loss = 0.0013445151271298528
Validation loss = 0.0010606343857944012
Validation loss = 0.0011886677239090204
Validation loss = 0.0014410801231861115
Validation loss = 0.0015673065790906549
Validation loss = 0.0015136790461838245
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001231512287631631
Validation loss = 0.0012745963176712394
Validation loss = 0.0017148201586678624
Validation loss = 0.0016471066046506166
Validation loss = 0.0011277132434770465
Validation loss = 0.0017029386945068836
Validation loss = 0.0012548222439363599
Validation loss = 0.0016889481339603662
Validation loss = 0.0010759818833321333
Validation loss = 0.0012645331444218755
Validation loss = 0.0015008299378678203
Validation loss = 0.0011561049614101648
Validation loss = 0.0010242998832836747
Validation loss = 0.0009587609092704952
Validation loss = 0.0011456627398729324
Validation loss = 0.0012005972675979137
Validation loss = 0.0009859777055680752
Validation loss = 0.0013529693242162466
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -46.5    |
| Iteration     | 62       |
| MaximumReturn | -0.963   |
| MinimumReturn | -109     |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010729050263762474
Validation loss = 0.0010282205184921622
Validation loss = 0.0010649316245689988
Validation loss = 0.001192629337310791
Validation loss = 0.0014720999170094728
Validation loss = 0.001012691529467702
Validation loss = 0.001197926001623273
Validation loss = 0.0011771454010158777
Validation loss = 0.0011680498719215393
Validation loss = 0.0014883410185575485
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011760744964703918
Validation loss = 0.0010234155924990773
Validation loss = 0.0012909786310046911
Validation loss = 0.0015622405335307121
Validation loss = 0.0012054599355906248
Validation loss = 0.0012154411524534225
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011859050719067454
Validation loss = 0.000990413362160325
Validation loss = 0.001704169437289238
Validation loss = 0.0010126844281330705
Validation loss = 0.003100503468886018
Validation loss = 0.0008898020605556667
Validation loss = 0.0012085635680705309
Validation loss = 0.0013163798721507192
Validation loss = 0.001583714154548943
Validation loss = 0.001162539585493505
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001349787344224751
Validation loss = 0.0010382711188867688
Validation loss = 0.0021189844701439142
Validation loss = 0.0012544256169348955
Validation loss = 0.0013079950585961342
Validation loss = 0.0014429743168875575
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011258184676989913
Validation loss = 0.0012379528488963842
Validation loss = 0.0014374874299392104
Validation loss = 0.0011293560964986682
Validation loss = 0.002003587083891034
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -66.1    |
| Iteration     | 63       |
| MaximumReturn | -1.37    |
| MinimumReturn | -180     |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010965346591547132
Validation loss = 0.0017565335147082806
Validation loss = 0.001305468613281846
Validation loss = 0.0012944102054461837
Validation loss = 0.0012625226518139243
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014303852804005146
Validation loss = 0.0014248342486098409
Validation loss = 0.0013519133208319545
Validation loss = 0.0011036524083465338
Validation loss = 0.0012371684424579144
Validation loss = 0.001153342891484499
Validation loss = 0.0016446716617792845
Validation loss = 0.0010894996812567115
Validation loss = 0.0011501419357955456
Validation loss = 0.0013391143875196576
Validation loss = 0.0010833216365426779
Validation loss = 0.0014624992618337274
Validation loss = 0.0012489528162404895
Validation loss = 0.0013019563630223274
Validation loss = 0.001056275679729879
Validation loss = 0.0012648262782022357
Validation loss = 0.0011862536193802953
Validation loss = 0.0013012280687689781
Validation loss = 0.0011140318820253015
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001085109543055296
Validation loss = 0.0012561407638713717
Validation loss = 0.0012269100407138467
Validation loss = 0.001374404993839562
Validation loss = 0.0011187675409018993
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016662091948091984
Validation loss = 0.001262417994439602
Validation loss = 0.0012784675927832723
Validation loss = 0.0010090257273986936
Validation loss = 0.0014414163306355476
Validation loss = 0.001307410653680563
Validation loss = 0.001218765857629478
Validation loss = 0.0013738037087023258
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012582167983055115
Validation loss = 0.001050734892487526
Validation loss = 0.0010174753842875361
Validation loss = 0.0011340426281094551
Validation loss = 0.0015327886212617159
Validation loss = 0.002015454461798072
Validation loss = 0.0010465157683938742
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -90.1    |
| Iteration     | 64       |
| MaximumReturn | -0.0799  |
| MinimumReturn | -194     |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014563228469341993
Validation loss = 0.0011574134696274996
Validation loss = 0.0010048765689134598
Validation loss = 0.0012232071021571755
Validation loss = 0.0012636656174436212
Validation loss = 0.001619854592718184
Validation loss = 0.0010390989482402802
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012420359998941422
Validation loss = 0.00126850965898484
Validation loss = 0.0011401486117392778
Validation loss = 0.0010244417935609818
Validation loss = 0.0010757609270513058
Validation loss = 0.001461456879042089
Validation loss = 0.0011661250609904528
Validation loss = 0.0011569333728402853
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002399850869551301
Validation loss = 0.0009552189148962498
Validation loss = 0.0012000951683148742
Validation loss = 0.0012299219379201531
Validation loss = 0.0010395343415439129
Validation loss = 0.0010632751509547234
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015943680191412568
Validation loss = 0.0017037278739735484
Validation loss = 0.0018023032462224364
Validation loss = 0.0012264660326763988
Validation loss = 0.0011585754109546542
Validation loss = 0.0013409571256488562
Validation loss = 0.0023497273214161396
Validation loss = 0.0012105325004085898
Validation loss = 0.001369285979308188
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011084374273195863
Validation loss = 0.001636852277442813
Validation loss = 0.0014244125923141837
Validation loss = 0.0010430450784042478
Validation loss = 0.0016327324556186795
Validation loss = 0.001180422492325306
Validation loss = 0.0019283018773421645
Validation loss = 0.0012453506933525205
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -66.2    |
| Iteration     | 65       |
| MaximumReturn | -0.333   |
| MinimumReturn | -177     |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001392561593092978
Validation loss = 0.001227655215188861
Validation loss = 0.0010843597119674087
Validation loss = 0.001176639343611896
Validation loss = 0.001253756694495678
Validation loss = 0.001065727206878364
Validation loss = 0.0010631646728143096
Validation loss = 0.0011387265985831618
Validation loss = 0.0011736502638086677
Validation loss = 0.0011407567653805017
Validation loss = 0.0010652620112523437
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011980129638686776
Validation loss = 0.0009294720948673785
Validation loss = 0.0009925165213644505
Validation loss = 0.0012129339156672359
Validation loss = 0.001174267614260316
Validation loss = 0.0019437113078311086
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010313650127500296
Validation loss = 0.0016632413025945425
Validation loss = 0.0014237751020118594
Validation loss = 0.001123072812333703
Validation loss = 0.0009940138552337885
Validation loss = 0.0010430690599605441
Validation loss = 0.0010909652337431908
Validation loss = 0.002238761866465211
Validation loss = 0.001043225172907114
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012112808180972934
Validation loss = 0.0014857238857075572
Validation loss = 0.0009491269593127072
Validation loss = 0.0011076143709942698
Validation loss = 0.0010132667375728488
Validation loss = 0.0014933089260011911
Validation loss = 0.0011272069532424212
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011699579190462828
Validation loss = 0.0009476789273321629
Validation loss = 0.001878903480246663
Validation loss = 0.0009726674179546535
Validation loss = 0.001563839497976005
Validation loss = 0.0008620488806627691
Validation loss = 0.0008395745535381138
Validation loss = 0.0013282023137435317
Validation loss = 0.0009491223026998341
Validation loss = 0.0009496940765529871
Validation loss = 0.0011092321947216988
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -11.5     |
| Iteration     | 66        |
| MaximumReturn | -0.000857 |
| MinimumReturn | -156      |
| TotalSamples  | 113288    |
-----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011786343529820442
Validation loss = 0.0010024415096268058
Validation loss = 0.0009245019755326211
Validation loss = 0.0014994742814451456
Validation loss = 0.0012977203587070107
Validation loss = 0.0010138091165572405
Validation loss = 0.0009731249301694334
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002578699728474021
Validation loss = 0.0011627755593508482
Validation loss = 0.0011106835445389152
Validation loss = 0.0014367550611495972
Validation loss = 0.00112096534576267
Validation loss = 0.0012078633299097419
Validation loss = 0.0010699090780690312
Validation loss = 0.0012565666111186147
Validation loss = 0.000919460435397923
Validation loss = 0.0009210401331074536
Validation loss = 0.0012458201963454485
Validation loss = 0.0016471750568598509
Validation loss = 0.001076573971658945
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013492918806150556
Validation loss = 0.0010815610876306891
Validation loss = 0.001118513522669673
Validation loss = 0.0010807124199345708
Validation loss = 0.0011669477680698037
Validation loss = 0.0009107216028496623
Validation loss = 0.0012907034251838923
Validation loss = 0.001283007557503879
Validation loss = 0.0010383442277088761
Validation loss = 0.001066837809048593
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00104469561483711
Validation loss = 0.0009493296965956688
Validation loss = 0.0012229038402438164
Validation loss = 0.0013518886407837272
Validation loss = 0.001344942138530314
Validation loss = 0.0010093837045133114
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014255227288231254
Validation loss = 0.0010755431139841676
Validation loss = 0.0010363521287217736
Validation loss = 0.0010190315078943968
Validation loss = 0.0013008933747187257
Validation loss = 0.0011322019854560494
Validation loss = 0.0010621106484904885
Validation loss = 0.0010771227534860373
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -107     |
| Iteration     | 67       |
| MaximumReturn | -0.255   |
| MinimumReturn | -209     |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000917455879971385
Validation loss = 0.001201988779939711
Validation loss = 0.0008283664938062429
Validation loss = 0.0007600198732689023
Validation loss = 0.0014811072032898664
Validation loss = 0.0008269839454442263
Validation loss = 0.0010077530751004815
Validation loss = 0.0010304645402356982
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013104361714795232
Validation loss = 0.0010812985710799694
Validation loss = 0.0016212373739108443
Validation loss = 0.0008592116646468639
Validation loss = 0.0007906571845524013
Validation loss = 0.0012379318941384554
Validation loss = 0.0013015518197789788
Validation loss = 0.0013490066630765796
Validation loss = 0.0012632692232728004
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010966522386297584
Validation loss = 0.0014482495607808232
Validation loss = 0.0010473697911947966
Validation loss = 0.0011056414805352688
Validation loss = 0.0010960972867906094
Validation loss = 0.0010644241701811552
Validation loss = 0.0008013924234546721
Validation loss = 0.0008621710003353655
Validation loss = 0.0009159276378341019
Validation loss = 0.0009243995882570744
Validation loss = 0.0008204615442082286
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008656395366415381
Validation loss = 0.001400548149831593
Validation loss = 0.0009293226758018136
Validation loss = 0.0012515595881268382
Validation loss = 0.0008695179712958634
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014653221005573869
Validation loss = 0.0009984179632738233
Validation loss = 0.0008661444298923016
Validation loss = 0.000983544043265283
Validation loss = 0.0009100234601646662
Validation loss = 0.0009378517279401422
Validation loss = 0.0012092177057638764
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -23.2    |
| Iteration     | 68       |
| MaximumReturn | -0.435   |
| MinimumReturn | -131     |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0023487901780754328
Validation loss = 0.0008643352775834501
Validation loss = 0.0009592087590135634
Validation loss = 0.0009775813668966293
Validation loss = 0.0009066681959666312
Validation loss = 0.0018712218152359128
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012491625966504216
Validation loss = 0.0010593585902824998
Validation loss = 0.0016600179951637983
Validation loss = 0.0010320735163986683
Validation loss = 0.0012236888287588954
Validation loss = 0.0008904606220312417
Validation loss = 0.0008829659200273454
Validation loss = 0.0008278301102109253
Validation loss = 0.0010792500106617808
Validation loss = 0.001200944883748889
Validation loss = 0.0009716714266687632
Validation loss = 0.0008342558867298067
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0023404082749038935
Validation loss = 0.0011038545053452253
Validation loss = 0.0010531141888350248
Validation loss = 0.00141990277916193
Validation loss = 0.0013063957449048758
Validation loss = 0.0008377080084756017
Validation loss = 0.0013002228224650025
Validation loss = 0.0014122051652520895
Validation loss = 0.0009762543486431241
Validation loss = 0.0013994411565363407
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009927189676091075
Validation loss = 0.0012142733903601766
Validation loss = 0.0009623878868296742
Validation loss = 0.0009507976938039064
Validation loss = 0.0012743412517011166
Validation loss = 0.0012050975346937776
Validation loss = 0.000854094629175961
Validation loss = 0.0009641587967053056
Validation loss = 0.0015557592269033194
Validation loss = 0.0010183637496083975
Validation loss = 0.0010320160072296858
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010622561676427722
Validation loss = 0.0010288110934197903
Validation loss = 0.0014691584510728717
Validation loss = 0.000933780858758837
Validation loss = 0.0011378780473023653
Validation loss = 0.0010232753120362759
Validation loss = 0.0011383031960576773
Validation loss = 0.0008932955097407103
Validation loss = 0.001386150368489325
Validation loss = 0.0008727056556381285
Validation loss = 0.0015647130785509944
Validation loss = 0.0009650304564274848
Validation loss = 0.0009421121794730425
Validation loss = 0.0010408266680315137
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0401  |
| Iteration     | 69       |
| MaximumReturn | -0.00102 |
| MinimumReturn | -0.851   |
| TotalSamples  | 118286   |
----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010487157851457596
Validation loss = 0.0013224533759057522
Validation loss = 0.0010778318392112851
Validation loss = 0.0016254282090812922
Validation loss = 0.0010594105115160346
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010105815017595887
Validation loss = 0.0009865745669230819
Validation loss = 0.0010194695787504315
Validation loss = 0.001627280842512846
Validation loss = 0.0009112588595598936
Validation loss = 0.0008109465125016868
Validation loss = 0.0009361489792354405
Validation loss = 0.001287243328988552
Validation loss = 0.0008827308192849159
Validation loss = 0.0009848505724221468
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010175877250730991
Validation loss = 0.0009113247506320477
Validation loss = 0.0008934747893363237
Validation loss = 0.0008643781766295433
Validation loss = 0.0013904396910220385
Validation loss = 0.0007913545123301446
Validation loss = 0.0007849098183214664
Validation loss = 0.001058470457792282
Validation loss = 0.0010654604993760586
Validation loss = 0.0009600145858712494
Validation loss = 0.0008855496416799724
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012064696056768298
Validation loss = 0.0014528926694765687
Validation loss = 0.0010334920370951295
Validation loss = 0.0018649873090907931
Validation loss = 0.001683671260252595
Validation loss = 0.0011868884321302176
Validation loss = 0.0010406716028228402
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008564354502595961
Validation loss = 0.0010843941709026694
Validation loss = 0.0014039279194548726
Validation loss = 0.0009166627423837781
Validation loss = 0.0010862979106605053
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -45.9    |
| Iteration     | 70       |
| MaximumReturn | -0.203   |
| MinimumReturn | -196     |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010691393399611115
Validation loss = 0.0012292794417589903
Validation loss = 0.0008263506460934877
Validation loss = 0.0009411208448000252
Validation loss = 0.0008897243533283472
Validation loss = 0.00077532121213153
Validation loss = 0.001030275714583695
Validation loss = 0.0013614698546007276
Validation loss = 0.00104765803553164
Validation loss = 0.0008343158988282084
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010125812841579318
Validation loss = 0.0009943974437192082
Validation loss = 0.0008674610289745033
Validation loss = 0.0011245447676628828
Validation loss = 0.0007939461502246559
Validation loss = 0.0008698547608219087
Validation loss = 0.001240897923707962
Validation loss = 0.0011703206691890955
Validation loss = 0.0009524860652163625
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007991004968062043
Validation loss = 0.000851124175824225
Validation loss = 0.0009482221212238073
Validation loss = 0.0009091931860893965
Validation loss = 0.001044045784510672
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011452421313151717
Validation loss = 0.0011057781521230936
Validation loss = 0.001187979127280414
Validation loss = 0.0009210245916619897
Validation loss = 0.0010288343764841557
Validation loss = 0.0014125151792541146
Validation loss = 0.0011598251294344664
Validation loss = 0.0010566826676949859
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009552543633617461
Validation loss = 0.0008550456841476262
Validation loss = 0.0010226041777059436
Validation loss = 0.0008447229629382491
Validation loss = 0.0009501026361249387
Validation loss = 0.001131333177909255
Validation loss = 0.0015845652669668198
Validation loss = 0.0011332976864650846
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -35      |
| Iteration     | 71       |
| MaximumReturn | -0.212   |
| MinimumReturn | -146     |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009570462279953063
Validation loss = 0.0008257125155068934
Validation loss = 0.0009392022038809955
Validation loss = 0.0010463222861289978
Validation loss = 0.001017084694467485
Validation loss = 0.0010133238974958658
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009541650069877505
Validation loss = 0.001056839362718165
Validation loss = 0.0008181718876585364
Validation loss = 0.0010380171006545424
Validation loss = 0.0009653930901549757
Validation loss = 0.0008166522602550685
Validation loss = 0.0009524495108053088
Validation loss = 0.0009525829809717834
Validation loss = 0.0009752992773428559
Validation loss = 0.0009117452427744865
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009404883021488786
Validation loss = 0.001375588239170611
Validation loss = 0.000853812147397548
Validation loss = 0.0009756095241755247
Validation loss = 0.001043418189510703
Validation loss = 0.0009135008440352976
Validation loss = 0.000797105545643717
Validation loss = 0.0009141684859059751
Validation loss = 0.0007697283872403204
Validation loss = 0.0011322583304718137
Validation loss = 0.0008979769190773368
Validation loss = 0.0009071928216144443
Validation loss = 0.0009100621100515127
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009379589464515448
Validation loss = 0.000982066267170012
Validation loss = 0.0008545771706849337
Validation loss = 0.0011896591167896986
Validation loss = 0.0008168866625055671
Validation loss = 0.0014624622417613864
Validation loss = 0.0010583974653854966
Validation loss = 0.0009277023491449654
Validation loss = 0.0012222315417602658
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011137163965031505
Validation loss = 0.0010579056106507778
Validation loss = 0.0010934628080576658
Validation loss = 0.0008655863930471241
Validation loss = 0.0008607945637777448
Validation loss = 0.0010406932560727
Validation loss = 0.0007598482188768685
Validation loss = 0.000937029195483774
Validation loss = 0.0008868506993167102
Validation loss = 0.0008963026339188218
Validation loss = 0.000924230960663408
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -134     |
| Iteration     | 72       |
| MaximumReturn | -4.12    |
| MinimumReturn | -207     |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010421853512525558
Validation loss = 0.0008534601656720042
Validation loss = 0.0007751983357593417
Validation loss = 0.001086564501747489
Validation loss = 0.0011031607864424586
Validation loss = 0.0010217365343123674
Validation loss = 0.0009360794210806489
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011477821972221136
Validation loss = 0.000776912085711956
Validation loss = 0.0008082583663053811
Validation loss = 0.0009045093902386725
Validation loss = 0.0011998256668448448
Validation loss = 0.0007853922434151173
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012018928537145257
Validation loss = 0.0011512957280501723
Validation loss = 0.0007942341035231948
Validation loss = 0.0010115290060639381
Validation loss = 0.0008104567532427609
Validation loss = 0.0007658839458599687
Validation loss = 0.0013184647541493177
Validation loss = 0.001142506254836917
Validation loss = 0.0008043229463510215
Validation loss = 0.0008981531136669219
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011663204059004784
Validation loss = 0.000993217690847814
Validation loss = 0.0009868537308648229
Validation loss = 0.0010129614965990186
Validation loss = 0.000780509493779391
Validation loss = 0.0009063324541784823
Validation loss = 0.0009192752186208963
Validation loss = 0.0007504645036533475
Validation loss = 0.0007552431197836995
Validation loss = 0.0014830625150352716
Validation loss = 0.0010924643138423562
Validation loss = 0.0008350176503881812
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009226355468854308
Validation loss = 0.0014994172379374504
Validation loss = 0.0009009019122458994
Validation loss = 0.0007890830747783184
Validation loss = 0.0010027147363871336
Validation loss = 0.0008886258583515882
Validation loss = 0.0009647454135119915
Validation loss = 0.0009989659301936626
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -99.1    |
| Iteration     | 73       |
| MaximumReturn | -0.52    |
| MinimumReturn | -221     |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010945041431114078
Validation loss = 0.0009230186697095633
Validation loss = 0.0009461045265197754
Validation loss = 0.0010983562096953392
Validation loss = 0.0008342282962985337
Validation loss = 0.0011071970220655203
Validation loss = 0.0010696692625060678
Validation loss = 0.0010486276587471366
Validation loss = 0.0009704120457172394
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00083001988241449
Validation loss = 0.0011465761344879866
Validation loss = 0.0008877081563696265
Validation loss = 0.00100362254306674
Validation loss = 0.001130383345298469
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008261071634478867
Validation loss = 0.0011311803245916963
Validation loss = 0.0007640343974344432
Validation loss = 0.0008678544545546174
Validation loss = 0.0008760999771766365
Validation loss = 0.0010005357908084989
Validation loss = 0.0009692844469100237
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014686256181448698
Validation loss = 0.001076177810318768
Validation loss = 0.0013699092669412494
Validation loss = 0.0010714377276599407
Validation loss = 0.001113208243623376
Validation loss = 0.0008273813291452825
Validation loss = 0.0009423128212802112
Validation loss = 0.0008673136471770704
Validation loss = 0.0008940824191085994
Validation loss = 0.0009451204678043723
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009201783686876297
Validation loss = 0.0008994362433440983
Validation loss = 0.001356426510028541
Validation loss = 0.0012747047003358603
Validation loss = 0.0009162117494270205
Validation loss = 0.00108935940079391
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -142     |
| Iteration     | 74       |
| MaximumReturn | -0.885   |
| MinimumReturn | -212     |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009870355715975165
Validation loss = 0.0008605854236520827
Validation loss = 0.0010286783799529076
Validation loss = 0.0008432971662841737
Validation loss = 0.0010664645815268159
Validation loss = 0.0010291339131072164
Validation loss = 0.0008553407969884574
Validation loss = 0.0008783769444562495
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001083932351320982
Validation loss = 0.0009288868750445545
Validation loss = 0.0008592961821705103
Validation loss = 0.0011895805364474654
Validation loss = 0.000903541105799377
Validation loss = 0.000812692625913769
Validation loss = 0.0009575635194778442
Validation loss = 0.0008817393681965768
Validation loss = 0.0010254621738567948
Validation loss = 0.0011573696974664927
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011747385142371058
Validation loss = 0.0010098362108692527
Validation loss = 0.001124559435993433
Validation loss = 0.0010152362519875169
Validation loss = 0.0012231674045324326
Validation loss = 0.0008666460053063929
Validation loss = 0.0008391131996177137
Validation loss = 0.0008230030653066933
Validation loss = 0.0008964433800429106
Validation loss = 0.0009823497384786606
Validation loss = 0.0008063761051744223
Validation loss = 0.0008532004430890083
Validation loss = 0.00093609414761886
Validation loss = 0.0008809363353066146
Validation loss = 0.0008062034612521529
Validation loss = 0.0008263722411356866
Validation loss = 0.0010030248668044806
Validation loss = 0.001039709779433906
Validation loss = 0.0010018263710662723
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000877287529874593
Validation loss = 0.0008738060132600367
Validation loss = 0.0008324855007231236
Validation loss = 0.0010581433307379484
Validation loss = 0.0011078089009970427
Validation loss = 0.0009312247857451439
Validation loss = 0.0009236260084435344
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000944139261264354
Validation loss = 0.0009867687476798892
Validation loss = 0.0009908622596412897
Validation loss = 0.0009939979063346982
Validation loss = 0.001104431808926165
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -66.9    |
| Iteration     | 75       |
| MaximumReturn | -0.0385  |
| MinimumReturn | -177     |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008735050796531141
Validation loss = 0.0010580261005088687
Validation loss = 0.0010969110298901796
Validation loss = 0.0008890837198123336
Validation loss = 0.0012579865287989378
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011308377142995596
Validation loss = 0.0009507900686003268
Validation loss = 0.001154092838987708
Validation loss = 0.0010641736444085836
Validation loss = 0.0010662442073225975
Validation loss = 0.0010291035287082195
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007934500463306904
Validation loss = 0.0008420573431067169
Validation loss = 0.0009445348987355828
Validation loss = 0.0008711478440091014
Validation loss = 0.0011244823690503836
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009081235621124506
Validation loss = 0.0010622908594086766
Validation loss = 0.0009877843549475074
Validation loss = 0.0009155416046269238
Validation loss = 0.0011420113733038306
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010497551411390305
Validation loss = 0.0008899335516616702
Validation loss = 0.0010360601590946317
Validation loss = 0.0009261564118787646
Validation loss = 0.0010979759972542524
Validation loss = 0.0009531747782602906
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -122     |
| Iteration     | 76       |
| MaximumReturn | -4.56    |
| MinimumReturn | -225     |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001210710033774376
Validation loss = 0.0010073694866150618
Validation loss = 0.0008633739780634642
Validation loss = 0.0009944216581061482
Validation loss = 0.0008290158002637327
Validation loss = 0.001016805530525744
Validation loss = 0.0010238692630082369
Validation loss = 0.0009760279790498316
Validation loss = 0.001117817941121757
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009236214100383222
Validation loss = 0.0011411457089707255
Validation loss = 0.0008901210385374725
Validation loss = 0.0009299427038058639
Validation loss = 0.0011514464858919382
Validation loss = 0.0009633431327529252
Validation loss = 0.0010202310513705015
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010625123977661133
Validation loss = 0.0008908507297746837
Validation loss = 0.0009023701422847807
Validation loss = 0.0009131698170676827
Validation loss = 0.0009738821536302567
Validation loss = 0.0009826085297390819
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011618012795224786
Validation loss = 0.0010169495362788439
Validation loss = 0.0009901707526296377
Validation loss = 0.0009933211840689182
Validation loss = 0.0008655562996864319
Validation loss = 0.0008806974510662258
Validation loss = 0.0012052538804709911
Validation loss = 0.0009207217954099178
Validation loss = 0.0009036642732098699
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009251335868611932
Validation loss = 0.0010150587186217308
Validation loss = 0.0008141587604768574
Validation loss = 0.0008628934738226235
Validation loss = 0.0009129696991294622
Validation loss = 0.0009174766018986702
Validation loss = 0.0009237704216502607
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -24.8    |
| Iteration     | 77       |
| MaximumReturn | -0.201   |
| MinimumReturn | -131     |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009538534795865417
Validation loss = 0.0008780499920248985
Validation loss = 0.0010425997897982597
Validation loss = 0.0009469566284678876
Validation loss = 0.0010273715015500784
Validation loss = 0.0011311203707009554
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013147823046892881
Validation loss = 0.0009235838078893721
Validation loss = 0.000990379718132317
Validation loss = 0.0010327991330996156
Validation loss = 0.0010939964558929205
Validation loss = 0.001050731516443193
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007867731619626284
Validation loss = 0.0008377819322049618
Validation loss = 0.0008677867008373141
Validation loss = 0.000978319556452334
Validation loss = 0.0015985487261787057
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009805811569094658
Validation loss = 0.0018023522570729256
Validation loss = 0.00125301128719002
Validation loss = 0.001207574037835002
Validation loss = 0.0007934141322039068
Validation loss = 0.0009009450441226363
Validation loss = 0.0009562162449583411
Validation loss = 0.001137976418249309
Validation loss = 0.0009066893253475428
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000916948658414185
Validation loss = 0.0009186584502458572
Validation loss = 0.0010667485184967518
Validation loss = 0.0008934919605962932
Validation loss = 0.0009979645255953074
Validation loss = 0.0009885681793093681
Validation loss = 0.0011298239696770906
Validation loss = 0.0008838853100314736
Validation loss = 0.0009594758739694953
Validation loss = 0.001020237454213202
Validation loss = 0.0009751711622811854
Validation loss = 0.0010102967498824
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -43.9    |
| Iteration     | 78       |
| MaximumReturn | -0.423   |
| MinimumReturn | -212     |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009187686955556273
Validation loss = 0.0009785544825717807
Validation loss = 0.0009139194153249264
Validation loss = 0.0010655263904482126
Validation loss = 0.000793333281762898
Validation loss = 0.0009401767747476697
Validation loss = 0.0009900638833642006
Validation loss = 0.0007762620225548744
Validation loss = 0.0008630567463114858
Validation loss = 0.0008654220146127045
Validation loss = 0.0009466607589274645
Validation loss = 0.0010210442123934627
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009213634766638279
Validation loss = 0.0009093874250538647
Validation loss = 0.0008053993224166334
Validation loss = 0.0009847123874351382
Validation loss = 0.0009432673105038702
Validation loss = 0.0008929811301641166
Validation loss = 0.001152709941379726
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007573767798021436
Validation loss = 0.0007659951806999743
Validation loss = 0.0008140225545503199
Validation loss = 0.00072214484680444
Validation loss = 0.0012967792572453618
Validation loss = 0.0008956452365964651
Validation loss = 0.0014009792357683182
Validation loss = 0.0008516342495568097
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009263346437364817
Validation loss = 0.001221023965626955
Validation loss = 0.0009658736526034772
Validation loss = 0.0008431753376498818
Validation loss = 0.0011645628837868571
Validation loss = 0.0009609237313270569
Validation loss = 0.0010701268911361694
Validation loss = 0.0009024282917380333
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000914321921300143
Validation loss = 0.0010876007145270705
Validation loss = 0.0007999629015102983
Validation loss = 0.000943693972658366
Validation loss = 0.0009258993086405098
Validation loss = 0.0009982158662751317
Validation loss = 0.0008216686546802521
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -47.6    |
| Iteration     | 79       |
| MaximumReturn | -0.804   |
| MinimumReturn | -176     |
| TotalSamples  | 134946   |
----------------------------
