Logging to experiments/half_cheetah/control-affine/halfcheetah_seed4321
Print configuration .....
{'env_name': 'half_cheetah', 'random_seeds': [4321, 2314, 2341, 3421], 'save_variables': False, 'model_save_dir': '/tmp/half_cheetah_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 40, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5645334720611572
Validation loss = 0.13976511359214783
Validation loss = 0.10225415229797363
Validation loss = 0.09134647250175476
Validation loss = 0.08635418117046356
Validation loss = 0.08550363779067993
Validation loss = 0.08590371906757355
Validation loss = 0.08291386067867279
Validation loss = 0.08011949062347412
Validation loss = 0.0862606093287468
Validation loss = 0.07789620757102966
Validation loss = 0.08201391994953156
Validation loss = 0.07726442068815231
Validation loss = 0.07675325870513916
Validation loss = 0.0761043056845665
Validation loss = 0.07535018026828766
Validation loss = 0.07817176729440689
Validation loss = 0.07594026625156403
Validation loss = 0.07663622498512268
Validation loss = 0.08163231611251831
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5712026357650757
Validation loss = 0.14066989719867706
Validation loss = 0.10447223484516144
Validation loss = 0.09170541167259216
Validation loss = 0.0918150320649147
Validation loss = 0.08507169783115387
Validation loss = 0.0820838063955307
Validation loss = 0.08129705488681793
Validation loss = 0.08444914221763611
Validation loss = 0.07890789955854416
Validation loss = 0.07984481751918793
Validation loss = 0.07599853724241257
Validation loss = 0.07707914710044861
Validation loss = 0.07614500820636749
Validation loss = 0.07680211216211319
Validation loss = 0.07315310835838318
Validation loss = 0.07533738017082214
Validation loss = 0.07388478517532349
Validation loss = 0.07911916077136993
Validation loss = 0.07782892882823944
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.597446084022522
Validation loss = 0.14086802303791046
Validation loss = 0.10147929191589355
Validation loss = 0.08986017853021622
Validation loss = 0.08420681953430176
Validation loss = 0.0840056836605072
Validation loss = 0.08398106694221497
Validation loss = 0.08032585680484772
Validation loss = 0.07785750925540924
Validation loss = 0.07883863896131516
Validation loss = 0.076298788189888
Validation loss = 0.07834304869174957
Validation loss = 0.07487455755472183
Validation loss = 0.08372163772583008
Validation loss = 0.07521355152130127
Validation loss = 0.07473808526992798
Validation loss = 0.0736515074968338
Validation loss = 0.07220052182674408
Validation loss = 0.08787791430950165
Validation loss = 0.07356388121843338
Validation loss = 0.078509122133255
Validation loss = 0.07175511866807938
Validation loss = 0.0794162005186081
Validation loss = 0.0707974061369896
Validation loss = 0.07259298115968704
Validation loss = 0.07180827111005783
Validation loss = 0.07123889029026031
Validation loss = 0.07894111424684525
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.592383623123169
Validation loss = 0.14278987050056458
Validation loss = 0.10247214138507843
Validation loss = 0.09167551249265671
Validation loss = 0.08990450203418732
Validation loss = 0.08685486018657684
Validation loss = 0.08315132558345795
Validation loss = 0.08251763880252838
Validation loss = 0.09135434776544571
Validation loss = 0.08022228628396988
Validation loss = 0.08053191751241684
Validation loss = 0.07619574666023254
Validation loss = 0.07737700641155243
Validation loss = 0.08518044650554657
Validation loss = 0.076503686606884
Validation loss = 0.0764937698841095
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5716816186904907
Validation loss = 0.1395028531551361
Validation loss = 0.10172636806964874
Validation loss = 0.09061577916145325
Validation loss = 0.08842463791370392
Validation loss = 0.08357488363981247
Validation loss = 0.08135047554969788
Validation loss = 0.08516863733530045
Validation loss = 0.07719773799180984
Validation loss = 0.07621575891971588
Validation loss = 0.08865992724895477
Validation loss = 0.075797438621521
Validation loss = 0.07760173082351685
Validation loss = 0.07506410777568817
Validation loss = 0.07363332808017731
Validation loss = 0.08172497153282166
Validation loss = 0.07312062382698059
Validation loss = 0.07539378106594086
Validation loss = 0.07355424761772156
Validation loss = 0.07627435028553009
Validation loss = 0.07322348654270172
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -376     |
| Iteration     | 0        |
| MaximumReturn | -317     |
| MinimumReturn | -432     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.19576556980609894
Validation loss = 0.13566021621227264
Validation loss = 0.12662500143051147
Validation loss = 0.12335965037345886
Validation loss = 0.12332414090633392
Validation loss = 0.12256080657243729
Validation loss = 0.12977249920368195
Validation loss = 0.11902685463428497
Validation loss = 0.12000079452991486
Validation loss = 0.12072344124317169
Validation loss = 0.12127718329429626
Validation loss = 0.11773854494094849
Validation loss = 0.12527254223823547
Validation loss = 0.12044443190097809
Validation loss = 0.13028880953788757
Validation loss = 0.1211158037185669
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18650920689105988
Validation loss = 0.13540448248386383
Validation loss = 0.1265871226787567
Validation loss = 0.12320602685213089
Validation loss = 0.1242174431681633
Validation loss = 0.12654078006744385
Validation loss = 0.11836090683937073
Validation loss = 0.1191062480211258
Validation loss = 0.12078270316123962
Validation loss = 0.1174791008234024
Validation loss = 0.12421558797359467
Validation loss = 0.11889153718948364
Validation loss = 0.11830432713031769
Validation loss = 0.12125822901725769
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.18659202754497528
Validation loss = 0.13757696747779846
Validation loss = 0.12489096075296402
Validation loss = 0.12466014921665192
Validation loss = 0.11994793266057968
Validation loss = 0.1190396100282669
Validation loss = 0.11962945759296417
Validation loss = 0.13132832944393158
Validation loss = 0.1203562468290329
Validation loss = 0.11751426011323929
Validation loss = 0.12020687758922577
Validation loss = 0.11991758644580841
Validation loss = 0.11876419186592102
Validation loss = 0.11928486824035645
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.18909165263175964
Validation loss = 0.1384667009115219
Validation loss = 0.1291327029466629
Validation loss = 0.12442809343338013
Validation loss = 0.12221692502498627
Validation loss = 0.1183936595916748
Validation loss = 0.11950401961803436
Validation loss = 0.1227625161409378
Validation loss = 0.11764806509017944
Validation loss = 0.13313528895378113
Validation loss = 0.11923966556787491
Validation loss = 0.12406166642904282
Validation loss = 0.12342928349971771
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.19164738059043884
Validation loss = 0.13801711797714233
Validation loss = 0.12526768445968628
Validation loss = 0.11984063684940338
Validation loss = 0.12197462469339371
Validation loss = 0.11850149929523468
Validation loss = 0.1183595135807991
Validation loss = 0.12118332087993622
Validation loss = 0.11953888833522797
Validation loss = 0.11949083209037781
Validation loss = 0.11616997420787811
Validation loss = 0.1229461282491684
Validation loss = 0.12031298875808716
Validation loss = 0.11921271681785583
Validation loss = 0.11786539852619171
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -351     |
| Iteration     | 1        |
| MaximumReturn | -302     |
| MinimumReturn | -417     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12185018509626389
Validation loss = 0.11378741264343262
Validation loss = 0.11291158199310303
Validation loss = 0.11311739683151245
Validation loss = 0.11742191761732101
Validation loss = 0.11838240176439285
Validation loss = 0.11584722995758057
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1328139752149582
Validation loss = 0.11329749971628189
Validation loss = 0.11313647031784058
Validation loss = 0.11382126808166504
Validation loss = 0.12047338485717773
Validation loss = 0.1175348162651062
Validation loss = 0.11380187422037125
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12404417991638184
Validation loss = 0.11294124275445938
Validation loss = 0.11292564868927002
Validation loss = 0.1225157305598259
Validation loss = 0.11424694210290909
Validation loss = 0.11602618545293808
Validation loss = 0.11617618799209595
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.13363780081272125
Validation loss = 0.11448201537132263
Validation loss = 0.11515635997056961
Validation loss = 0.12101093679666519
Validation loss = 0.11747536063194275
Validation loss = 0.11657753586769104
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11886986345052719
Validation loss = 0.11198728531599045
Validation loss = 0.11249551922082901
Validation loss = 0.11788048595190048
Validation loss = 0.1178562268614769
Validation loss = 0.11652185767889023
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -83.6    |
| Iteration     | 2        |
| MaximumReturn | -60      |
| MinimumReturn | -101     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12104694545269012
Validation loss = 0.11513613164424896
Validation loss = 0.10870503634214401
Validation loss = 0.11163797974586487
Validation loss = 0.11532488465309143
Validation loss = 0.1109980046749115
Validation loss = 0.1092003881931305
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.12354987859725952
Validation loss = 0.11202101409435272
Validation loss = 0.1089380756020546
Validation loss = 0.11077205836772919
Validation loss = 0.11033674329519272
Validation loss = 0.11593623459339142
Validation loss = 0.10956518352031708
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12223058938980103
Validation loss = 0.1171267181634903
Validation loss = 0.11065146327018738
Validation loss = 0.11406520009040833
Validation loss = 0.10964670777320862
Validation loss = 0.11121787130832672
Validation loss = 0.10878497362136841
Validation loss = 0.11501245200634003
Validation loss = 0.1095718964934349
Validation loss = 0.11287026107311249
Validation loss = 0.11210683733224869
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12086392194032669
Validation loss = 0.11014938354492188
Validation loss = 0.11391697824001312
Validation loss = 0.10964369773864746
Validation loss = 0.11476612836122513
Validation loss = 0.11214055120944977
Validation loss = 0.1136690005660057
Validation loss = 0.11524106562137604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1237206980586052
Validation loss = 0.11250757426023483
Validation loss = 0.11222877353429794
Validation loss = 0.11210153251886368
Validation loss = 0.10878727585077286
Validation loss = 0.11716295033693314
Validation loss = 0.11259344220161438
Validation loss = 0.11090104281902313
Validation loss = 0.11382432281970978
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -593     |
| Iteration     | 3        |
| MaximumReturn | -422     |
| MinimumReturn | -743     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.15295660495758057
Validation loss = 0.12092015892267227
Validation loss = 0.12167469412088394
Validation loss = 0.12227878719568253
Validation loss = 0.12400953471660614
Validation loss = 0.12210625410079956
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14541158080101013
Validation loss = 0.12179914861917496
Validation loss = 0.12162840366363525
Validation loss = 0.12520891427993774
Validation loss = 0.1218089610338211
Validation loss = 0.12722237408161163
Validation loss = 0.12434270232915878
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.157112717628479
Validation loss = 0.11932335048913956
Validation loss = 0.11892956495285034
Validation loss = 0.12366654723882675
Validation loss = 0.1265600174665451
Validation loss = 0.12317268550395966
Validation loss = 0.12296811491250992
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15619000792503357
Validation loss = 0.12065581977367401
Validation loss = 0.12339556217193604
Validation loss = 0.12481747567653656
Validation loss = 0.12388627231121063
Validation loss = 0.12588146328926086
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.15864954888820648
Validation loss = 0.11961599439382553
Validation loss = 0.12331271171569824
Validation loss = 0.12236994504928589
Validation loss = 0.12111236900091171
Validation loss = 0.12383513152599335
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -140     |
| Iteration     | 4        |
| MaximumReturn | 29.3     |
| MinimumReturn | -343     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12669378519058228
Validation loss = 0.11518839001655579
Validation loss = 0.11571717262268066
Validation loss = 0.11601971834897995
Validation loss = 0.11449450254440308
Validation loss = 0.12081995606422424
Validation loss = 0.11399868130683899
Validation loss = 0.11421188712120056
Validation loss = 0.11677467823028564
Validation loss = 0.11861732602119446
Validation loss = 0.11819525808095932
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.13094200193881989
Validation loss = 0.11606090515851974
Validation loss = 0.12079466134309769
Validation loss = 0.11690249294042587
Validation loss = 0.11756830662488937
Validation loss = 0.11373624950647354
Validation loss = 0.11620649695396423
Validation loss = 0.11455479264259338
Validation loss = 0.11817994713783264
Validation loss = 0.11705433577299118
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1260804831981659
Validation loss = 0.11774318665266037
Validation loss = 0.1141311451792717
Validation loss = 0.11528081446886063
Validation loss = 0.11788472533226013
Validation loss = 0.11535996943712234
Validation loss = 0.1170010045170784
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12747006118297577
Validation loss = 0.11772885918617249
Validation loss = 0.11459261924028397
Validation loss = 0.11267822235822678
Validation loss = 0.11690572649240494
Validation loss = 0.1170203909277916
Validation loss = 0.11482558399438858
Validation loss = 0.11588743329048157
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.12443885207176208
Validation loss = 0.11755340546369553
Validation loss = 0.11401599645614624
Validation loss = 0.11500009149312973
Validation loss = 0.11690816283226013
Validation loss = 0.11242097616195679
Validation loss = 0.11643164604902267
Validation loss = 0.11187109351158142
Validation loss = 0.11477328091859818
Validation loss = 0.11649312824010849
Validation loss = 0.11763137578964233
Validation loss = 0.11817419528961182
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 185      |
| Iteration     | 5        |
| MaximumReturn | 480      |
| MinimumReturn | -382     |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12500575184822083
Validation loss = 0.11079633235931396
Validation loss = 0.11086765676736832
Validation loss = 0.1083485409617424
Validation loss = 0.10925864428281784
Validation loss = 0.10835226625204086
Validation loss = 0.10880900174379349
Validation loss = 0.11269747465848923
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11751668155193329
Validation loss = 0.10863561928272247
Validation loss = 0.10744819790124893
Validation loss = 0.10500501096248627
Validation loss = 0.10834608227014542
Validation loss = 0.10852056741714478
Validation loss = 0.10790908336639404
Validation loss = 0.10873176157474518
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11773096770048141
Validation loss = 0.10685031861066818
Validation loss = 0.11374985426664352
Validation loss = 0.1106625348329544
Validation loss = 0.11179038137197495
Validation loss = 0.10883410274982452
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12144975364208221
Validation loss = 0.1085900142788887
Validation loss = 0.110021211206913
Validation loss = 0.11140535026788712
Validation loss = 0.11339227855205536
Validation loss = 0.10751264542341232
Validation loss = 0.10837452113628387
Validation loss = 0.11085125058889389
Validation loss = 0.10778721421957016
Validation loss = 0.1085708737373352
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11607649177312851
Validation loss = 0.11319059878587723
Validation loss = 0.11011292040348053
Validation loss = 0.10775727778673172
Validation loss = 0.10931086540222168
Validation loss = 0.10889049619436264
Validation loss = 0.10983676463365555
Validation loss = 0.10778724402189255
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -414     |
| Iteration     | 6        |
| MaximumReturn | -20      |
| MinimumReturn | -568     |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11013992875814438
Validation loss = 0.10410010069608688
Validation loss = 0.10520689189434052
Validation loss = 0.1031273752450943
Validation loss = 0.10291445255279541
Validation loss = 0.1062697097659111
Validation loss = 0.1064053550362587
Validation loss = 0.10538673400878906
Validation loss = 0.10298030078411102
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11122557520866394
Validation loss = 0.10234683752059937
Validation loss = 0.10467296838760376
Validation loss = 0.10061104595661163
Validation loss = 0.1043267473578453
Validation loss = 0.10534760355949402
Validation loss = 0.10317246615886688
Validation loss = 0.10212121158838272
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11348555982112885
Validation loss = 0.10417290031909943
Validation loss = 0.10493981093168259
Validation loss = 0.10277421772480011
Validation loss = 0.10489654541015625
Validation loss = 0.1067371740937233
Validation loss = 0.10974109172821045
Validation loss = 0.10586095601320267
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11198212206363678
Validation loss = 0.10394246876239777
Validation loss = 0.10647185146808624
Validation loss = 0.10661070793867111
Validation loss = 0.10469591617584229
Validation loss = 0.10473248362541199
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.109712615609169
Validation loss = 0.10328523814678192
Validation loss = 0.10143773257732391
Validation loss = 0.1028512567281723
Validation loss = 0.10259445756673813
Validation loss = 0.10186441242694855
Validation loss = 0.10297125577926636
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -235     |
| Iteration     | 7        |
| MaximumReturn | 146      |
| MinimumReturn | -557     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10576020926237106
Validation loss = 0.09942420572042465
Validation loss = 0.09758982062339783
Validation loss = 0.09664985537528992
Validation loss = 0.0975710079073906
Validation loss = 0.09865730255842209
Validation loss = 0.09872136265039444
Validation loss = 0.09769861400127411
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10087352246046066
Validation loss = 0.09660746157169342
Validation loss = 0.09753473848104477
Validation loss = 0.09585443139076233
Validation loss = 0.09721624851226807
Validation loss = 0.09701890498399734
Validation loss = 0.09659013152122498
Validation loss = 0.09482952207326889
Validation loss = 0.09406605362892151
Validation loss = 0.09768036007881165
Validation loss = 0.09451097995042801
Validation loss = 0.09482767432928085
Validation loss = 0.0950779914855957
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10526309162378311
Validation loss = 0.09670707583427429
Validation loss = 0.09861251711845398
Validation loss = 0.09887037426233292
Validation loss = 0.09577829390764236
Validation loss = 0.09873493015766144
Validation loss = 0.09543386101722717
Validation loss = 0.09797096997499466
Validation loss = 0.0965638980269432
Validation loss = 0.09856853634119034
Validation loss = 0.09934820979833603
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10682621598243713
Validation loss = 0.09754318743944168
Validation loss = 0.09630191326141357
Validation loss = 0.09685399383306503
Validation loss = 0.09952889382839203
Validation loss = 0.0983174741268158
Validation loss = 0.09653426706790924
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10342467576265335
Validation loss = 0.09533964842557907
Validation loss = 0.09441599249839783
Validation loss = 0.09637381136417389
Validation loss = 0.09408491104841232
Validation loss = 0.09695355594158173
Validation loss = 0.09549280256032944
Validation loss = 0.09429239481687546
Validation loss = 0.0936829224228859
Validation loss = 0.09783923625946045
Validation loss = 0.09749017655849457
Validation loss = 0.09373173117637634
Validation loss = 0.0964134931564331
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -297     |
| Iteration     | 8        |
| MaximumReturn | -206     |
| MinimumReturn | -408     |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10231515020132065
Validation loss = 0.09804102033376694
Validation loss = 0.09687697887420654
Validation loss = 0.098668172955513
Validation loss = 0.0978424996137619
Validation loss = 0.09686724841594696
Validation loss = 0.09735351800918579
Validation loss = 0.09728296101093292
Validation loss = 0.09610002487897873
Validation loss = 0.10007230192422867
Validation loss = 0.09709763526916504
Validation loss = 0.0977262333035469
Validation loss = 0.09752057492733002
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10209651291370392
Validation loss = 0.09452230483293533
Validation loss = 0.09702388197183609
Validation loss = 0.09736788272857666
Validation loss = 0.09609729796648026
Validation loss = 0.09677670896053314
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10197895765304565
Validation loss = 0.09758828580379486
Validation loss = 0.09718333184719086
Validation loss = 0.0977729856967926
Validation loss = 0.09807334840297699
Validation loss = 0.09870433807373047
Validation loss = 0.09759041666984558
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10315164178609848
Validation loss = 0.09800269454717636
Validation loss = 0.09560397267341614
Validation loss = 0.09977583587169647
Validation loss = 0.09780494123697281
Validation loss = 0.09829267114400864
Validation loss = 0.09549539536237717
Validation loss = 0.0969596728682518
Validation loss = 0.09729346632957458
Validation loss = 0.09788377583026886
Validation loss = 0.09665927290916443
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09914012253284454
Validation loss = 0.09500765055418015
Validation loss = 0.09671272337436676
Validation loss = 0.09554900974035263
Validation loss = 0.09753593057394028
Validation loss = 0.09685469418764114
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -391     |
| Iteration     | 9        |
| MaximumReturn | 116      |
| MinimumReturn | -576     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10339999198913574
Validation loss = 0.0962313637137413
Validation loss = 0.09567660838365555
Validation loss = 0.0968315377831459
Validation loss = 0.09585822373628616
Validation loss = 0.09785234928131104
Validation loss = 0.09895174950361252
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10625596344470978
Validation loss = 0.09695082157850266
Validation loss = 0.09953918308019638
Validation loss = 0.095371775329113
Validation loss = 0.09948117285966873
Validation loss = 0.09720936417579651
Validation loss = 0.09802768379449844
Validation loss = 0.09526976943016052
Validation loss = 0.09619736671447754
Validation loss = 0.09708953648805618
Validation loss = 0.09690201282501221
Validation loss = 0.09740903228521347
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10606207698583603
Validation loss = 0.0979781374335289
Validation loss = 0.09616341441869736
Validation loss = 0.0972912386059761
Validation loss = 0.09717480838298798
Validation loss = 0.09800195693969727
Validation loss = 0.09673107415437698
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10436021536588669
Validation loss = 0.09854662418365479
Validation loss = 0.09477020055055618
Validation loss = 0.09706990420818329
Validation loss = 0.10111749917268753
Validation loss = 0.09919141978025436
Validation loss = 0.0992509201169014
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10670206695795059
Validation loss = 0.09804832935333252
Validation loss = 0.10144755244255066
Validation loss = 0.09782452881336212
Validation loss = 0.09833268821239471
Validation loss = 0.09914404898881912
Validation loss = 0.0997404083609581
Validation loss = 0.09790296852588654
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -576     |
| Iteration     | 10       |
| MaximumReturn | -537     |
| MinimumReturn | -606     |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10239464044570923
Validation loss = 0.0973622128367424
Validation loss = 0.09729820489883423
Validation loss = 0.09665930271148682
Validation loss = 0.0974200963973999
Validation loss = 0.09571541100740433
Validation loss = 0.09919595718383789
Validation loss = 0.09742919355630875
Validation loss = 0.09851888567209244
Validation loss = 0.0983298048377037
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09967542439699173
Validation loss = 0.0975186750292778
Validation loss = 0.09847988933324814
Validation loss = 0.09821051359176636
Validation loss = 0.09747152775526047
Validation loss = 0.09762230515480042
Validation loss = 0.09805095195770264
Validation loss = 0.09956947714090347
Validation loss = 0.10094770044088364
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09778822213411331
Validation loss = 0.0985151156783104
Validation loss = 0.0990648865699768
Validation loss = 0.1027882918715477
Validation loss = 0.10083669424057007
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10163835436105728
Validation loss = 0.09872540086507797
Validation loss = 0.09758827835321426
Validation loss = 0.09818583726882935
Validation loss = 0.09836199879646301
Validation loss = 0.1009342148900032
Validation loss = 0.10049328207969666
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1017296314239502
Validation loss = 0.10097137838602066
Validation loss = 0.1018298864364624
Validation loss = 0.09964346885681152
Validation loss = 0.09969216585159302
Validation loss = 0.09951131790876389
Validation loss = 0.10088545083999634
Validation loss = 0.09950828552246094
Validation loss = 0.10249637812376022
Validation loss = 0.10321251302957535
Validation loss = 0.10320645570755005
Validation loss = 0.09908485412597656
Validation loss = 0.10025551170110703
Validation loss = 0.1032949686050415
Validation loss = 0.09982667118310928
Validation loss = 0.09938616305589676
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -573     |
| Iteration     | 11       |
| MaximumReturn | -532     |
| MinimumReturn | -602     |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09943492710590363
Validation loss = 0.09764492511749268
Validation loss = 0.09821712970733643
Validation loss = 0.09540541470050812
Validation loss = 0.09627553075551987
Validation loss = 0.09816218167543411
Validation loss = 0.09784741699695587
Validation loss = 0.09530662000179291
Validation loss = 0.09716220945119858
Validation loss = 0.09683462232351303
Validation loss = 0.09783346951007843
Validation loss = 0.09756962209939957
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10189588367938995
Validation loss = 0.09790021926164627
Validation loss = 0.09689251333475113
Validation loss = 0.0990009754896164
Validation loss = 0.09852362424135208
Validation loss = 0.09706961363554001
Validation loss = 0.0973716601729393
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10127584636211395
Validation loss = 0.10101071000099182
Validation loss = 0.09935209155082703
Validation loss = 0.0979946032166481
Validation loss = 0.09927494823932648
Validation loss = 0.10027771443128586
Validation loss = 0.0997907742857933
Validation loss = 0.10002200305461884
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10040505230426788
Validation loss = 0.09818840026855469
Validation loss = 0.09790241718292236
Validation loss = 0.09810522943735123
Validation loss = 0.09862769395112991
Validation loss = 0.10097038000822067
Validation loss = 0.0965220183134079
Validation loss = 0.09669073671102524
Validation loss = 0.1010478064417839
Validation loss = 0.09994065761566162
Validation loss = 0.10101265460252762
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10210561007261276
Validation loss = 0.09919065982103348
Validation loss = 0.09918803721666336
Validation loss = 0.10576299577951431
Validation loss = 0.0985897034406662
Validation loss = 0.09945403784513474
Validation loss = 0.10044801980257034
Validation loss = 0.09786830842494965
Validation loss = 0.09782592207193375
Validation loss = 0.09933023154735565
Validation loss = 0.09876048564910889
Validation loss = 0.09999917447566986
Validation loss = 0.09978650510311127
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -603     |
| Iteration     | 12       |
| MaximumReturn | -590     |
| MinimumReturn | -608     |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09925608336925507
Validation loss = 0.09879780560731888
Validation loss = 0.09712894260883331
Validation loss = 0.0966409221291542
Validation loss = 0.09724953025579453
Validation loss = 0.09919463098049164
Validation loss = 0.09820550680160522
Validation loss = 0.09690015763044357
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10148240625858307
Validation loss = 0.09996455907821655
Validation loss = 0.09773921221494675
Validation loss = 0.09779984503984451
Validation loss = 0.10178489983081818
Validation loss = 0.09940224885940552
Validation loss = 0.09858199208974838
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10512296110391617
Validation loss = 0.10074015706777573
Validation loss = 0.09720005095005035
Validation loss = 0.1002989187836647
Validation loss = 0.10240457952022552
Validation loss = 0.09811865538358688
Validation loss = 0.09732779115438461
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09944546222686768
Validation loss = 0.09864557534456253
Validation loss = 0.100070521235466
Validation loss = 0.09926145523786545
Validation loss = 0.09854792803525925
Validation loss = 0.09923452883958817
Validation loss = 0.10031784325838089
Validation loss = 0.10065162181854248
Validation loss = 0.10177751630544662
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09817313402891159
Validation loss = 0.09756352752447128
Validation loss = 0.10122889280319214
Validation loss = 0.10018493980169296
Validation loss = 0.0972994938492775
Validation loss = 0.10094042867422104
Validation loss = 0.1003900021314621
Validation loss = 0.09797435253858566
Validation loss = 0.09832429140806198
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -545     |
| Iteration     | 13       |
| MaximumReturn | -427     |
| MinimumReturn | -610     |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10221753269433975
Validation loss = 0.09615760296583176
Validation loss = 0.09833517670631409
Validation loss = 0.09846758842468262
Validation loss = 0.09716558456420898
Validation loss = 0.09784063696861267
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10154889523983002
Validation loss = 0.1017782986164093
Validation loss = 0.09903299063444138
Validation loss = 0.10087879747152328
Validation loss = 0.09903910011053085
Validation loss = 0.10302175581455231
Validation loss = 0.09998586028814316
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10147235542535782
Validation loss = 0.1012687236070633
Validation loss = 0.10110903531312943
Validation loss = 0.1034865751862526
Validation loss = 0.09960608184337616
Validation loss = 0.10002829134464264
Validation loss = 0.10090408474206924
Validation loss = 0.10099737346172333
Validation loss = 0.09994310885667801
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10249143093824387
Validation loss = 0.1001177504658699
Validation loss = 0.1006699874997139
Validation loss = 0.10117661952972412
Validation loss = 0.0994727835059166
Validation loss = 0.10194476693868637
Validation loss = 0.10049352049827576
Validation loss = 0.1003275141119957
Validation loss = 0.10030926764011383
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10477963835000992
Validation loss = 0.10068003088235855
Validation loss = 0.10169719159603119
Validation loss = 0.09859403967857361
Validation loss = 0.10113389790058136
Validation loss = 0.10273316502571106
Validation loss = 0.10003361850976944
Validation loss = 0.10069543868303299
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -466     |
| Iteration     | 14       |
| MaximumReturn | -370     |
| MinimumReturn | -605     |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10292819887399673
Validation loss = 0.09525957703590393
Validation loss = 0.09777054190635681
Validation loss = 0.0977126806974411
Validation loss = 0.09830974042415619
Validation loss = 0.09712869673967361
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10132470726966858
Validation loss = 0.10150261968374252
Validation loss = 0.09861312806606293
Validation loss = 0.09689772129058838
Validation loss = 0.09929025173187256
Validation loss = 0.09904606640338898
Validation loss = 0.09946535527706146
Validation loss = 0.09782963991165161
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10313712060451508
Validation loss = 0.0980743020772934
Validation loss = 0.09828884899616241
Validation loss = 0.09836269915103912
Validation loss = 0.09927305579185486
Validation loss = 0.09951046109199524
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10378876328468323
Validation loss = 0.09973248839378357
Validation loss = 0.10101927071809769
Validation loss = 0.10078853368759155
Validation loss = 0.10270512849092484
Validation loss = 0.09897839277982712
Validation loss = 0.10153710842132568
Validation loss = 0.09876160323619843
Validation loss = 0.10012561827898026
Validation loss = 0.0996396541595459
Validation loss = 0.10142451524734497
Validation loss = 0.10125555098056793
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10188422352075577
Validation loss = 0.10004393011331558
Validation loss = 0.09714609384536743
Validation loss = 0.10107405483722687
Validation loss = 0.10099756717681885
Validation loss = 0.10169515013694763
Validation loss = 0.09942897409200668
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -298     |
| Iteration     | 15       |
| MaximumReturn | 399      |
| MinimumReturn | -558     |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11050094664096832
Validation loss = 0.1064782440662384
Validation loss = 0.10488085448741913
Validation loss = 0.10770678520202637
Validation loss = 0.10440226644277573
Validation loss = 0.10764431208372116
Validation loss = 0.10784681141376495
Validation loss = 0.10382651537656784
Validation loss = 0.10588015615940094
Validation loss = 0.104413241147995
Validation loss = 0.10901805013418198
Validation loss = 0.10382475703954697
Validation loss = 0.1075667142868042
Validation loss = 0.10584060102701187
Validation loss = 0.10621467232704163
Validation loss = 0.10522427409887314
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11367350816726685
Validation loss = 0.10659684985876083
Validation loss = 0.10674257576465607
Validation loss = 0.10472584515810013
Validation loss = 0.10795175284147263
Validation loss = 0.11186152696609497
Validation loss = 0.10758116096258163
Validation loss = 0.1087999939918518
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11427389085292816
Validation loss = 0.10485529154539108
Validation loss = 0.10658852756023407
Validation loss = 0.10496149212121964
Validation loss = 0.10714689642190933
Validation loss = 0.10724431276321411
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11460524797439575
Validation loss = 0.10429687798023224
Validation loss = 0.11037707328796387
Validation loss = 0.10796000808477402
Validation loss = 0.11156146228313446
Validation loss = 0.10914412140846252
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11416970193386078
Validation loss = 0.10541428625583649
Validation loss = 0.10825186967849731
Validation loss = 0.11083682626485825
Validation loss = 0.10927106440067291
Validation loss = 0.1079351007938385
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -222     |
| Iteration     | 16       |
| MaximumReturn | 229      |
| MinimumReturn | -527     |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11172844469547272
Validation loss = 0.10416636615991592
Validation loss = 0.10636567324399948
Validation loss = 0.10483410954475403
Validation loss = 0.10296067595481873
Validation loss = 0.10184346139431
Validation loss = 0.10232080519199371
Validation loss = 0.10312869399785995
Validation loss = 0.10304398834705353
Validation loss = 0.1041865274310112
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10845037549734116
Validation loss = 0.10628751665353775
Validation loss = 0.10741469264030457
Validation loss = 0.10613563656806946
Validation loss = 0.11048894375562668
Validation loss = 0.10751274228096008
Validation loss = 0.10707627981901169
Validation loss = 0.1120399758219719
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10896357893943787
Validation loss = 0.10411545634269714
Validation loss = 0.1047586128115654
Validation loss = 0.10152408480644226
Validation loss = 0.10199194401502609
Validation loss = 0.10249987244606018
Validation loss = 0.10270614922046661
Validation loss = 0.10315731167793274
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11961640417575836
Validation loss = 0.11047428846359253
Validation loss = 0.1092802882194519
Validation loss = 0.10730167478322983
Validation loss = 0.11055198311805725
Validation loss = 0.10752275586128235
Validation loss = 0.10593599826097488
Validation loss = 0.1076987087726593
Validation loss = 0.1095883697271347
Validation loss = 0.10918334126472473
Validation loss = 0.1052137017250061
Validation loss = 0.1092013269662857
Validation loss = 0.10971256345510483
Validation loss = 0.10614795982837677
Validation loss = 0.10901568084955215
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10933080315589905
Validation loss = 0.10465926676988602
Validation loss = 0.10673664510250092
Validation loss = 0.10648917406797409
Validation loss = 0.10572300851345062
Validation loss = 0.1083320826292038
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -341     |
| Iteration     | 17       |
| MaximumReturn | 89.5     |
| MinimumReturn | -518     |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10880710184574127
Validation loss = 0.10256782919168472
Validation loss = 0.10094742476940155
Validation loss = 0.10244211554527283
Validation loss = 0.10406295955181122
Validation loss = 0.10286364704370499
Validation loss = 0.10264579206705093
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10998621582984924
Validation loss = 0.10315591841936111
Validation loss = 0.10515913367271423
Validation loss = 0.10795169323682785
Validation loss = 0.10478194057941437
Validation loss = 0.10350514948368073
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1059187576174736
Validation loss = 0.10115630179643631
Validation loss = 0.10227096080780029
Validation loss = 0.10112033784389496
Validation loss = 0.102015919983387
Validation loss = 0.102913998067379
Validation loss = 0.10159647464752197
Validation loss = 0.10208908468484879
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11188572645187378
Validation loss = 0.10618569701910019
Validation loss = 0.10739680379629135
Validation loss = 0.10969134420156479
Validation loss = 0.11236829310655594
Validation loss = 0.10636742413043976
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11053527146577835
Validation loss = 0.1014586091041565
Validation loss = 0.10310953110456467
Validation loss = 0.10609247535467148
Validation loss = 0.10423451662063599
Validation loss = 0.1028100848197937
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -320     |
| Iteration     | 18       |
| MaximumReturn | -121     |
| MinimumReturn | -479     |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10368351638317108
Validation loss = 0.10032297670841217
Validation loss = 0.10511498153209686
Validation loss = 0.10494959354400635
Validation loss = 0.10582952201366425
Validation loss = 0.10102017968893051
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10677442699670792
Validation loss = 0.10200253874063492
Validation loss = 0.10359923541545868
Validation loss = 0.10438098758459091
Validation loss = 0.10124917328357697
Validation loss = 0.10219740867614746
Validation loss = 0.10352538526058197
Validation loss = 0.10206179320812225
Validation loss = 0.10436828434467316
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10356087982654572
Validation loss = 0.0991060733795166
Validation loss = 0.09825842082500458
Validation loss = 0.10017891228199005
Validation loss = 0.09925849735736847
Validation loss = 0.09900818765163422
Validation loss = 0.09888668358325958
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11000464111566544
Validation loss = 0.10246524959802628
Validation loss = 0.10748934745788574
Validation loss = 0.10640133917331696
Validation loss = 0.10617747157812119
Validation loss = 0.10170745849609375
Validation loss = 0.10969595611095428
Validation loss = 0.10322270542383194
Validation loss = 0.10153152793645859
Validation loss = 0.10620348155498505
Validation loss = 0.10200110822916031
Validation loss = 0.10485078394412994
Validation loss = 0.10559449344873428
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10320649296045303
Validation loss = 0.09996246546506882
Validation loss = 0.101345494389534
Validation loss = 0.10110882669687271
Validation loss = 0.10139676183462143
Validation loss = 0.100874163210392
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -382     |
| Iteration     | 19       |
| MaximumReturn | -221     |
| MinimumReturn | -471     |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11238578706979752
Validation loss = 0.0996169000864029
Validation loss = 0.10262239724397659
Validation loss = 0.10178209841251373
Validation loss = 0.10094402730464935
Validation loss = 0.0998276099562645
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10251758247613907
Validation loss = 0.10125470161437988
Validation loss = 0.10208287835121155
Validation loss = 0.10069440305233002
Validation loss = 0.09958485513925552
Validation loss = 0.10189428925514221
Validation loss = 0.10062724351882935
Validation loss = 0.1000930592417717
Validation loss = 0.1004892960190773
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10464509576559067
Validation loss = 0.09971410781145096
Validation loss = 0.09938593953847885
Validation loss = 0.09916314482688904
Validation loss = 0.10073423385620117
Validation loss = 0.09818135201931
Validation loss = 0.1003023311495781
Validation loss = 0.09845846891403198
Validation loss = 0.09994439780712128
Validation loss = 0.09830901771783829
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10857041925191879
Validation loss = 0.10068809241056442
Validation loss = 0.10542915016412735
Validation loss = 0.10329324752092361
Validation loss = 0.10225209593772888
Validation loss = 0.10034996271133423
Validation loss = 0.10091353952884674
Validation loss = 0.10107597708702087
Validation loss = 0.1036691814661026
Validation loss = 0.10251162201166153
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10356452316045761
Validation loss = 0.10224474221467972
Validation loss = 0.10012167692184448
Validation loss = 0.10008300840854645
Validation loss = 0.09663009643554688
Validation loss = 0.1002316102385521
Validation loss = 0.10075686872005463
Validation loss = 0.09955503791570663
Validation loss = 0.1002245619893074
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -341     |
| Iteration     | 20       |
| MaximumReturn | -149     |
| MinimumReturn | -484     |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10912737250328064
Validation loss = 0.10155107825994492
Validation loss = 0.10536081343889236
Validation loss = 0.10202570259571075
Validation loss = 0.10333312302827835
Validation loss = 0.10482752323150635
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10395217686891556
Validation loss = 0.10017853230237961
Validation loss = 0.10323960334062576
Validation loss = 0.10182318836450577
Validation loss = 0.1026650220155716
Validation loss = 0.10066204518079758
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10249069333076477
Validation loss = 0.09893731772899628
Validation loss = 0.09767515957355499
Validation loss = 0.097433902323246
Validation loss = 0.09732525050640106
Validation loss = 0.0974331647157669
Validation loss = 0.09879428893327713
Validation loss = 0.09716741740703583
Validation loss = 0.09711385518312454
Validation loss = 0.09604127705097198
Validation loss = 0.0967758446931839
Validation loss = 0.09637776017189026
Validation loss = 0.0966225117444992
Validation loss = 0.09727615863084793
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10591704398393631
Validation loss = 0.099605493247509
Validation loss = 0.09706656634807587
Validation loss = 0.0993117094039917
Validation loss = 0.10026136040687561
Validation loss = 0.09953156113624573
Validation loss = 0.0987904742360115
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1060427725315094
Validation loss = 0.09793708473443985
Validation loss = 0.09596467018127441
Validation loss = 0.09834916144609451
Validation loss = 0.09796102344989777
Validation loss = 0.09902172535657883
Validation loss = 0.10205627232789993
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -283     |
| Iteration     | 21       |
| MaximumReturn | -77.3    |
| MinimumReturn | -560     |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10455621778964996
Validation loss = 0.10406417399644852
Validation loss = 0.10091590881347656
Validation loss = 0.09966026991605759
Validation loss = 0.1025143712759018
Validation loss = 0.10161542892456055
Validation loss = 0.0999561995267868
Validation loss = 0.10139428824186325
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10990975052118301
Validation loss = 0.1018892154097557
Validation loss = 0.10096371918916702
Validation loss = 0.10113954544067383
Validation loss = 0.10004965215921402
Validation loss = 0.09888581931591034
Validation loss = 0.1000530794262886
Validation loss = 0.09859069436788559
Validation loss = 0.09955185651779175
Validation loss = 0.100544273853302
Validation loss = 0.102107472717762
Validation loss = 0.09814608842134476
Validation loss = 0.10374969244003296
Validation loss = 0.1015547662973404
Validation loss = 0.10217422991991043
Validation loss = 0.0997077152132988
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10245673358440399
Validation loss = 0.09757998585700989
Validation loss = 0.09859056770801544
Validation loss = 0.09854947030544281
Validation loss = 0.10012959688901901
Validation loss = 0.09845113754272461
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10862831771373749
Validation loss = 0.09570664167404175
Validation loss = 0.09820015728473663
Validation loss = 0.10005601495504379
Validation loss = 0.09897401928901672
Validation loss = 0.09996286779642105
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10102541744709015
Validation loss = 0.09841575473546982
Validation loss = 0.10041992366313934
Validation loss = 0.100047267973423
Validation loss = 0.10121425241231918
Validation loss = 0.09957332164049149
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -268     |
| Iteration     | 22       |
| MaximumReturn | 62.8     |
| MinimumReturn | -397     |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10331977158784866
Validation loss = 0.09931770712137222
Validation loss = 0.10039371252059937
Validation loss = 0.09861421585083008
Validation loss = 0.09745582193136215
Validation loss = 0.09924641251564026
Validation loss = 0.09692618995904922
Validation loss = 0.09820574522018433
Validation loss = 0.09947565197944641
Validation loss = 0.10048544406890869
Validation loss = 0.09635638445615768
Validation loss = 0.09718113392591476
Validation loss = 0.09955593198537827
Validation loss = 0.09746985882520676
Validation loss = 0.10081285238265991
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10149768739938736
Validation loss = 0.0978948101401329
Validation loss = 0.0986805334687233
Validation loss = 0.09535744041204453
Validation loss = 0.09725522249937057
Validation loss = 0.09892655164003372
Validation loss = 0.09873533248901367
Validation loss = 0.10125702619552612
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10109692811965942
Validation loss = 0.09648102521896362
Validation loss = 0.09451741725206375
Validation loss = 0.0954064428806305
Validation loss = 0.09608525037765503
Validation loss = 0.0950738862156868
Validation loss = 0.0960855558514595
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10054752975702286
Validation loss = 0.09989716857671738
Validation loss = 0.0963025614619255
Validation loss = 0.09744372218847275
Validation loss = 0.09938442707061768
Validation loss = 0.09673527628183365
Validation loss = 0.09742685407400131
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10302398353815079
Validation loss = 0.0996198058128357
Validation loss = 0.10032065957784653
Validation loss = 0.09759775549173355
Validation loss = 0.09689762443304062
Validation loss = 0.09720244258642197
Validation loss = 0.10362305492162704
Validation loss = 0.10246232897043228
Validation loss = 0.10208725929260254
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -193     |
| Iteration     | 23       |
| MaximumReturn | 465      |
| MinimumReturn | -529     |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10086644440889359
Validation loss = 0.10192476212978363
Validation loss = 0.09813021868467331
Validation loss = 0.09581515938043594
Validation loss = 0.09986969083547592
Validation loss = 0.10146256536245346
Validation loss = 0.0973116084933281
Validation loss = 0.09651031345129013
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09918040037155151
Validation loss = 0.0964389219880104
Validation loss = 0.09884220361709595
Validation loss = 0.09756357967853546
Validation loss = 0.0974307730793953
Validation loss = 0.09756794571876526
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09894334524869919
Validation loss = 0.09571018815040588
Validation loss = 0.09589765220880508
Validation loss = 0.09427790343761444
Validation loss = 0.0934467688202858
Validation loss = 0.0950070321559906
Validation loss = 0.09389717876911163
Validation loss = 0.09517202526330948
Validation loss = 0.0942472368478775
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10104040801525116
Validation loss = 0.09590448439121246
Validation loss = 0.09458848088979721
Validation loss = 0.09425196796655655
Validation loss = 0.09711101651191711
Validation loss = 0.09615811705589294
Validation loss = 0.09396311640739441
Validation loss = 0.09404200315475464
Validation loss = 0.0965014398097992
Validation loss = 0.0948389321565628
Validation loss = 0.09531150013208389
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10026843845844269
Validation loss = 0.09639736264944077
Validation loss = 0.1002238392829895
Validation loss = 0.09481120109558105
Validation loss = 0.09540946036577225
Validation loss = 0.09844282269477844
Validation loss = 0.09929636120796204
Validation loss = 0.09822170436382294
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -292     |
| Iteration     | 24       |
| MaximumReturn | 14.6     |
| MinimumReturn | -507     |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.20011399686336517
Validation loss = 0.2524251341819763
Validation loss = 0.22179245948791504
Validation loss = 0.2546732723712921
Validation loss = 0.1986783891916275
Validation loss = 0.2526005208492279
Validation loss = 0.21866042912006378
Validation loss = 0.26620805263519287
Validation loss = 0.23215720057487488
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.22546464204788208
Validation loss = 0.25605475902557373
Validation loss = 0.2601989507675171
Validation loss = 0.2368539422750473
Validation loss = 0.24711844325065613
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.25107133388519287
Validation loss = 0.2971366345882416
Validation loss = 0.32260844111442566
Validation loss = 0.2899756133556366
Validation loss = 0.2906121611595154
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2451946884393692
Validation loss = 0.2568429410457611
Validation loss = 0.2650321125984192
Validation loss = 0.26079365611076355
Validation loss = 0.25189873576164246
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.23666851222515106
Validation loss = 0.23363065719604492
Validation loss = 0.2503376603126526
Validation loss = 0.2416950762271881
Validation loss = 0.23948179185390472
Validation loss = 0.22925560176372528
Validation loss = 0.25085222721099854
Validation loss = 0.25356385111808777
Validation loss = 0.2127029299736023
Validation loss = 0.23897837102413177
Validation loss = 0.2457253783941269
Validation loss = 0.2626682221889496
Validation loss = 0.24998758733272552
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -394     |
| Iteration     | 25       |
| MaximumReturn | -339     |
| MinimumReturn | -495     |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.177020862698555
Validation loss = 0.21890953183174133
Validation loss = 0.19147680699825287
Validation loss = 0.20273153483867645
Validation loss = 0.19372159242630005
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2055358588695526
Validation loss = 0.22278310358524323
Validation loss = 0.21049045026302338
Validation loss = 0.2260511964559555
Validation loss = 0.2128247618675232
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.257644385099411
Validation loss = 0.2725215554237366
Validation loss = 0.25837084650993347
Validation loss = 0.27666231989860535
Validation loss = 0.24325333535671234
Validation loss = 0.26510629057884216
Validation loss = 0.27319762110710144
Validation loss = 0.29550042748451233
Validation loss = 0.29542604088783264
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.17898482084274292
Validation loss = 0.19825389981269836
Validation loss = 0.19466827809810638
Validation loss = 0.21756376326084137
Validation loss = 0.20196831226348877
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.19092749059200287
Validation loss = 0.2232842594385147
Validation loss = 0.2196376770734787
Validation loss = 0.21039845049381256
Validation loss = 0.22958184778690338
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -397     |
| Iteration     | 26       |
| MaximumReturn | -277     |
| MinimumReturn | -511     |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.17347362637519836
Validation loss = 0.20205549895763397
Validation loss = 0.2088669091463089
Validation loss = 0.22429929673671722
Validation loss = 0.20615282654762268
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18890264630317688
Validation loss = 0.20153410732746124
Validation loss = 0.2164231389760971
Validation loss = 0.20714548230171204
Validation loss = 0.21895337104797363
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2639712989330292
Validation loss = 0.2867036461830139
Validation loss = 0.2732630670070648
Validation loss = 0.2801235318183899
Validation loss = 0.3075699508190155
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1887940913438797
Validation loss = 0.20176835358142853
Validation loss = 0.19713261723518372
Validation loss = 0.20614631474018097
Validation loss = 0.21436917781829834
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1963014304637909
Validation loss = 0.21549466252326965
Validation loss = 0.2282082736492157
Validation loss = 0.21540677547454834
Validation loss = 0.22179411351680756
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -285     |
| Iteration     | 27       |
| MaximumReturn | -94.3    |
| MinimumReturn | -394     |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2082689106464386
Validation loss = 0.22149862349033356
Validation loss = 0.22188827395439148
Validation loss = 0.22648721933364868
Validation loss = 0.24274128675460815
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18481644988059998
Validation loss = 0.21407625079154968
Validation loss = 0.19104701280593872
Validation loss = 0.19823338091373444
Validation loss = 0.21462060511112213
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2601046860218048
Validation loss = 0.28199347853660583
Validation loss = 0.2994445562362671
Validation loss = 0.27311959862709045
Validation loss = 0.26022234559059143
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.20233100652694702
Validation loss = 0.17840199172496796
Validation loss = 0.18830887973308563
Validation loss = 0.1945352703332901
Validation loss = 0.20667724311351776
Validation loss = 0.2296096235513687
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.19471164047718048
Validation loss = 0.220510795712471
Validation loss = 0.21418942511081696
Validation loss = 0.22927241027355194
Validation loss = 0.23423048853874207
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -336     |
| Iteration     | 28       |
| MaximumReturn | -93.1    |
| MinimumReturn | -438     |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.21017180383205414
Validation loss = 0.20567937195301056
Validation loss = 0.20512886345386505
Validation loss = 0.2049873173236847
Validation loss = 0.21847863495349884
Validation loss = 0.20132341980934143
Validation loss = 0.2370205521583557
Validation loss = 0.2162262350320816
Validation loss = 0.22300297021865845
Validation loss = 0.24411088228225708
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18707583844661713
Validation loss = 0.1864546686410904
Validation loss = 0.18364477157592773
Validation loss = 0.1900196224451065
Validation loss = 0.18560558557510376
Validation loss = 0.1757558435201645
Validation loss = 0.1626536250114441
Validation loss = 0.17693974077701569
Validation loss = 0.19228526949882507
Validation loss = 0.18124233186244965
Validation loss = 0.1800486296415329
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2705918550491333
Validation loss = 0.25085729360580444
Validation loss = 0.2944853901863098
Validation loss = 0.2705346643924713
Validation loss = 0.2675977647304535
Validation loss = 0.2538284361362457
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.20188942551612854
Validation loss = 0.2124437540769577
Validation loss = 0.2083727866411209
Validation loss = 0.205170676112175
Validation loss = 0.19663581252098083
Validation loss = 0.21968244016170502
Validation loss = 0.21906761825084686
Validation loss = 0.2321390062570572
Validation loss = 0.22472278773784637
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.19401860237121582
Validation loss = 0.21459931135177612
Validation loss = 0.21202781796455383
Validation loss = 0.2312590479850769
Validation loss = 0.21355554461479187
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -429     |
| Iteration     | 29       |
| MaximumReturn | -345     |
| MinimumReturn | -481     |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.20554278790950775
Validation loss = 0.24034085869789124
Validation loss = 0.2506471872329712
Validation loss = 0.22139295935630798
Validation loss = 0.23045547306537628
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.17569559812545776
Validation loss = 0.18558236956596375
Validation loss = 0.17229340970516205
Validation loss = 0.18347123265266418
Validation loss = 0.1860397905111313
Validation loss = 0.1840110421180725
Validation loss = 0.16955910623073578
Validation loss = 0.16937944293022156
Validation loss = 0.19203408062458038
Validation loss = 0.18299232423305511
Validation loss = 0.19358843564987183
Validation loss = 0.19306014478206635
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.22865287959575653
Validation loss = 0.27059227228164673
Validation loss = 0.28703078627586365
Validation loss = 0.2792728841304779
Validation loss = 0.26124781370162964
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.21187549829483032
Validation loss = 0.20834149420261383
Validation loss = 0.21994076669216156
Validation loss = 0.2240467071533203
Validation loss = 0.21315501630306244
Validation loss = 0.21833351254463196
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.22525392472743988
Validation loss = 0.2210414558649063
Validation loss = 0.21080169081687927
Validation loss = 0.2257005125284195
Validation loss = 0.2177192121744156
Validation loss = 0.20840193331241608
Validation loss = 0.2278485745191574
Validation loss = 0.21541911363601685
Validation loss = 0.214265838265419
Validation loss = 0.22813297808170319
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -402     |
| Iteration     | 30       |
| MaximumReturn | -240     |
| MinimumReturn | -572     |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.20078769326210022
Validation loss = 0.20485489070415497
Validation loss = 0.21946701407432556
Validation loss = 0.21855223178863525
Validation loss = 0.21986770629882812
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18492399156093597
Validation loss = 0.1883101463317871
Validation loss = 0.17206650972366333
Validation loss = 0.17703968286514282
Validation loss = 0.18803521990776062
Validation loss = 0.1955437958240509
Validation loss = 0.19466757774353027
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.25445103645324707
Validation loss = 0.2713567316532135
Validation loss = 0.251984566450119
Validation loss = 0.26720869541168213
Validation loss = 0.2764386236667633
Validation loss = 0.27535197138786316
Validation loss = 0.27063941955566406
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2129928022623062
Validation loss = 0.208462655544281
Validation loss = 0.20573818683624268
Validation loss = 0.2126166969537735
Validation loss = 0.2104993462562561
Validation loss = 0.20128248631954193
Validation loss = 0.19684207439422607
Validation loss = 0.20888179540634155
Validation loss = 0.20193463563919067
Validation loss = 0.20281319320201874
Validation loss = 0.18897488713264465
Validation loss = 0.20203843712806702
Validation loss = 0.22081074118614197
Validation loss = 0.20073577761650085
Validation loss = 0.22008618712425232
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.20268863439559937
Validation loss = 0.2091142237186432
Validation loss = 0.2037144899368286
Validation loss = 0.21736222505569458
Validation loss = 0.21623285114765167
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -369     |
| Iteration     | 31       |
| MaximumReturn | -250     |
| MinimumReturn | -447     |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.17961063981056213
Validation loss = 0.19867226481437683
Validation loss = 0.200739786028862
Validation loss = 0.1970808207988739
Validation loss = 0.20718899369239807
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.17833766341209412
Validation loss = 0.18357592821121216
Validation loss = 0.17064133286476135
Validation loss = 0.19125840067863464
Validation loss = 0.19460751116275787
Validation loss = 0.18766279518604279
Validation loss = 0.19946938753128052
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.22623059153556824
Validation loss = 0.25182390213012695
Validation loss = 0.2502656877040863
Validation loss = 0.24178732931613922
Validation loss = 0.2581467032432556
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.17310717701911926
Validation loss = 0.1891186386346817
Validation loss = 0.2012319713830948
Validation loss = 0.2016322761774063
Validation loss = 0.20142808556556702
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.167402982711792
Validation loss = 0.20290948450565338
Validation loss = 0.2037927210330963
Validation loss = 0.20472945272922516
Validation loss = 0.21012760698795319
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -482     |
| Iteration     | 32       |
| MaximumReturn | -322     |
| MinimumReturn | -578     |
| TotalSamples  | 136000   |
----------------------------
