Logging to experiments/invertedPendulum/IPO01/Tue-01-Nov-2022-09-49-35-PM-CDT_invertedPendulum_trpo_iteration_20_seed3214
Printing configuration ...
{'env_name': 'invertedPendulum',
 'random_seeds': [3214, 2431, 2531, 2231],
 'save_variables': False,
 'model_save_dir': '/tmp/invertedPendulum_models/',
 'restore_variables': False,
 'start_onpol_iter': 0,
 'onpol_iters': 80,
 'num_path_random': 25,
 'num_path_onpol': 25,
 'env_horizon': 100,
 'max_train_data': 200000,
 'max_val_data': 100000,
 'discard_ratio': 0.0,
 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20},
              'model': 'nn',
              'ensemble': True,
              'ensemble_model_count': 5,
              'enable_particle_ensemble': True,
              'particles': 5,
              'obs_var': 1.0,
              'intrinsic_reward_coeff': 1.0,
              'ita': 1.0,
              'mode': 'random',
              'val': True,
              'n_layers': 4,
              'hidden_size': 1000,
              'activation': 'relu',
              'batch_size': 1000,
              'learning_rate': 0.001,
              'reg_coeff': 0.0,
              'epochs': 200,
              'kfac_params': {'learning_rate': 0.1,
                              'damping': 0.001,
                              'momentum': 0.9,
                              'kl_clip': 0.0001,
                              'cov_ema_decay': 0.99}},
 'policy': {'network_shape': [64, 64],
            'init_logstd': 0.0,
            'activation': 'tanh',
            'reinitialize_every_itr': False},
 'trpo': {'horizon': 100,
          'gamma': 0.99,
          'step_size': 0.01,
          'iterations': 20,
          'batch_size': 50000,
          'gae': 0.95,
          'visualization': False,
          'visualize_iterations': [0]},
 'algo': 'trpo'}
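The block above is the experiment configuration dumped as a Python dict literal, so it can be recovered from the log for later analysis. A minimal sketch, assuming the dict text has been copied into a string (the excerpt below is abbreviated, not the full configuration):

```python
import ast

# Abbreviated excerpt of the logged configuration; in practice paste the full dict literal.
config_str = "{'env_name': 'invertedPendulum', 'onpol_iters': 80, 'algo': 'trpo'}"
config = ast.literal_eval(config_str)  # safely parses Python literals (dicts, lists, bools, numbers)

print(config['env_name'], config['algo'], config['onpol_iters'])
```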
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
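Each "Path i | total_timesteps t" line above reports the cumulative step count before rolling out path i; with num_path_random = 25 and env_horizon = 100, this initial random phase yields 2,500 real transitions. A minimal sketch of such a collection loop, assuming a classic Gym-style step API (the helper name and path layout are illustrative, not the project's actual code):

```python
import numpy as np

def sample_random_paths(env, num_paths=25, horizon=100):
    """Collect rollouts driven by uniformly random actions (illustrative helper)."""
    paths, total_timesteps = [], 0
    for i in range(num_paths):
        print(f"Path {i} | total_timesteps {total_timesteps}.")
        obs = env.reset()
        observations, actions, rewards, next_observations = [], [], [], []
        for _ in range(horizon):
            act = env.action_space.sample()          # random exploration action
            next_obs, rew, done, _ = env.step(act)   # classic 4-tuple Gym API assumed
            observations.append(obs)
            actions.append(act)
            rewards.append(rew)
            next_observations.append(next_obs)
            obs = next_obs
            total_timesteps += 1
            if done:
                break
        paths.append({"observations": np.array(observations),
                      "actions": np.array(actions),
                      "rewards": np.array(rewards),
                      "next_observations": np.array(next_observations)})
    return paths
```

The later "Generating on-policy rollouts" blocks follow the same pattern, except that actions come from the current policy instead of env.action_space.sample().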
Creating normalization for training data.
Done creating normalization for training data.
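"Creating normalization for training data" computes the statistics used to standardize the dynamics model's inputs and targets. A common choice, assumed in the sketch below, is the per-dimension mean and standard deviation of observations, actions, and one-step state deltas over the collected paths (path layout as in the sketch above):

```python
import numpy as np

def compute_normalization(paths, eps=1e-8):
    """Per-dimension mean/std of observations, actions, and state deltas (assumed scheme)."""
    obs = np.concatenate([p["observations"] for p in paths])
    acts = np.concatenate([p["actions"] for p in paths])
    deltas = np.concatenate([p["next_observations"] - p["observations"] for p in paths])
    return {name: (arr.mean(axis=0), arr.std(axis=0) + eps)
            for name, arr in (("obs", obs), ("act", acts), ("delta", deltas))}
```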
Particle ensemble enabled? True
An ensemble of 5 dynamics models (<class 'model.dynamics.NNDynamicsModel'>) initialized.
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward during pre-training.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
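Per the 'dynamics' config, each of the 5 ensemble members is a 4-layer, 1000-unit ReLU network. A standard setup, assumed here since the actual NNDynamicsModel interface is not shown in this log, is to predict the normalized change in state from a (state, action) pair. A minimal PyTorch stand-in, used purely for illustration (the project's own framework is not visible in the log):

```python
import torch
import torch.nn as nn

class DynamicsMLP(nn.Module):
    """Illustrative stand-in for one ensemble member: predicts normalized state deltas."""
    def __init__(self, obs_dim, act_dim, hidden_size=1000, n_layers=4):
        super().__init__()
        layers, in_dim = [], obs_dim + act_dim
        for _ in range(n_layers):
            layers += [nn.Linear(in_dim, hidden_size), nn.ReLU()]
            in_dim = hidden_size
        layers.append(nn.Linear(in_dim, obs_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

# "An ensemble of 5 dynamics models": independently initialized copies, typically trained
# on different shuffles or bootstrapped subsets of the same transition data.
ensemble = [DynamicsMLP(obs_dim=4, act_dim=1) for _ in range(5)]  # e.g. InvertedPendulum sizes
```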
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6956216096878052
Validation loss = 0.5977414846420288
Validation loss = 0.5360386967658997
Validation loss = 0.5177541971206665
Validation loss = 0.5030562281608582
Validation loss = 0.49562525749206543
Validation loss = 0.49515044689178467
Validation loss = 0.47865402698516846
Validation loss = 0.48003461956977844
Validation loss = 0.46729370951652527
Validation loss = 0.46684011816978455
Validation loss = 0.46988940238952637
Validation loss = 0.4571792185306549
Validation loss = 0.4553026556968689
Validation loss = 0.4543513357639313
Validation loss = 0.44920215010643005
Validation loss = 0.4561905562877655
Validation loss = 0.4529317319393158
Validation loss = 0.4372744858264923
Validation loss = 0.43914923071861267
Validation loss = 0.4410395920276642
Validation loss = 0.4328499734401703
Validation loss = 0.4290759861469269
Validation loss = 0.43569833040237427
Validation loss = 0.44031646847724915
Validation loss = 0.4208655059337616
Validation loss = 0.42391645908355713
Validation loss = 0.41255006194114685
Validation loss = 0.41610896587371826
Validation loss = 0.4171798527240753
Validation loss = 0.4032512903213501
Validation loss = 0.4129064083099365
Validation loss = 0.40186360478401184
Validation loss = 0.40507972240448
Validation loss = 0.39639440178871155
Validation loss = 0.3921636939048767
Validation loss = 0.39448341727256775
Validation loss = 0.4063732326030731
Validation loss = 0.39466702938079834
Validation loss = 0.39435887336730957
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6845117211341858
Validation loss = 0.5857095122337341
Validation loss = 0.5611237287521362
Validation loss = 0.5448700785636902
Validation loss = 0.5171868801116943
Validation loss = 0.5170645117759705
Validation loss = 0.4996013939380646
Validation loss = 0.49981799721717834
Validation loss = 0.48504066467285156
Validation loss = 0.48079684376716614
Validation loss = 0.4787127375602722
Validation loss = 0.4790755808353424
Validation loss = 0.4829416871070862
Validation loss = 0.4637821614742279
Validation loss = 0.4684031903743744
Validation loss = 0.458281010389328
Validation loss = 0.4662121832370758
Validation loss = 0.4557360112667084
Validation loss = 0.4444800317287445
Validation loss = 0.44559353590011597
Validation loss = 0.4411044120788574
Validation loss = 0.4329199492931366
Validation loss = 0.4380805194377899
Validation loss = 0.42114922404289246
Validation loss = 0.4255460798740387
Validation loss = 0.42549440264701843
Validation loss = 0.4235047399997711
Validation loss = 0.4151235818862915
Validation loss = 0.4212937355041504
Validation loss = 0.40789100527763367
Validation loss = 0.41063395142555237
Validation loss = 0.40417641401290894
Validation loss = 0.40050381422042847
Validation loss = 0.4083997309207916
Validation loss = 0.40781447291374207
Validation loss = 0.40015965700149536
Validation loss = 0.39362573623657227
Validation loss = 0.4097122550010681
Validation loss = 0.3947299122810364
Validation loss = 0.3995886743068695
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6931738257408142
Validation loss = 0.5702232718467712
Validation loss = 0.5298213362693787
Validation loss = 0.5108362436294556
Validation loss = 0.5083984732627869
Validation loss = 0.5018057227134705
Validation loss = 0.4852539002895355
Validation loss = 0.4787931442260742
Validation loss = 0.4702198803424835
Validation loss = 0.47474411129951477
Validation loss = 0.47475796937942505
Validation loss = 0.4640403389930725
Validation loss = 0.4622859060764313
Validation loss = 0.472669392824173
Validation loss = 0.46442973613739014
Validation loss = 0.4593915641307831
Validation loss = 0.44763630628585815
Validation loss = 0.44849592447280884
Validation loss = 0.437375545501709
Validation loss = 0.441439688205719
Validation loss = 0.4437299072742462
Validation loss = 0.4361182451248169
Validation loss = 0.4221065640449524
Validation loss = 0.4210242033004761
Validation loss = 0.4227447211742401
Validation loss = 0.4158174395561218
Validation loss = 0.4113554358482361
Validation loss = 0.4086384177207947
Validation loss = 0.4107680320739746
Validation loss = 0.4047565460205078
Validation loss = 0.40384751558303833
Validation loss = 0.39815470576286316
Validation loss = 0.39558279514312744
Validation loss = 0.40938860177993774
Validation loss = 0.40139755606651306
Validation loss = 0.40091702342033386
Validation loss = 0.4023442566394806
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6996592879295349
Validation loss = 0.5750104784965515
Validation loss = 0.5385436415672302
Validation loss = 0.5172352194786072
Validation loss = 0.5160489678382874
Validation loss = 0.495564341545105
Validation loss = 0.4925723373889923
Validation loss = 0.48986729979515076
Validation loss = 0.48236244916915894
Validation loss = 0.48079797625541687
Validation loss = 0.4662707448005676
Validation loss = 0.465922087430954
Validation loss = 0.462591290473938
Validation loss = 0.4559366703033447
Validation loss = 0.46271759271621704
Validation loss = 0.4451756477355957
Validation loss = 0.4399136006832123
Validation loss = 0.4433280825614929
Validation loss = 0.44456830620765686
Validation loss = 0.42893800139427185
Validation loss = 0.4276781678199768
Validation loss = 0.4265883266925812
Validation loss = 0.4217301607131958
Validation loss = 0.4286109209060669
Validation loss = 0.4179363548755646
Validation loss = 0.4171658456325531
Validation loss = 0.41976478695869446
Validation loss = 0.42704248428344727
Validation loss = 0.4069322943687439
Validation loss = 0.4155067205429077
Validation loss = 0.4148934781551361
Validation loss = 0.4025217294692993
Validation loss = 0.40158209204673767
Validation loss = 0.40718963742256165
Validation loss = 0.396070659160614
Validation loss = 0.39169278740882874
Validation loss = 0.3949334919452667
Validation loss = 0.4041757881641388
Validation loss = 0.3989488482475281
Validation loss = 0.40434327721595764
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6885439157485962
Validation loss = 0.586671769618988
Validation loss = 0.5315351486206055
Validation loss = 0.5205086469650269
Validation loss = 0.5105649828910828
Validation loss = 0.5053294897079468
Validation loss = 0.49479949474334717
Validation loss = 0.49402564764022827
Validation loss = 0.48423537611961365
Validation loss = 0.48103752732276917
Validation loss = 0.4730687737464905
Validation loss = 0.47016623616218567
Validation loss = 0.46310001611709595
Validation loss = 0.5237737894058228
Validation loss = 0.45488440990448
Validation loss = 0.4570980370044708
Validation loss = 0.46469956636428833
Validation loss = 0.4425238072872162
Validation loss = 0.4428154230117798
Validation loss = 0.4290636479854584
Validation loss = 0.42979270219802856
Validation loss = 0.43298208713531494
Validation loss = 0.4170409142971039
Validation loss = 0.4323572516441345
Validation loss = 0.42136311531066895
Validation loss = 0.4257756173610687
Validation loss = 0.4210934638977051
Done fitting dynamics.
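Each "Fitting model k" block above retrains one ensemble member on the aggregated data and prints a held-out validation loss after every epoch; the config allows up to 200 epochs with batch size 1000 and learning rate 0.001, and the varying number of printed losses per model suggests some form of early stopping. A minimal sketch of such an epoch loop with validation logging, continuing the PyTorch stand-in above and assuming Adam plus a simple patience rule (the real optimizer and stopping criterion are not visible in the log; the config also lists kfac_params):

```python
import torch

def fit_model(model, train_x, train_y, val_x, val_y,
              epochs=200, batch_size=1000, lr=1e-3, patience=10):
    """train_x/val_x: normalized (obs, act) inputs concatenated; train_y/val_y: normalized deltas."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # Adam assumed for illustration
    loss_fn = torch.nn.MSELoss()
    best, since_best = float("inf"), 0
    for _ in range(epochs):
        perm = torch.randperm(train_x.shape[0])
        for start in range(0, train_x.shape[0], batch_size):
            idx = perm[start:start + batch_size]
            opt.zero_grad()
            loss_fn(model.net(train_x[idx]), train_y[idx]).backward()
            opt.step()
        with torch.no_grad():
            val_loss = loss_fn(model.net(val_x), val_y).item()
        print(f"Validation loss = {val_loss}")
        if val_loss < best:                      # assumed patience-based early stopping
            best, since_best = val_loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
```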
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
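"Training policy using TRPO" runs 20 inner iterations (trpo.iterations) against the learned dynamics models, with gamma = 0.99 and GAE lambda = 0.95 from the 'trpo' block. Those two numbers parameterize Generalized Advantage Estimation; a minimal, self-contained sketch of that estimator (helper name illustrative):

```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single path.

    rewards: length T; values: length T + 1 (the last entry is the bootstrap value).
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Example: a 100-step model rollout with constant reward and a zero baseline.
adv = compute_gae(np.full(100, -0.5), np.zeros(101))
```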
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -55.9    |
| Iteration     | 0        |
| MaximumReturn | -0.144   |
| MinimumReturn | -104     |
| TotalSamples  | 3332     |
----------------------------
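The table summarizes the 25 real on-policy rollouts of this iteration: AverageReturn, MaximumReturn, and MinimumReturn are taken over per-path returns, and TotalSamples tracks cumulative real-environment data (it grows by 1,666 per iteration here, which looks consistent with roughly two thirds of each 2,500-step batch being kept after a train/validation split, though the log does not state this). A minimal sketch of producing these statistics from collected paths (helper name illustrative):

```python
import numpy as np

def log_iteration_stats(paths, iteration, total_samples):
    """Print the per-iteration summary table from a list of rollout dicts."""
    returns = np.array([p["rewards"].sum() for p in paths])
    print("----------------------------")
    print(f"| AverageReturn | {returns.mean():<8.3g} |")
    print(f"| Iteration     | {iteration:<8d} |")
    print(f"| MaximumReturn | {returns.max():<8.3g} |")
    print(f"| MinimumReturn | {returns.min():<8.3g} |")
    print(f"| TotalSamples  | {total_samples:<8d} |")
    print("----------------------------")
```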
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6268330216407776
Validation loss = 0.5006223320960999
Validation loss = 0.47890499234199524
Validation loss = 0.46660229563713074
Validation loss = 0.45072734355926514
Validation loss = 0.44244587421417236
Validation loss = 0.42987626791000366
Validation loss = 0.4060421586036682
Validation loss = 0.4030759036540985
Validation loss = 0.39840149879455566
Validation loss = 0.39829790592193604
Validation loss = 0.4092051684856415
Validation loss = 0.39388033747673035
Validation loss = 0.39212727546691895
Validation loss = 0.3959346413612366
Validation loss = 0.3885541260242462
Validation loss = 0.3869844377040863
Validation loss = 0.40035250782966614
Validation loss = 0.387165367603302
Validation loss = 0.39636602997779846
Validation loss = 0.3905051350593567
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6038886308670044
Validation loss = 0.49584290385246277
Validation loss = 0.4664072096347809
Validation loss = 0.4406619071960449
Validation loss = 0.4299594759941101
Validation loss = 0.4230211079120636
Validation loss = 0.40862154960632324
Validation loss = 0.4225231111049652
Validation loss = 0.43947166204452515
Validation loss = 0.39712563157081604
Validation loss = 0.39182108640670776
Validation loss = 0.3989385962486267
Validation loss = 0.3868100345134735
Validation loss = 0.3870353102684021
Validation loss = 0.3856322467327118
Validation loss = 0.39176324009895325
Validation loss = 0.38993677496910095
Validation loss = 0.38211482763290405
Validation loss = 0.39148879051208496
Validation loss = 0.38907724618911743
Validation loss = 0.39899972081184387
Validation loss = 0.39118489623069763
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.638087272644043
Validation loss = 0.4896399676799774
Validation loss = 0.4617334008216858
Validation loss = 0.4378279149532318
Validation loss = 0.4277132451534271
Validation loss = 0.42513352632522583
Validation loss = 0.4130474627017975
Validation loss = 0.41474756598472595
Validation loss = 0.4100593626499176
Validation loss = 0.3963308036327362
Validation loss = 0.39117833971977234
Validation loss = 0.39123842120170593
Validation loss = 0.3974737823009491
Validation loss = 0.3944415748119354
Validation loss = 0.3959563970565796
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6257781386375427
Validation loss = 0.4884146749973297
Validation loss = 0.4510428309440613
Validation loss = 0.4444553256034851
Validation loss = 0.41944295167922974
Validation loss = 0.4111159145832062
Validation loss = 0.40861305594444275
Validation loss = 0.403635710477829
Validation loss = 0.394526869058609
Validation loss = 0.3970467746257782
Validation loss = 0.3917068541049957
Validation loss = 0.39014044404029846
Validation loss = 0.3899255096912384
Validation loss = 0.3869458734989166
Validation loss = 0.3791702687740326
Validation loss = 0.39137208461761475
Validation loss = 0.39859625697135925
Validation loss = 0.37525641918182373
Validation loss = 0.3756910264492035
Validation loss = 0.3782559335231781
Validation loss = 0.38017916679382324
Validation loss = 0.37689197063446045
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5726931095123291
Validation loss = 0.4720532298088074
Validation loss = 0.4503214359283447
Validation loss = 0.4403589069843292
Validation loss = 0.4189416468143463
Validation loss = 0.40850603580474854
Validation loss = 0.4012371301651001
Validation loss = 0.39688050746917725
Validation loss = 0.39448487758636475
Validation loss = 0.397114634513855
Validation loss = 0.3878510594367981
Validation loss = 0.3989168107509613
Validation loss = 0.38834404945373535
Validation loss = 0.38159385323524475
Validation loss = 0.38178303837776184
Validation loss = 0.39263632893562317
Validation loss = 0.37861505150794983
Validation loss = 0.411923348903656
Validation loss = 0.39871981739997864
Validation loss = 0.3809341788291931
Validation loss = 0.37884286046028137
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.158   |
| Iteration     | 1        |
| MaximumReturn | -0.0624  |
| MinimumReturn | -0.51    |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5379432439804077
Validation loss = 0.4293178915977478
Validation loss = 0.40548086166381836
Validation loss = 0.39540308713912964
Validation loss = 0.4033602476119995
Validation loss = 0.3948836326599121
Validation loss = 0.3965839743614197
Validation loss = 0.388541042804718
Validation loss = 0.3924435079097748
Validation loss = 0.38996580243110657
Validation loss = 0.39501309394836426
Validation loss = 0.3800394833087921
Validation loss = 0.38925981521606445
Validation loss = 0.3814012408256531
Validation loss = 0.3880181908607483
Validation loss = 0.3959707021713257
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5429608821868896
Validation loss = 0.43280959129333496
Validation loss = 0.4041755795478821
Validation loss = 0.3900858461856842
Validation loss = 0.3902166485786438
Validation loss = 0.390693724155426
Validation loss = 0.38996875286102295
Validation loss = 0.3925771713256836
Validation loss = 0.38325467705726624
Validation loss = 0.39078348875045776
Validation loss = 0.3837938904762268
Validation loss = 0.3908691108226776
Validation loss = 0.37732964754104614
Validation loss = 0.394203245639801
Validation loss = 0.38711410760879517
Validation loss = 0.385101854801178
Validation loss = 0.39127910137176514
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5091599822044373
Validation loss = 0.428676962852478
Validation loss = 0.4068780541419983
Validation loss = 0.4001655876636505
Validation loss = 0.3929344117641449
Validation loss = 0.3861045837402344
Validation loss = 0.3868784010410309
Validation loss = 0.40023595094680786
Validation loss = 0.39035430550575256
Validation loss = 0.3818219304084778
Validation loss = 0.38786423206329346
Validation loss = 0.38402312994003296
Validation loss = 0.3955514430999756
Validation loss = 0.3935127556324005
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5181788802146912
Validation loss = 0.4224114418029785
Validation loss = 0.4041588604450226
Validation loss = 0.39866676926612854
Validation loss = 0.39259934425354004
Validation loss = 0.3916238844394684
Validation loss = 0.38969874382019043
Validation loss = 0.3794524669647217
Validation loss = 0.37909063696861267
Validation loss = 0.39159882068634033
Validation loss = 0.38267117738723755
Validation loss = 0.38168442249298096
Validation loss = 0.38895732164382935
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5391989946365356
Validation loss = 0.43999242782592773
Validation loss = 0.4229588508605957
Validation loss = 0.40936940908432007
Validation loss = 0.40313297510147095
Validation loss = 0.3914179801940918
Validation loss = 0.392325222492218
Validation loss = 0.3864598274230957
Validation loss = 0.3888813853263855
Validation loss = 0.39200592041015625
Validation loss = 0.3832734227180481
Validation loss = 0.40332627296447754
Validation loss = 0.3884809613227844
Validation loss = 0.3937438726425171
Validation loss = 0.3883224427700043
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.1    |
| Iteration     | 2        |
| MaximumReturn | -0.0275  |
| MinimumReturn | -62.3    |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4632561206817627
Validation loss = 0.3970831334590912
Validation loss = 0.38621386885643005
Validation loss = 0.38995423913002014
Validation loss = 0.386094331741333
Validation loss = 0.3808915615081787
Validation loss = 0.37218335270881653
Validation loss = 0.3798491954803467
Validation loss = 0.37489762902259827
Validation loss = 0.38015666604042053
Validation loss = 0.38051676750183105
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4429563283920288
Validation loss = 0.3872855603694916
Validation loss = 0.39092838764190674
Validation loss = 0.37715473771095276
Validation loss = 0.38441380858421326
Validation loss = 0.37718692421913147
Validation loss = 0.37927737832069397
Validation loss = 0.3706468641757965
Validation loss = 0.3779248893260956
Validation loss = 0.38870155811309814
Validation loss = 0.37572845816612244
Validation loss = 0.390196293592453
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4649397134780884
Validation loss = 0.40264323353767395
Validation loss = 0.38912293314933777
Validation loss = 0.39546823501586914
Validation loss = 0.3840097486972809
Validation loss = 0.38105854392051697
Validation loss = 0.3797895908355713
Validation loss = 0.3816371262073517
Validation loss = 0.38292133808135986
Validation loss = 0.3788374960422516
Validation loss = 0.3833247423171997
Validation loss = 0.3730679452419281
Validation loss = 0.37722182273864746
Validation loss = 0.38103368878364563
Validation loss = 0.3837507963180542
Validation loss = 0.38880062103271484
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4653499126434326
Validation loss = 0.39582133293151855
Validation loss = 0.3853873014450073
Validation loss = 0.38553550839424133
Validation loss = 0.38009169697761536
Validation loss = 0.3855765759944916
Validation loss = 0.37795528769493103
Validation loss = 0.3715169429779053
Validation loss = 0.3749576508998871
Validation loss = 0.3912590742111206
Validation loss = 0.3854498565196991
Validation loss = 0.38339439034461975
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4529581069946289
Validation loss = 0.39228692650794983
Validation loss = 0.38235923647880554
Validation loss = 0.37872114777565
Validation loss = 0.3844550549983978
Validation loss = 0.37617263197898865
Validation loss = 0.38308942317962646
Validation loss = 0.3757055699825287
Validation loss = 0.3853978216648102
Validation loss = 0.3843628168106079
Validation loss = 0.37804651260375977
Validation loss = 0.3870801031589508
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.057   |
| Iteration     | 3        |
| MaximumReturn | -0.0142  |
| MinimumReturn | -0.151   |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39065128564834595
Validation loss = 0.38697177171707153
Validation loss = 0.3826742172241211
Validation loss = 0.3886633813381195
Validation loss = 0.38568615913391113
Validation loss = 0.38266149163246155
Validation loss = 0.3914669454097748
Validation loss = 0.3847748637199402
Validation loss = 0.3883935809135437
Validation loss = 0.3908342719078064
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.41297298669815063
Validation loss = 0.3821467161178589
Validation loss = 0.38819149136543274
Validation loss = 0.39370253682136536
Validation loss = 0.3890810012817383
Validation loss = 0.38135653734207153
Validation loss = 0.3871748447418213
Validation loss = 0.3857117295265198
Validation loss = 0.38903525471687317
Validation loss = 0.40244588255882263
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4193264842033386
Validation loss = 0.3902907967567444
Validation loss = 0.3829926550388336
Validation loss = 0.38641050457954407
Validation loss = 0.3776313364505768
Validation loss = 0.38809359073638916
Validation loss = 0.38866299390792847
Validation loss = 0.3965696692466736
Validation loss = 0.3907937705516815
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.44215479493141174
Validation loss = 0.39184075593948364
Validation loss = 0.3866010010242462
Validation loss = 0.38437268137931824
Validation loss = 0.3827180564403534
Validation loss = 0.38699260354042053
Validation loss = 0.38787660002708435
Validation loss = 0.3965970277786255
Validation loss = 0.39043450355529785
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4067554473876953
Validation loss = 0.3795262575149536
Validation loss = 0.39119529724121094
Validation loss = 0.38412976264953613
Validation loss = 0.38535061478614807
Validation loss = 0.38442137837409973
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.8    |
| Iteration     | 4        |
| MaximumReturn | -0.094   |
| MinimumReturn | -37.3    |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.40854114294052124
Validation loss = 0.3919453024864197
Validation loss = 0.3927531838417053
Validation loss = 0.40253502130508423
Validation loss = 0.3922751843929291
Validation loss = 0.3926292061805725
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.41291314363479614
Validation loss = 0.40462446212768555
Validation loss = 0.3994824290275574
Validation loss = 0.4019063115119934
Validation loss = 0.4022715091705322
Validation loss = 0.392774373292923
Validation loss = 0.4049130380153656
Validation loss = 0.4044976830482483
Validation loss = 0.40224409103393555
Validation loss = 0.4012671113014221
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4025399088859558
Validation loss = 0.385588675737381
Validation loss = 0.3915140628814697
Validation loss = 0.3994629383087158
Validation loss = 0.3920122981071472
Validation loss = 0.40173205733299255
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.41289979219436646
Validation loss = 0.39442500472068787
Validation loss = 0.39238041639328003
Validation loss = 0.391658753156662
Validation loss = 0.3944370150566101
Validation loss = 0.40946483612060547
Validation loss = 0.39152461290359497
Validation loss = 0.39204710721969604
Validation loss = 0.40577536821365356
Validation loss = 0.3912457227706909
Validation loss = 0.39691656827926636
Validation loss = 0.40331482887268066
Validation loss = 0.40146899223327637
Validation loss = 0.39633095264434814
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4219488501548767
Validation loss = 0.39111945033073425
Validation loss = 0.38777661323547363
Validation loss = 0.3907816708087921
Validation loss = 0.39395782351493835
Validation loss = 0.3940897583961487
Validation loss = 0.3943324089050293
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -41.5    |
| Iteration     | 5        |
| MaximumReturn | -0.246   |
| MinimumReturn | -90.3    |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.40119463205337524
Validation loss = 0.39787474274635315
Validation loss = 0.3943844139575958
Validation loss = 0.4102417528629303
Validation loss = 0.4021124839782715
Validation loss = 0.4034790098667145
Validation loss = 0.40995287895202637
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4028477072715759
Validation loss = 0.4055477976799011
Validation loss = 0.4073379635810852
Validation loss = 0.3997999131679535
Validation loss = 0.4066346287727356
Validation loss = 0.40866637229919434
Validation loss = 0.4141351282596588
Validation loss = 0.4027041494846344
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4158574044704437
Validation loss = 0.40093231201171875
Validation loss = 0.39832180738449097
Validation loss = 0.40637239813804626
Validation loss = 0.3911212086677551
Validation loss = 0.4010995030403137
Validation loss = 0.3949339985847473
Validation loss = 0.40405264496803284
Validation loss = 0.3982918858528137
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4182766079902649
Validation loss = 0.41064518690109253
Validation loss = 0.40828341245651245
Validation loss = 0.406451940536499
Validation loss = 0.4056435525417328
Validation loss = 0.4103458523750305
Validation loss = 0.40527018904685974
Validation loss = 0.4162918031215668
Validation loss = 0.4125986099243164
Validation loss = 0.41343408823013306
Validation loss = 0.4108545184135437
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.42169827222824097
Validation loss = 0.3953511118888855
Validation loss = 0.4117927551269531
Validation loss = 0.40115275979042053
Validation loss = 0.40028685331344604
Validation loss = 0.3999699056148529
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.423   |
| Iteration     | 6        |
| MaximumReturn | -0.04    |
| MinimumReturn | -2.42    |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.42449331283569336
Validation loss = 0.41105416417121887
Validation loss = 0.41575464606285095
Validation loss = 0.4140513837337494
Validation loss = 0.4170701205730438
Validation loss = 0.41261813044548035
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4310801923274994
Validation loss = 0.41673216223716736
Validation loss = 0.4239788353443146
Validation loss = 0.4149149954319
Validation loss = 0.4246778190135956
Validation loss = 0.45086362957954407
Validation loss = 0.4202575385570526
Validation loss = 0.4296952784061432
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.41412588953971863
Validation loss = 0.41021981835365295
Validation loss = 0.40482985973358154
Validation loss = 0.414887934923172
Validation loss = 0.42332497239112854
Validation loss = 0.4074784219264984
Validation loss = 0.42790094017982483
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.42971131205558777
Validation loss = 0.4161433279514313
Validation loss = 0.4207037389278412
Validation loss = 0.4300014078617096
Validation loss = 0.4233495891094208
Validation loss = 0.4240904152393341
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4361239969730377
Validation loss = 0.4116971790790558
Validation loss = 0.41525936126708984
Validation loss = 0.4203326404094696
Validation loss = 0.40748825669288635
Validation loss = 0.4166752099990845
Validation loss = 0.41368749737739563
Validation loss = 0.42733481526374817
Validation loss = 0.42459550499916077
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.8     |
| Iteration     | 7        |
| MaximumReturn | -0.0293  |
| MinimumReturn | -21.8    |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.41605281829833984
Validation loss = 0.4121379554271698
Validation loss = 0.4180157780647278
Validation loss = 0.4106065630912781
Validation loss = 0.4060164988040924
Validation loss = 0.40337568521499634
Validation loss = 0.41419652104377747
Validation loss = 0.4185348451137543
Validation loss = 0.4113397002220154
Validation loss = 0.4184097945690155
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4304060935974121
Validation loss = 0.4167521893978119
Validation loss = 0.4205358922481537
Validation loss = 0.4143020808696747
Validation loss = 0.4222407639026642
Validation loss = 0.4153706729412079
Validation loss = 0.4186522960662842
Validation loss = 0.42005929350852966
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.41656896471977234
Validation loss = 0.4143182933330536
Validation loss = 0.423745334148407
Validation loss = 0.42682865262031555
Validation loss = 0.4144287407398224
Validation loss = 0.40990716218948364
Validation loss = 0.4237164855003357
Validation loss = 0.4129752516746521
Validation loss = 0.41327354311943054
Validation loss = 0.4202210009098053
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4337615370750427
Validation loss = 0.4198322594165802
Validation loss = 0.4214975833892822
Validation loss = 0.41519561409950256
Validation loss = 0.41013532876968384
Validation loss = 0.41967445611953735
Validation loss = 0.41742342710494995
Validation loss = 0.42238765954971313
Validation loss = 0.42426013946533203
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4120935797691345
Validation loss = 0.4094577431678772
Validation loss = 0.40994077920913696
Validation loss = 0.40732690691947937
Validation loss = 0.40603160858154297
Validation loss = 0.41164687275886536
Validation loss = 0.4215570092201233
Validation loss = 0.41018441319465637
Validation loss = 0.4126898944377899
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -41.5    |
| Iteration     | 8        |
| MaximumReturn | -0.378   |
| MinimumReturn | -58.7    |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.41809016466140747
Validation loss = 0.41480982303619385
Validation loss = 0.43725064396858215
Validation loss = 0.42836177349090576
Validation loss = 0.4211365580558777
Validation loss = 0.4354664385318756
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.42602646350860596
Validation loss = 0.42956918478012085
Validation loss = 0.43130412697792053
Validation loss = 0.4253203272819519
Validation loss = 0.4285550117492676
Validation loss = 0.4348432719707489
Validation loss = 0.4386645555496216
Validation loss = 0.43183577060699463
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4238807260990143
Validation loss = 0.4343245029449463
Validation loss = 0.41859984397888184
Validation loss = 0.42954498529434204
Validation loss = 0.4251035451889038
Validation loss = 0.4326935112476349
Validation loss = 0.4289146065711975
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.42652201652526855
Validation loss = 0.4346007704734802
Validation loss = 0.43664711713790894
Validation loss = 0.43031927943229675
Validation loss = 0.4352555274963379
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4246103763580322
Validation loss = 0.41708076000213623
Validation loss = 0.42285388708114624
Validation loss = 0.42510297894477844
Validation loss = 0.42988455295562744
Validation loss = 0.43018949031829834
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.145   |
| Iteration     | 9        |
| MaximumReturn | -0.0167  |
| MinimumReturn | -0.981   |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4301803708076477
Validation loss = 0.42981889843940735
Validation loss = 0.42626330256462097
Validation loss = 0.4341050982475281
Validation loss = 0.4351801872253418
Validation loss = 0.43353021144866943
Validation loss = 0.43510687351226807
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4464303255081177
Validation loss = 0.43757352232933044
Validation loss = 0.45472070574760437
Validation loss = 0.4509342908859253
Validation loss = 0.44516289234161377
Validation loss = 0.45379018783569336
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.44054335355758667
Validation loss = 0.4303986132144928
Validation loss = 0.43378838896751404
Validation loss = 0.43678534030914307
Validation loss = 0.43655651807785034
Validation loss = 0.4378717839717865
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4374334514141083
Validation loss = 0.44059693813323975
Validation loss = 0.4509763717651367
Validation loss = 0.44297081232070923
Validation loss = 0.43874311447143555
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.43917936086654663
Validation loss = 0.43481171131134033
Validation loss = 0.4323764443397522
Validation loss = 0.43679383397102356
Validation loss = 0.440959095954895
Validation loss = 0.44538137316703796
Validation loss = 0.4368407428264618
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.336   |
| Iteration     | 10       |
| MaximumReturn | -0.0361  |
| MinimumReturn | -1.4     |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.43615442514419556
Validation loss = 0.43851906061172485
Validation loss = 0.4325043559074402
Validation loss = 0.4317881166934967
Validation loss = 0.4315977990627289
Validation loss = 0.4381070137023926
Validation loss = 0.4368213713169098
Validation loss = 0.44343701004981995
Validation loss = 0.44824665784835815
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.44325223565101624
Validation loss = 0.44608059525489807
Validation loss = 0.45127588510513306
Validation loss = 0.443342924118042
Validation loss = 0.4436371326446533
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4356728494167328
Validation loss = 0.4348798394203186
Validation loss = 0.4379808306694031
Validation loss = 0.44673484563827515
Validation loss = 0.44508472084999084
Validation loss = 0.44359689950942993
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4454723000526428
Validation loss = 0.440694659948349
Validation loss = 0.43756818771362305
Validation loss = 0.4455563426017761
Validation loss = 0.4490925669670105
Validation loss = 0.4570193886756897
Validation loss = 0.4547947347164154
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4450863003730774
Validation loss = 0.4473605751991272
Validation loss = 0.43837517499923706
Validation loss = 0.4526166319847107
Validation loss = 0.4448452889919281
Validation loss = 0.4500334858894348
Validation loss = 0.44557324051856995
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.121   |
| Iteration     | 11       |
| MaximumReturn | -0.0261  |
| MinimumReturn | -0.475   |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4463832974433899
Validation loss = 0.44628143310546875
Validation loss = 0.45197540521621704
Validation loss = 0.4501771926879883
Validation loss = 0.4534969925880432
Validation loss = 0.44751185178756714
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4527588486671448
Validation loss = 0.45242175459861755
Validation loss = 0.45006775856018066
Validation loss = 0.4607137143611908
Validation loss = 0.45869702100753784
Validation loss = 0.4617782235145569
Validation loss = 0.4594420790672302
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4438174366950989
Validation loss = 0.44138890504837036
Validation loss = 0.4468681216239929
Validation loss = 0.450190007686615
Validation loss = 0.46555295586586
Validation loss = 0.46066227555274963
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.45063886046409607
Validation loss = 0.453021764755249
Validation loss = 0.4526042938232422
Validation loss = 0.45796123147010803
Validation loss = 0.46230190992355347
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4491605758666992
Validation loss = 0.4486326277256012
Validation loss = 0.45371013879776
Validation loss = 0.4573766589164734
Validation loss = 0.4548337459564209
Validation loss = 0.4593174457550049
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.349   |
| Iteration     | 12       |
| MaximumReturn | -0.0545  |
| MinimumReturn | -0.798   |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4476167559623718
Validation loss = 0.44779834151268005
Validation loss = 0.45284852385520935
Validation loss = 0.45223140716552734
Validation loss = 0.44803380966186523
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.46056070923805237
Validation loss = 0.45427340269088745
Validation loss = 0.45534420013427734
Validation loss = 0.45748549699783325
Validation loss = 0.45347702503204346
Validation loss = 0.45858001708984375
Validation loss = 0.4634127914905548
Validation loss = 0.4578344523906708
Validation loss = 0.47349464893341064
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4509647488594055
Validation loss = 0.4433649182319641
Validation loss = 0.4518986940383911
Validation loss = 0.45821431279182434
Validation loss = 0.4534339904785156
Validation loss = 0.46036073565483093
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.46230629086494446
Validation loss = 0.4496307075023651
Validation loss = 0.451400488615036
Validation loss = 0.4564538896083832
Validation loss = 0.459930419921875
Validation loss = 0.4556572735309601
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4584527015686035
Validation loss = 0.45517903566360474
Validation loss = 0.44858476519584656
Validation loss = 0.4551468789577484
Validation loss = 0.46568557620048523
Validation loss = 0.4526735544204712
Validation loss = 0.45851513743400574
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.373   |
| Iteration     | 13       |
| MaximumReturn | -0.0574  |
| MinimumReturn | -1.03    |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.45847758650779724
Validation loss = 0.45821234583854675
Validation loss = 0.45710596442222595
Validation loss = 0.4550900161266327
Validation loss = 0.4556001126766205
Validation loss = 0.4603644907474518
Validation loss = 0.4557349383831024
Validation loss = 0.4659212827682495
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4681346118450165
Validation loss = 0.46787285804748535
Validation loss = 0.46471691131591797
Validation loss = 0.4659358263015747
Validation loss = 0.4636955261230469
Validation loss = 0.46802768111228943
Validation loss = 0.47946831583976746
Validation loss = 0.4797604978084564
Validation loss = 0.47787949442863464
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4501115381717682
Validation loss = 0.4674113094806671
Validation loss = 0.4572507441043854
Validation loss = 0.4551060199737549
Validation loss = 0.45868992805480957
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4596419334411621
Validation loss = 0.4591163098812103
Validation loss = 0.45945313572883606
Validation loss = 0.46444961428642273
Validation loss = 0.45707765221595764
Validation loss = 0.4669916331768036
Validation loss = 0.4635298252105713
Validation loss = 0.46880316734313965
Validation loss = 0.4738819897174835
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4580681025981903
Validation loss = 0.46589115262031555
Validation loss = 0.46312272548675537
Validation loss = 0.469743013381958
Validation loss = 0.4650131165981293
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.11    |
| Iteration     | 14       |
| MaximumReturn | -0.0963  |
| MinimumReturn | -14.4    |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.45964255928993225
Validation loss = 0.4558399021625519
Validation loss = 0.46210214495658875
Validation loss = 0.4663030803203583
Validation loss = 0.45923516154289246
Validation loss = 0.46656543016433716
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.47022831439971924
Validation loss = 0.4729202687740326
Validation loss = 0.4803452491760254
Validation loss = 0.48566529154777527
Validation loss = 0.4805830419063568
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.45499661564826965
Validation loss = 0.46034497022628784
Validation loss = 0.45252880454063416
Validation loss = 0.4606572687625885
Validation loss = 0.45938169956207275
Validation loss = 0.4797128140926361
Validation loss = 0.46328336000442505
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.47917842864990234
Validation loss = 0.4636446237564087
Validation loss = 0.46544286608695984
Validation loss = 0.4800805449485779
Validation loss = 0.4828680753707886
Validation loss = 0.4710513651371002
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.46272996068000793
Validation loss = 0.46789294481277466
Validation loss = 0.464969664812088
Validation loss = 0.4738650918006897
Validation loss = 0.46648186445236206
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.976   |
| Iteration     | 15       |
| MaximumReturn | -0.0539  |
| MinimumReturn | -10.7    |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.46414846181869507
Validation loss = 0.4700922966003418
Validation loss = 0.47212454676628113
Validation loss = 0.48291411995887756
Validation loss = 0.473187655210495
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4799823462963104
Validation loss = 0.4876267910003662
Validation loss = 0.48086658120155334
Validation loss = 0.4927864968776703
Validation loss = 0.48827287554740906
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.46141746640205383
Validation loss = 0.4640379548072815
Validation loss = 0.46540138125419617
Validation loss = 0.47869056463241577
Validation loss = 0.48298782110214233
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4758414328098297
Validation loss = 0.47639018297195435
Validation loss = 0.4734324514865875
Validation loss = 0.4875243008136749
Validation loss = 0.4861043393611908
Validation loss = 0.48261985182762146
Validation loss = 0.49288177490234375
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.47240251302719116
Validation loss = 0.46276989579200745
Validation loss = 0.4768048822879791
Validation loss = 0.4754190444946289
Validation loss = 0.4728294014930725
Validation loss = 0.4787169098854065
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -18.4    |
| Iteration     | 16       |
| MaximumReturn | -0.0328  |
| MinimumReturn | -112     |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.465538889169693
Validation loss = 0.4728200137615204
Validation loss = 0.47068125009536743
Validation loss = 0.4692172408103943
Validation loss = 0.47410228848457336
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.48199203610420227
Validation loss = 0.47969117760658264
Validation loss = 0.48605063557624817
Validation loss = 0.49408358335494995
Validation loss = 0.48235833644866943
Validation loss = 0.49939581751823425
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.46372973918914795
Validation loss = 0.47338929772377014
Validation loss = 0.47177547216415405
Validation loss = 0.4767034351825714
Validation loss = 0.4755982756614685
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4753958284854889
Validation loss = 0.4847486615180969
Validation loss = 0.4787306487560272
Validation loss = 0.4833504259586334
Validation loss = 0.4870463013648987
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.47045618295669556
Validation loss = 0.4784897267818451
Validation loss = 0.47564080357551575
Validation loss = 0.48818114399909973
Validation loss = 0.48289331793785095
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.3    |
| Iteration     | 17       |
| MaximumReturn | -0.0698  |
| MinimumReturn | -50.8    |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4662216305732727
Validation loss = 0.46829476952552795
Validation loss = 0.48291853070259094
Validation loss = 0.4853510856628418
Validation loss = 0.4908005893230438
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.47639065980911255
Validation loss = 0.49108612537384033
Validation loss = 0.4858871102333069
Validation loss = 0.5015466809272766
Validation loss = 0.4988666772842407
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4736696779727936
Validation loss = 0.46913930773735046
Validation loss = 0.47431808710098267
Validation loss = 0.48465222120285034
Validation loss = 0.47952425479888916
Validation loss = 0.4792707562446594
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.47949913144111633
Validation loss = 0.4834136664867401
Validation loss = 0.48114094138145447
Validation loss = 0.4863225519657135
Validation loss = 0.5014306306838989
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.47735950350761414
Validation loss = 0.4799498915672302
Validation loss = 0.48655328154563904
Validation loss = 0.48483484983444214
Validation loss = 0.48576000332832336
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.79    |
| Iteration     | 18       |
| MaximumReturn | -0.0421  |
| MinimumReturn | -42      |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4840080142021179
Validation loss = 0.4808793067932129
Validation loss = 0.48287320137023926
Validation loss = 0.4940076470375061
Validation loss = 0.5028899312019348
Validation loss = 0.49349433183670044
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.490843802690506
Validation loss = 0.4932890236377716
Validation loss = 0.5083770751953125
Validation loss = 0.49782848358154297
Validation loss = 0.5054880380630493
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.48463836312294006
Validation loss = 0.4866950809955597
Validation loss = 0.48374807834625244
Validation loss = 0.4886274039745331
Validation loss = 0.4901139736175537
Validation loss = 0.4944709837436676
Validation loss = 0.5089007616043091
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49208205938339233
Validation loss = 0.4953533411026001
Validation loss = 0.4932340979576111
Validation loss = 0.5016533136367798
Validation loss = 0.49825215339660645
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4881014823913574
Validation loss = 0.49185511469841003
Validation loss = 0.49121004343032837
Validation loss = 0.4936830997467041
Validation loss = 0.4990984797477722
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.77    |
| Iteration     | 19       |
| MaximumReturn | -0.114   |
| MinimumReturn | -14.3    |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.500374972820282
Validation loss = 0.5066351890563965
Validation loss = 0.5039000511169434
Validation loss = 0.504896342754364
Validation loss = 0.49854016304016113
Validation loss = 0.5334073305130005
Validation loss = 0.5182071924209595
Validation loss = 0.5122791528701782
Validation loss = 0.5204991698265076
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5159499049186707
Validation loss = 0.5232959985733032
Validation loss = 0.5164522528648376
Validation loss = 0.5298066139221191
Validation loss = 0.5290225148200989
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5131985545158386
Validation loss = 0.5057320594787598
Validation loss = 0.5158031582832336
Validation loss = 0.5171213150024414
Validation loss = 0.5119332671165466
Validation loss = 0.5202938318252563
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5266196131706238
Validation loss = 0.512362539768219
Validation loss = 0.5203795433044434
Validation loss = 0.5270018577575684
Validation loss = 0.5386252999305725
Validation loss = 0.5397661328315735
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5113713145256042
Validation loss = 0.5269588232040405
Validation loss = 0.5173678994178772
Validation loss = 0.5116361975669861
Validation loss = 0.5175061821937561
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -27.7    |
| Iteration     | 20       |
| MaximumReturn | -0.0381  |
| MinimumReturn | -80.2    |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5420938730239868
Validation loss = 0.5227673053741455
Validation loss = 0.5287749767303467
Validation loss = 0.5265087485313416
Validation loss = 0.5333428978919983
Validation loss = 0.5330978631973267
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5127962231636047
Validation loss = 0.5201294422149658
Validation loss = 0.5278319716453552
Validation loss = 0.5268770456314087
Validation loss = 0.52799391746521
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.514540433883667
Validation loss = 0.5199548602104187
Validation loss = 0.5240141153335571
Validation loss = 0.5274372696876526
Validation loss = 0.529731810092926
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5294813513755798
Validation loss = 0.5272707939147949
Validation loss = 0.5304466485977173
Validation loss = 0.5306494235992432
Validation loss = 0.5382921695709229
Validation loss = 0.5402528047561646
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5174708962440491
Validation loss = 0.5184409022331238
Validation loss = 0.5176804065704346
Validation loss = 0.5215990543365479
Validation loss = 0.5167936682701111
Validation loss = 0.5313553214073181
Validation loss = 0.532451868057251
Validation loss = 0.5290310382843018
Validation loss = 0.5298357009887695
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.94    |
| Iteration     | 21       |
| MaximumReturn | -0.0974  |
| MinimumReturn | -17      |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5327143669128418
Validation loss = 0.5475866198539734
Validation loss = 0.53786700963974
Validation loss = 0.5362794995307922
Validation loss = 0.5431156754493713
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5261917114257812
Validation loss = 0.5302985310554504
Validation loss = 0.5352342128753662
Validation loss = 0.5298773050308228
Validation loss = 0.5309401750564575
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5420777201652527
Validation loss = 0.5404047966003418
Validation loss = 0.526699423789978
Validation loss = 0.5287142395973206
Validation loss = 0.5276365876197815
Validation loss = 0.5368679165840149
Validation loss = 0.5433080792427063
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5328028202056885
Validation loss = 0.5358555316925049
Validation loss = 0.5328553915023804
Validation loss = 0.5456667542457581
Validation loss = 0.5522536039352417
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5381017923355103
Validation loss = 0.5326597094535828
Validation loss = 0.5351671576499939
Validation loss = 0.5411331653594971
Validation loss = 0.540999174118042
Validation loss = 0.5472143292427063
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.365   |
| Iteration     | 22       |
| MaximumReturn | -0.0515  |
| MinimumReturn | -0.748   |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5337192416191101
Validation loss = 0.5360778570175171
Validation loss = 0.5390563011169434
Validation loss = 0.5377956032752991
Validation loss = 0.5418354868888855
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5317818522453308
Validation loss = 0.5220741033554077
Validation loss = 0.5333770513534546
Validation loss = 0.5319054126739502
Validation loss = 0.5312689542770386
Validation loss = 0.5490809082984924
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5345584154129028
Validation loss = 0.5395249724388123
Validation loss = 0.5443068146705627
Validation loss = 0.5428975820541382
Validation loss = 0.5458812713623047
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5427432656288147
Validation loss = 0.5293275117874146
Validation loss = 0.5440734624862671
Validation loss = 0.5476649403572083
Validation loss = 0.5504375696182251
Validation loss = 0.5488866567611694
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5413955450057983
Validation loss = 0.5333338975906372
Validation loss = 0.5447878837585449
Validation loss = 0.5356019735336304
Validation loss = 0.556472897529602
Validation loss = 0.5446299314498901
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.104   |
| Iteration     | 23       |
| MaximumReturn | -0.0241  |
| MinimumReturn | -0.546   |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5507680773735046
Validation loss = 0.5421582460403442
Validation loss = 0.5427067875862122
Validation loss = 0.5542813539505005
Validation loss = 0.5538503527641296
Validation loss = 0.5567905306816101
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5275018811225891
Validation loss = 0.5342275500297546
Validation loss = 0.5354956984519958
Validation loss = 0.5501744151115417
Validation loss = 0.5357433557510376
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5443869829177856
Validation loss = 0.5468277335166931
Validation loss = 0.5421114563941956
Validation loss = 0.5640349984169006
Validation loss = 0.5631260275840759
Validation loss = 0.5545213222503662
Validation loss = 0.5563094615936279
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5489649772644043
Validation loss = 0.5522019267082214
Validation loss = 0.5449116230010986
Validation loss = 0.5510770678520203
Validation loss = 0.553802490234375
Validation loss = 0.5503793358802795
Validation loss = 0.5595266819000244
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5502015352249146
Validation loss = 0.5470197200775146
Validation loss = 0.5534650087356567
Validation loss = 0.5522739887237549
Validation loss = 0.5681738257408142
Validation loss = 0.5504863262176514
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.553   |
| Iteration     | 24       |
| MaximumReturn | -0.0216  |
| MinimumReturn | -3.7     |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5494001507759094
Validation loss = 0.5521071553230286
Validation loss = 0.5469328761100769
Validation loss = 0.5643526315689087
Validation loss = 0.5554415583610535
Validation loss = 0.5624818801879883
Validation loss = 0.5624361634254456
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.529942512512207
Validation loss = 0.5376531481742859
Validation loss = 0.5358862280845642
Validation loss = 0.5489506125450134
Validation loss = 0.5436622500419617
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5496703386306763
Validation loss = 0.5515592098236084
Validation loss = 0.5522540211677551
Validation loss = 0.5481070280075073
Validation loss = 0.5558972954750061
Validation loss = 0.5627087354660034
Validation loss = 0.5639798641204834
Validation loss = 0.5631604194641113
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.559736430644989
Validation loss = 0.5536572337150574
Validation loss = 0.5556389689445496
Validation loss = 0.5596913695335388
Validation loss = 0.5585412383079529
Validation loss = 0.5688461661338806
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5616349577903748
Validation loss = 0.5519195795059204
Validation loss = 0.5506224036216736
Validation loss = 0.5581053495407104
Validation loss = 0.5530438423156738
Validation loss = 0.5637426376342773
Validation loss = 0.5668759942054749
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.316   |
| Iteration     | 25       |
| MaximumReturn | -0.0239  |
| MinimumReturn | -3.05    |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5721287727355957
Validation loss = 0.5602786540985107
Validation loss = 0.5562250018119812
Validation loss = 0.5515062212944031
Validation loss = 0.5646734833717346
Validation loss = 0.5694587230682373
Validation loss = 0.5677279233932495
Validation loss = 0.5682182312011719
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.539180338382721
Validation loss = 0.5323342084884644
Validation loss = 0.5365065932273865
Validation loss = 0.5492656230926514
Validation loss = 0.5413417220115662
Validation loss = 0.555372416973114
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5508423447608948
Validation loss = 0.5553138852119446
Validation loss = 0.5643609762191772
Validation loss = 0.564164400100708
Validation loss = 0.5687315464019775
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5612922310829163
Validation loss = 0.5511842370033264
Validation loss = 0.5593605041503906
Validation loss = 0.5540147423744202
Validation loss = 0.5597666501998901
Validation loss = 0.5736205577850342
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5565253496170044
Validation loss = 0.5528690218925476
Validation loss = 0.5596781373023987
Validation loss = 0.5623514652252197
Validation loss = 0.5655347108840942
Validation loss = 0.5702678561210632
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.48    |
| Iteration     | 26       |
| MaximumReturn | -0.0403  |
| MinimumReturn | -45.1    |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5626198053359985
Validation loss = 0.5731215476989746
Validation loss = 0.5630865693092346
Validation loss = 0.5673701763153076
Validation loss = 0.5704693794250488
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5369414687156677
Validation loss = 0.5426734685897827
Validation loss = 0.5425219535827637
Validation loss = 0.539375364780426
Validation loss = 0.5529695749282837
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5510863065719604
Validation loss = 0.552500307559967
Validation loss = 0.564740777015686
Validation loss = 0.5532697439193726
Validation loss = 0.559482991695404
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5577228665351868
Validation loss = 0.5550701022148132
Validation loss = 0.5623740553855896
Validation loss = 0.5585923790931702
Validation loss = 0.5648069977760315
Validation loss = 0.5607008934020996
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5534297823905945
Validation loss = 0.5673595070838928
Validation loss = 0.5495688319206238
Validation loss = 0.5539964437484741
Validation loss = 0.5752367973327637
Validation loss = 0.5630977153778076
Validation loss = 0.5672717094421387
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -77.1    |
| Iteration     | 27       |
| MaximumReturn | -0.259   |
| MinimumReturn | -155     |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5575733184814453
Validation loss = 0.5428774952888489
Validation loss = 0.5462233424186707
Validation loss = 0.5550148487091064
Validation loss = 0.5499247908592224
Validation loss = 0.5516418814659119
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5263158679008484
Validation loss = 0.5357894897460938
Validation loss = 0.5375276207923889
Validation loss = 0.5405245423316956
Validation loss = 0.5535064935684204
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5507102608680725
Validation loss = 0.5369460582733154
Validation loss = 0.5583415627479553
Validation loss = 0.5492185950279236
Validation loss = 0.5546412467956543
Validation loss = 0.5637167096138
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5446865558624268
Validation loss = 0.5468137860298157
Validation loss = 0.5303575992584229
Validation loss = 0.5473389625549316
Validation loss = 0.557098925113678
Validation loss = 0.561970055103302
Validation loss = 0.5649054050445557
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5591399669647217
Validation loss = 0.546816349029541
Validation loss = 0.5434253811836243
Validation loss = 0.5579962730407715
Validation loss = 0.5642634034156799
Validation loss = 0.5593394637107849
Validation loss = 0.5726349353790283
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -128     |
| Iteration     | 28       |
| MaximumReturn | -86.8    |
| MinimumReturn | -162     |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5300262570381165
Validation loss = 0.5308231711387634
Validation loss = 0.5464066863059998
Validation loss = 0.5390524864196777
Validation loss = 0.5539413690567017
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.526521623134613
Validation loss = 0.5234731435775757
Validation loss = 0.5251972079277039
Validation loss = 0.5355657339096069
Validation loss = 0.536870539188385
Validation loss = 0.5361219644546509
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5310730338096619
Validation loss = 0.5360553860664368
Validation loss = 0.5439359545707703
Validation loss = 0.5387621521949768
Validation loss = 0.5434287786483765
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5329169631004333
Validation loss = 0.5412575602531433
Validation loss = 0.5476622581481934
Validation loss = 0.5417299270629883
Validation loss = 0.5479252934455872
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.529565691947937
Validation loss = 0.5368377566337585
Validation loss = 0.5485933423042297
Validation loss = 0.5434539318084717
Validation loss = 0.5539355278015137
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.6     |
| Iteration     | 29       |
| MaximumReturn | -0.0294  |
| MinimumReturn | -88.9    |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5373116135597229
Validation loss = 0.5523691773414612
Validation loss = 0.5462170243263245
Validation loss = 0.5416048169136047
Validation loss = 0.5561965703964233
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5416086912155151
Validation loss = 0.5374603867530823
Validation loss = 0.5332106947898865
Validation loss = 0.5353856086730957
Validation loss = 0.5397530198097229
Validation loss = 0.5396673679351807
Validation loss = 0.5435238480567932
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5439480543136597
Validation loss = 0.5511224269866943
Validation loss = 0.5362748503684998
Validation loss = 0.5607039928436279
Validation loss = 0.5448577404022217
Validation loss = 0.5464469790458679
Validation loss = 0.5423424243927002
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5376899838447571
Validation loss = 0.5406394004821777
Validation loss = 0.5486365556716919
Validation loss = 0.5489411354064941
Validation loss = 0.5416963696479797
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5527125000953674
Validation loss = 0.5495019555091858
Validation loss = 0.5518180727958679
Validation loss = 0.5746895670890808
Validation loss = 0.5530581474304199
Validation loss = 0.5554863810539246
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.86    |
| Iteration     | 30       |
| MaximumReturn | -0.115   |
| MinimumReturn | -2.24    |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5392852425575256
Validation loss = 0.5496970415115356
Validation loss = 0.5507079362869263
Validation loss = 0.5409013032913208
Validation loss = 0.5580127239227295
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5426653027534485
Validation loss = 0.5441759824752808
Validation loss = 0.5391809940338135
Validation loss = 0.5429036021232605
Validation loss = 0.5483608841896057
Validation loss = 0.544226348400116
Validation loss = 0.549344003200531
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5441721677780151
Validation loss = 0.5519219636917114
Validation loss = 0.5434727072715759
Validation loss = 0.5500824451446533
Validation loss = 0.5486512780189514
Validation loss = 0.5546162128448486
Validation loss = 0.5672009587287903
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5402342081069946
Validation loss = 0.535667359828949
Validation loss = 0.5460110902786255
Validation loss = 0.5489997267723083
Validation loss = 0.5521463751792908
Validation loss = 0.5499544739723206
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5465951561927795
Validation loss = 0.547161877155304
Validation loss = 0.5528873801231384
Validation loss = 0.5551154017448425
Validation loss = 0.5530992150306702
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.1    |
| Iteration     | 31       |
| MaximumReturn | -0.163   |
| MinimumReturn | -86.9    |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5415359139442444
Validation loss = 0.5317967534065247
Validation loss = 0.5415652990341187
Validation loss = 0.556730329990387
Validation loss = 0.5446469783782959
Validation loss = 0.5423454642295837
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5266674757003784
Validation loss = 0.5350042581558228
Validation loss = 0.5372886061668396
Validation loss = 0.5468425750732422
Validation loss = 0.5481418967247009
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5446822047233582
Validation loss = 0.5439735054969788
Validation loss = 0.5415175557136536
Validation loss = 0.5376835465431213
Validation loss = 0.542113721370697
Validation loss = 0.5445700883865356
Validation loss = 0.5434094667434692
Validation loss = 0.5453070998191833
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5514407753944397
Validation loss = 0.5375086069107056
Validation loss = 0.5378797054290771
Validation loss = 0.544161856174469
Validation loss = 0.543187141418457
Validation loss = 0.5625689625740051
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5577062964439392
Validation loss = 0.5440229177474976
Validation loss = 0.5569396018981934
Validation loss = 0.5635440945625305
Validation loss = 0.5480073690414429
Validation loss = 0.552788257598877
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.61    |
| Iteration     | 32       |
| MaximumReturn | -0.0928  |
| MinimumReturn | -28.3    |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5481005311012268
Validation loss = 0.540063202381134
Validation loss = 0.5382766127586365
Validation loss = 0.5444004535675049
Validation loss = 0.5691998600959778
Validation loss = 0.5454888939857483
Validation loss = 0.5495187640190125
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5373926758766174
Validation loss = 0.5373767614364624
Validation loss = 0.5481608510017395
Validation loss = 0.5467554926872253
Validation loss = 0.5429611206054688
Validation loss = 0.5520573854446411
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.543838620185852
Validation loss = 0.5516175031661987
Validation loss = 0.5419466495513916
Validation loss = 0.5517750382423401
Validation loss = 0.5608200430870056
Validation loss = 0.5567153096199036
Validation loss = 0.5502523183822632
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5378736853599548
Validation loss = 0.5404090285301208
Validation loss = 0.5424348711967468
Validation loss = 0.5540069341659546
Validation loss = 0.5503827333450317
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5481029152870178
Validation loss = 0.5516899824142456
Validation loss = 0.5480824112892151
Validation loss = 0.5511117577552795
Validation loss = 0.5513213872909546
Validation loss = 0.5581751465797424
Validation loss = 0.5639024376869202
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -64.6    |
| Iteration     | 33       |
| MaximumReturn | -0.192   |
| MinimumReturn | -149     |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5432969331741333
Validation loss = 0.5401314496994019
Validation loss = 0.5474593639373779
Validation loss = 0.5469827055931091
Validation loss = 0.5488014221191406
Validation loss = 0.5531390905380249
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5363494753837585
Validation loss = 0.5387337803840637
Validation loss = 0.5490481853485107
Validation loss = 0.5425414443016052
Validation loss = 0.5439916253089905
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5446646809577942
Validation loss = 0.560009777545929
Validation loss = 0.5534946322441101
Validation loss = 0.5533786416053772
Validation loss = 0.5680785179138184
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5430641770362854
Validation loss = 0.5495331287384033
Validation loss = 0.5492029786109924
Validation loss = 0.5527756810188293
Validation loss = 0.555031418800354
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5542886257171631
Validation loss = 0.5490918755531311
Validation loss = 0.5627267360687256
Validation loss = 0.5588846802711487
Validation loss = 0.5627392530441284
Validation loss = 0.5581753253936768
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.375   |
| Iteration     | 34       |
| MaximumReturn | -0.107   |
| MinimumReturn | -1.1     |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5615854263305664
Validation loss = 0.5456159114837646
Validation loss = 0.5565448999404907
Validation loss = 0.556808352470398
Validation loss = 0.5585158467292786
Validation loss = 0.5574290752410889
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5512052774429321
Validation loss = 0.5417051315307617
Validation loss = 0.5471333265304565
Validation loss = 0.5491775870323181
Validation loss = 0.5509335994720459
Validation loss = 0.5581952929496765
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.566191554069519
Validation loss = 0.5577687621116638
Validation loss = 0.5578874945640564
Validation loss = 0.5556920170783997
Validation loss = 0.5555369257926941
Validation loss = 0.5711787343025208
Validation loss = 0.5801745057106018
Validation loss = 0.559860110282898
Validation loss = 0.5713065266609192
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5556299686431885
Validation loss = 0.5537577271461487
Validation loss = 0.5510392785072327
Validation loss = 0.5583150386810303
Validation loss = 0.5514822006225586
Validation loss = 0.5561368465423584
Validation loss = 0.5515463948249817
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5590441823005676
Validation loss = 0.5583817958831787
Validation loss = 0.5619612336158752
Validation loss = 0.5653029084205627
Validation loss = 0.5715559124946594
Validation loss = 0.5681714415550232
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.59    |
| Iteration     | 35       |
| MaximumReturn | -0.0827  |
| MinimumReturn | -27.1    |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5564597249031067
Validation loss = 0.5485725998878479
Validation loss = 0.5561531186103821
Validation loss = 0.5604023337364197
Validation loss = 0.5586134195327759
Validation loss = 0.5732870697975159
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5611510872840881
Validation loss = 0.5553281903266907
Validation loss = 0.5528045892715454
Validation loss = 0.5641904473304749
Validation loss = 0.5576954483985901
Validation loss = 0.5579386949539185
Validation loss = 0.5735105872154236
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5687962174415588
Validation loss = 0.5599737167358398
Validation loss = 0.5607988834381104
Validation loss = 0.562961220741272
Validation loss = 0.5676001906394958
Validation loss = 0.5642129778862
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5517862439155579
Validation loss = 0.5586642026901245
Validation loss = 0.5624776482582092
Validation loss = 0.5599404573440552
Validation loss = 0.5546562671661377
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5703887343406677
Validation loss = 0.5660521984100342
Validation loss = 0.5659917593002319
Validation loss = 0.5641117095947266
Validation loss = 0.5734313726425171
Validation loss = 0.5645162463188171
Validation loss = 0.5685886740684509
Validation loss = 0.5706800222396851
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -93.8    |
| Iteration     | 36       |
| MaximumReturn | -24.8    |
| MinimumReturn | -156     |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5619710087776184
Validation loss = 0.5586974620819092
Validation loss = 0.5585982203483582
Validation loss = 0.566907525062561
Validation loss = 0.5574508309364319
Validation loss = 0.5629531145095825
Validation loss = 0.5680939555168152
Validation loss = 0.5770376920700073
Validation loss = 0.5690373778343201
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5478513836860657
Validation loss = 0.5544769763946533
Validation loss = 0.5555961728096008
Validation loss = 0.5604156851768494
Validation loss = 0.561177134513855
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.557762861251831
Validation loss = 0.562462568283081
Validation loss = 0.5707600712776184
Validation loss = 0.570544421672821
Validation loss = 0.5658763647079468
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5487558245658875
Validation loss = 0.5576298236846924
Validation loss = 0.5580487251281738
Validation loss = 0.5607028603553772
Validation loss = 0.5630643963813782
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5585682988166809
Validation loss = 0.5643051266670227
Validation loss = 0.5647861957550049
Validation loss = 0.5730718970298767
Validation loss = 0.5686758756637573
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.1    |
| Iteration     | 37       |
| MaximumReturn | -0.134   |
| MinimumReturn | -153     |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5650182962417603
Validation loss = 0.5642089247703552
Validation loss = 0.5674680471420288
Validation loss = 0.5727882385253906
Validation loss = 0.5749458074569702
Validation loss = 0.570868968963623
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5535776615142822
Validation loss = 0.5561753511428833
Validation loss = 0.5616265535354614
Validation loss = 0.5637398958206177
Validation loss = 0.5604866743087769
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5689971446990967
Validation loss = 0.5652547478675842
Validation loss = 0.5671648383140564
Validation loss = 0.5639462471008301
Validation loss = 0.5622038841247559
Validation loss = 0.5845407247543335
Validation loss = 0.571653425693512
Validation loss = 0.5748196840286255
Validation loss = 0.5780038833618164
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5618057250976562
Validation loss = 0.5619380474090576
Validation loss = 0.5580644607543945
Validation loss = 0.5628230571746826
Validation loss = 0.5609982013702393
Validation loss = 0.5703200101852417
Validation loss = 0.5672423839569092
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5738222002983093
Validation loss = 0.575499415397644
Validation loss = 0.5619459748268127
Validation loss = 0.5661402940750122
Validation loss = 0.5767568349838257
Validation loss = 0.5675466060638428
Validation loss = 0.572658121585846
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.77    |
| Iteration     | 38       |
| MaximumReturn | -0.129   |
| MinimumReturn | -145     |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5721210241317749
Validation loss = 0.5653551816940308
Validation loss = 0.5787047147750854
Validation loss = 0.5723492503166199
Validation loss = 0.5770414471626282
Validation loss = 0.5828414559364319
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.557838499546051
Validation loss = 0.560935378074646
Validation loss = 0.559058666229248
Validation loss = 0.5606072545051575
Validation loss = 0.5687732696533203
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5708480477333069
Validation loss = 0.5712498426437378
Validation loss = 0.5670086741447449
Validation loss = 0.5723254084587097
Validation loss = 0.5766783356666565
Validation loss = 0.5736608505249023
Validation loss = 0.5795568227767944
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5646049976348877
Validation loss = 0.5630773305892944
Validation loss = 0.5750764012336731
Validation loss = 0.5666415095329285
Validation loss = 0.5659542083740234
Validation loss = 0.5761657357215881
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5773791074752808
Validation loss = 0.5753783583641052
Validation loss = 0.5797077417373657
Validation loss = 0.5846393704414368
Validation loss = 0.5783289074897766
Validation loss = 0.577877938747406
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.56    |
| Iteration     | 39       |
| MaximumReturn | -0.0903  |
| MinimumReturn | -19.7    |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5801848769187927
Validation loss = 0.5715044736862183
Validation loss = 0.575273334980011
Validation loss = 0.5767331719398499
Validation loss = 0.5721240043640137
Validation loss = 0.5858496427536011
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5637580156326294
Validation loss = 0.5689483880996704
Validation loss = 0.5697529315948486
Validation loss = 0.5680781602859497
Validation loss = 0.5772908329963684
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5785393118858337
Validation loss = 0.5692301392555237
Validation loss = 0.5799422860145569
Validation loss = 0.56831294298172
Validation loss = 0.5745555758476257
Validation loss = 0.5842769742012024
Validation loss = 0.5790134072303772
Validation loss = 0.5824036598205566
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5798421502113342
Validation loss = 0.5662103891372681
Validation loss = 0.5703462362289429
Validation loss = 0.5764554738998413
Validation loss = 0.5748233795166016
Validation loss = 0.5740785002708435
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.581073522567749
Validation loss = 0.5804258584976196
Validation loss = 0.5781352519989014
Validation loss = 0.5770706534385681
Validation loss = 0.5735255479812622
Validation loss = 0.5804890990257263
Validation loss = 0.5861930847167969
Validation loss = 0.5824349522590637
Validation loss = 0.5807686448097229
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.12    |
| Iteration     | 40       |
| MaximumReturn | -0.0771  |
| MinimumReturn | -16.3    |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5791890025138855
Validation loss = 0.5719112753868103
Validation loss = 0.5817192196846008
Validation loss = 0.5799693465232849
Validation loss = 0.5776305198669434
Validation loss = 0.5815525054931641
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5733450651168823
Validation loss = 0.5665677785873413
Validation loss = 0.5743368268013
Validation loss = 0.5769229531288147
Validation loss = 0.5731692910194397
Validation loss = 0.5833549499511719
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5764899253845215
Validation loss = 0.5793672800064087
Validation loss = 0.5771353840827942
Validation loss = 0.5786019563674927
Validation loss = 0.5820122957229614
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5890504717826843
Validation loss = 0.5679402351379395
Validation loss = 0.5788397192955017
Validation loss = 0.5702959299087524
Validation loss = 0.5772197246551514
Validation loss = 0.5810950398445129
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5818192362785339
Validation loss = 0.5792618989944458
Validation loss = 0.5769332647323608
Validation loss = 0.5837843418121338
Validation loss = 0.5833523869514465
Validation loss = 0.5872582197189331
Validation loss = 0.5865678787231445
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.347   |
| Iteration     | 41       |
| MaximumReturn | -0.129   |
| MinimumReturn | -0.965   |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5836216807365417
Validation loss = 0.5741344094276428
Validation loss = 0.5764002799987793
Validation loss = 0.5796002745628357
Validation loss = 0.5876690745353699
Validation loss = 0.5916720628738403
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5782482028007507
Validation loss = 0.5672292709350586
Validation loss = 0.5773621201515198
Validation loss = 0.5741119384765625
Validation loss = 0.5779741406440735
Validation loss = 0.5812687873840332
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5862011909484863
Validation loss = 0.586990237236023
Validation loss = 0.5812956094741821
Validation loss = 0.5803000330924988
Validation loss = 0.5780693888664246
Validation loss = 0.5771205425262451
Validation loss = 0.5884906053543091
Validation loss = 0.5901409387588501
Validation loss = 0.5864430665969849
Validation loss = 0.589480996131897
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5868306159973145
Validation loss = 0.5807027220726013
Validation loss = 0.5699191093444824
Validation loss = 0.5725458264350891
Validation loss = 0.5785581469535828
Validation loss = 0.574582040309906
Validation loss = 0.592220664024353
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.583263099193573
Validation loss = 0.5915406942367554
Validation loss = 0.5842322707176208
Validation loss = 0.5902115702629089
Validation loss = 0.5865811109542847
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.327   |
| Iteration     | 42       |
| MaximumReturn | -0.0943  |
| MinimumReturn | -0.889   |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5773138403892517
Validation loss = 0.5771100521087646
Validation loss = 0.5843244194984436
Validation loss = 0.5802781581878662
Validation loss = 0.5925605297088623
Validation loss = 0.5859251022338867
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5718377828598022
Validation loss = 0.5740892887115479
Validation loss = 0.5831971764564514
Validation loss = 0.581048309803009
Validation loss = 0.574993908405304
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5899646282196045
Validation loss = 0.5817620754241943
Validation loss = 0.5940301418304443
Validation loss = 0.5865322351455688
Validation loss = 0.5859804749488831
Validation loss = 0.6009764671325684
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5872886180877686
Validation loss = 0.5828338861465454
Validation loss = 0.5831524133682251
Validation loss = 0.5838391780853271
Validation loss = 0.5804630517959595
Validation loss = 0.585816502571106
Validation loss = 0.5966232419013977
Validation loss = 0.5877395868301392
Validation loss = 0.5928647518157959
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5799229741096497
Validation loss = 0.5808007717132568
Validation loss = 0.5801600813865662
Validation loss = 0.6078794598579407
Validation loss = 0.5821359157562256
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -31.3    |
| Iteration     | 43       |
| MaximumReturn | -0.326   |
| MinimumReturn | -86.3    |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5854881405830383
Validation loss = 0.5755375623703003
Validation loss = 0.5757853984832764
Validation loss = 0.5808444023132324
Validation loss = 0.5853613018989563
Validation loss = 0.5843313336372375
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5786666870117188
Validation loss = 0.5747591257095337
Validation loss = 0.5779838562011719
Validation loss = 0.5777235627174377
Validation loss = 0.5791482329368591
Validation loss = 0.582423985004425
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5818254351615906
Validation loss = 0.5841188430786133
Validation loss = 0.5901025533676147
Validation loss = 0.5872195959091187
Validation loss = 0.5901443362236023
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5838592648506165
Validation loss = 0.5845809578895569
Validation loss = 0.5845322608947754
Validation loss = 0.5872772932052612
Validation loss = 0.5893217325210571
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5803175568580627
Validation loss = 0.5876151323318481
Validation loss = 0.5864173769950867
Validation loss = 0.5846647024154663
Validation loss = 0.5936902761459351
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26.2    |
| Iteration     | 44       |
| MaximumReturn | -0.322   |
| MinimumReturn | -117     |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5752626657485962
Validation loss = 0.5741559267044067
Validation loss = 0.5794001817703247
Validation loss = 0.5839149355888367
Validation loss = 0.5823893547058105
Validation loss = 0.5961642265319824
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5782948136329651
Validation loss = 0.5753045678138733
Validation loss = 0.5801023840904236
Validation loss = 0.5801101922988892
Validation loss = 0.5841787457466125
Validation loss = 0.5818044543266296
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5818155407905579
Validation loss = 0.5763254761695862
Validation loss = 0.5843789577484131
Validation loss = 0.5902423858642578
Validation loss = 0.5868141651153564
Validation loss = 0.5928682088851929
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5859410166740417
Validation loss = 0.5846636295318604
Validation loss = 0.5825459957122803
Validation loss = 0.5899059176445007
Validation loss = 0.5971716642379761
Validation loss = 0.591264009475708
Validation loss = 0.5964717268943787
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5819487571716309
Validation loss = 0.5832090973854065
Validation loss = 0.579632043838501
Validation loss = 0.5911080241203308
Validation loss = 0.5862252116203308
Validation loss = 0.5883219242095947
Validation loss = 0.59011310338974
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
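
Each "Generating on-policy rollouts" block collects num_path_onpol = 25 paths of env_horizon = 100 steps with the current policy in the real environment, which is why total_timesteps advances in increments of 100. A minimal sketch assuming a gym-style environment and a hypothetical policy.act method:

    def generate_rollouts(env, policy, num_paths=25, horizon=100):
        # Illustrative: collect on-policy trajectories in the real environment.
        paths, total_timesteps = [], 0
        for p in range(num_paths):
            print("Path %d | total_timesteps %d." % (p, total_timesteps))
            obs = env.reset()
            observations, actions, rewards = [], [], []
            for _ in range(horizon):
                act = policy.act(obs)                    # hypothetical sampling call
                next_obs, reward, done, _ = env.step(act)
                observations.append(obs)
                actions.append(act)
                rewards.append(reward)
                obs = next_obs
                total_timesteps += 1
                if done:
                    break
            paths.append({"observations": observations,
                          "actions": actions,
                          "rewards": rewards})
        return paths
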
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.415   |
| Iteration     | 45       |
| MaximumReturn | -0.0688  |
| MinimumReturn | -1.13    |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.587873637676239
Validation loss = 0.5813922882080078
Validation loss = 0.5859830975532532
Validation loss = 0.5856776833534241
Validation loss = 0.5871137380599976
Validation loss = 0.5893588662147522
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5903867483139038
Validation loss = 0.5821917653083801
Validation loss = 0.5871516466140747
Validation loss = 0.5876768827438354
Validation loss = 0.5905798673629761
Validation loss = 0.5957022309303284
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5896888375282288
Validation loss = 0.5859366655349731
Validation loss = 0.5883663892745972
Validation loss = 0.5953835248947144
Validation loss = 0.5932329893112183
Validation loss = 0.5938740968704224
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6067773699760437
Validation loss = 0.5873373746871948
Validation loss = 0.5851194262504578
Validation loss = 0.5956573486328125
Validation loss = 0.5956742167472839
Validation loss = 0.5895137786865234
Validation loss = 0.5906095504760742
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5893282890319824
Validation loss = 0.5822575092315674
Validation loss = 0.5892153978347778
Validation loss = 0.5999760031700134
Validation loss = 0.5957375764846802
Validation loss = 0.5939215421676636
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
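
The "Updating normalization" step recomputes the statistics used to whiten the dynamics model's inputs and targets now that new on-policy data is available. A rough sketch, assuming the model is trained on (observation, action) -> state-delta pairs; the field names are assumptions, not the repository's actual data layout.

    import numpy as np

    def compute_normalization(paths):
        # Illustrative: mean/std over concatenated observations, actions and
        # one-step state deltas, with a small epsilon to avoid division by zero.
        obs = np.concatenate([np.asarray(p["observations"]) for p in paths], axis=0)
        act = np.concatenate([np.asarray(p["actions"]) for p in paths], axis=0)
        deltas = np.concatenate(
            [np.diff(np.asarray(p["observations"]), axis=0) for p in paths], axis=0)
        eps = 1e-6
        return {name: (arr.mean(axis=0), arr.std(axis=0) + eps)
                for name, arr in (("observations", obs),
                                  ("actions", act),
                                  ("deltas", deltas))}
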
----------------------------
| AverageReturn | -0.275   |
| Iteration     | 46       |
| MaximumReturn | -0.0867  |
| MinimumReturn | -0.748   |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5812094807624817
Validation loss = 0.5870921015739441
Validation loss = 0.5854318737983704
Validation loss = 0.5901254415512085
Validation loss = 0.5941068530082703
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5892859697341919
Validation loss = 0.5811399221420288
Validation loss = 0.583501935005188
Validation loss = 0.5929931402206421
Validation loss = 0.5938552618026733
Validation loss = 0.5924140810966492
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5929768681526184
Validation loss = 0.5882616639137268
Validation loss = 0.5921767354011536
Validation loss = 0.5925984382629395
Validation loss = 0.5933414697647095
Validation loss = 0.5938920974731445
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5940081477165222
Validation loss = 0.5899664759635925
Validation loss = 0.5942651033401489
Validation loss = 0.5923601388931274
Validation loss = 0.5985989570617676
Validation loss = 0.5943115949630737
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5937963724136353
Validation loss = 0.5921580195426941
Validation loss = 0.5876337885856628
Validation loss = 0.5852441191673279
Validation loss = 0.5956645607948303
Validation loss = 0.5953923463821411
Validation loss = 0.5927057266235352
Validation loss = 0.6006484031677246
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.73    |
| Iteration     | 47       |
| MaximumReturn | -0.12    |
| MinimumReturn | -43.7    |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5848672986030579
Validation loss = 0.5842635035514832
Validation loss = 0.582995593547821
Validation loss = 0.5878728628158569
Validation loss = 0.5831269025802612
Validation loss = 0.5915297269821167
Validation loss = 0.5937265157699585
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5825424194335938
Validation loss = 0.5879623293876648
Validation loss = 0.5940716862678528
Validation loss = 0.5856499671936035
Validation loss = 0.5937391519546509
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5909925699234009
Validation loss = 0.5876268744468689
Validation loss = 0.5868610143661499
Validation loss = 0.5887821912765503
Validation loss = 0.5898719429969788
Validation loss = 0.5846520662307739
Validation loss = 0.5958400368690491
Validation loss = 0.5942904353141785
Validation loss = 0.5933694839477539
Validation loss = 0.5968480706214905
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5933646559715271
Validation loss = 0.5888533592224121
Validation loss = 0.5882669687271118
Validation loss = 0.6002117395401001
Validation loss = 0.5982415080070496
Validation loss = 0.5941206216812134
Validation loss = 0.5948454141616821
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5935843586921692
Validation loss = 0.5919552445411682
Validation loss = 0.5916040539741516
Validation loss = 0.5909903645515442
Validation loss = 0.5928547382354736
Validation loss = 0.5960649847984314
Validation loss = 0.5971617698669434
Validation loss = 0.5944183468818665
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.29    |
| Iteration     | 48       |
| MaximumReturn | -0.117   |
| MinimumReturn | -68.6    |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5863450169563293
Validation loss = 0.6022952198982239
Validation loss = 0.5901286602020264
Validation loss = 0.5885640382766724
Validation loss = 0.589213490486145
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5932289361953735
Validation loss = 0.5866044163703918
Validation loss = 0.5897122025489807
Validation loss = 0.5848332643508911
Validation loss = 0.5904567241668701
Validation loss = 0.5998894572257996
Validation loss = 0.5948909521102905
Validation loss = 0.5972223877906799
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5916640758514404
Validation loss = 0.5922836661338806
Validation loss = 0.604112446308136
Validation loss = 0.5948923826217651
Validation loss = 0.591599702835083
Validation loss = 0.6031253933906555
Validation loss = 0.5956918001174927
Validation loss = 0.6090744733810425
Validation loss = 0.6027789115905762
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.591711163520813
Validation loss = 0.5934281945228577
Validation loss = 0.5985449552536011
Validation loss = 0.5934877991676331
Validation loss = 0.5939542651176453
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6023476123809814
Validation loss = 0.5996103286743164
Validation loss = 0.5970805883407593
Validation loss = 0.5936684608459473
Validation loss = 0.5969793200492859
Validation loss = 0.5987163186073303
Validation loss = 0.5974283218383789
Validation loss = 0.5995998382568359
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.23    |
| Iteration     | 49       |
| MaximumReturn | -0.225   |
| MinimumReturn | -66.2    |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5889412760734558
Validation loss = 0.5865728259086609
Validation loss = 0.5872810482978821
Validation loss = 0.5908206701278687
Validation loss = 0.5947022438049316
Validation loss = 0.5934492945671082
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5845454335212708
Validation loss = 0.5887818336486816
Validation loss = 0.5913960337638855
Validation loss = 0.593927800655365
Validation loss = 0.5950103998184204
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5999777913093567
Validation loss = 0.58785480260849
Validation loss = 0.6017630696296692
Validation loss = 0.5957997441291809
Validation loss = 0.5966519117355347
Validation loss = 0.6010115146636963
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.585853636264801
Validation loss = 0.6033176183700562
Validation loss = 0.5865578651428223
Validation loss = 0.5924687385559082
Validation loss = 0.5870315432548523
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5994317531585693
Validation loss = 0.5920297503471375
Validation loss = 0.6004087924957275
Validation loss = 0.5956432819366455
Validation loss = 0.5965414047241211
Validation loss = 0.6025789976119995
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -18.3    |
| Iteration     | 50       |
| MaximumReturn | -0.253   |
| MinimumReturn | -83.7    |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5837899446487427
Validation loss = 0.5917136669158936
Validation loss = 0.5884852409362793
Validation loss = 0.5937303304672241
Validation loss = 0.5978891849517822
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5916176438331604
Validation loss = 0.594195544719696
Validation loss = 0.5888338088989258
Validation loss = 0.5962218642234802
Validation loss = 0.5944139361381531
Validation loss = 0.5971556305885315
Validation loss = 0.5930875539779663
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6202501654624939
Validation loss = 0.5922366976737976
Validation loss = 0.5990502834320068
Validation loss = 0.6056356430053711
Validation loss = 0.605863094329834
Validation loss = 0.5998455286026001
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5926389098167419
Validation loss = 0.5984839200973511
Validation loss = 0.5952066779136658
Validation loss = 0.5910751819610596
Validation loss = 0.5946044921875
Validation loss = 0.5987128019332886
Validation loss = 0.6007792353630066
Validation loss = 0.5986433625221252
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6033511757850647
Validation loss = 0.6015408635139465
Validation loss = 0.599879264831543
Validation loss = 0.5989668369293213
Validation loss = 0.5984055399894714
Validation loss = 0.6057457327842712
Validation loss = 0.6059522032737732
Validation loss = 0.6033627986907959
Validation loss = 0.6046546697616577
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.4     |
| Iteration     | 51       |
| MaximumReturn | -0.0698  |
| MinimumReturn | -17.2    |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5881384611129761
Validation loss = 0.5852020382881165
Validation loss = 0.5883009433746338
Validation loss = 0.5978996157646179
Validation loss = 0.5962464213371277
Validation loss = 0.6000864505767822
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6015522480010986
Validation loss = 0.5962820649147034
Validation loss = 0.592609703540802
Validation loss = 0.5931406617164612
Validation loss = 0.5944060683250427
Validation loss = 0.5994215607643127
Validation loss = 0.5976929068565369
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5967791080474854
Validation loss = 0.5943498611450195
Validation loss = 0.5957940220832825
Validation loss = 0.5946776270866394
Validation loss = 0.605907678604126
Validation loss = 0.6093966960906982
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6033958196640015
Validation loss = 0.5889298915863037
Validation loss = 0.5910577178001404
Validation loss = 0.5996578931808472
Validation loss = 0.6002737283706665
Validation loss = 0.5985988974571228
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.606753408908844
Validation loss = 0.5951133370399475
Validation loss = 0.6028332114219666
Validation loss = 0.6009545922279358
Validation loss = 0.6016817092895508
Validation loss = 0.6060831546783447
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.843   |
| Iteration     | 52       |
| MaximumReturn | -0.2     |
| MinimumReturn | -3.07    |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5874484777450562
Validation loss = 0.5966838002204895
Validation loss = 0.5861132740974426
Validation loss = 0.5903630256652832
Validation loss = 0.5918971300125122
Validation loss = 0.5965544581413269
Validation loss = 0.6021481156349182
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5954577326774597
Validation loss = 0.5887079834938049
Validation loss = 0.5947088599205017
Validation loss = 0.5956605076789856
Validation loss = 0.6067283749580383
Validation loss = 0.5972615480422974
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6029945611953735
Validation loss = 0.5981792211532593
Validation loss = 0.5992119908332825
Validation loss = 0.600128173828125
Validation loss = 0.6059986352920532
Validation loss = 0.6028728485107422
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5945794582366943
Validation loss = 0.5974092483520508
Validation loss = 0.5949234962463379
Validation loss = 0.5982187390327454
Validation loss = 0.6041596531867981
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6105709671974182
Validation loss = 0.5958054661750793
Validation loss = 0.6024000644683838
Validation loss = 0.6041496992111206
Validation loss = 0.6071920990943909
Validation loss = 0.6086649298667908
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.9    |
| Iteration     | 53       |
| MaximumReturn | -0.0937  |
| MinimumReturn | -64.9    |
| TotalSamples  | 91630    |
----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5983094573020935
Validation loss = 0.5898759961128235
Validation loss = 0.5910583734512329
Validation loss = 0.5979317426681519
Validation loss = 0.5944573879241943
Validation loss = 0.5969617366790771
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5979685187339783
Validation loss = 0.5952259302139282
Validation loss = 0.597228467464447
Validation loss = 0.5957803726196289
Validation loss = 0.5972949266433716
Validation loss = 0.5995767712593079
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.596876323223114
Validation loss = 0.5983063578605652
Validation loss = 0.5968876481056213
Validation loss = 0.59811931848526
Validation loss = 0.6039499640464783
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5946707725524902
Validation loss = 0.6011292338371277
Validation loss = 0.5967854857444763
Validation loss = 0.5965756177902222
Validation loss = 0.601422131061554
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5991387963294983
Validation loss = 0.6034570336341858
Validation loss = 0.608613908290863
Validation loss = 0.6063528060913086
Validation loss = 0.6001531481742859
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -28.6    |
| Iteration     | 54       |
| MaximumReturn | -0.32    |
| MinimumReturn | -134     |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.600826621055603
Validation loss = 0.5982136726379395
Validation loss = 0.6023259162902832
Validation loss = 0.5921116471290588
Validation loss = 0.5990033149719238
Validation loss = 0.5930896997451782
Validation loss = 0.596156120300293
Validation loss = 0.6060774922370911
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5942994952201843
Validation loss = 0.5931395888328552
Validation loss = 0.5926905274391174
Validation loss = 0.5990496277809143
Validation loss = 0.5979859828948975
Validation loss = 0.594519853591919
Validation loss = 0.597686767578125
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5964147448539734
Validation loss = 0.5939971804618835
Validation loss = 0.6012409329414368
Validation loss = 0.5975728631019592
Validation loss = 0.6005516648292542
Validation loss = 0.6001178026199341
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6030896306037903
Validation loss = 0.5936459302902222
Validation loss = 0.5927789807319641
Validation loss = 0.5999221801757812
Validation loss = 0.5984272360801697
Validation loss = 0.5978200435638428
Validation loss = 0.6064130663871765
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5939920544624329
Validation loss = 0.5950770974159241
Validation loss = 0.6031655669212341
Validation loss = 0.5963128805160522
Validation loss = 0.6044471263885498
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.08    |
| Iteration     | 55       |
| MaximumReturn | -0.199   |
| MinimumReturn | -43.6    |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.598527729511261
Validation loss = 0.5943658947944641
Validation loss = 0.5914652943611145
Validation loss = 0.5921849608421326
Validation loss = 0.6067199110984802
Validation loss = 0.6001799702644348
Validation loss = 0.5986210703849792
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5938470959663391
Validation loss = 0.5942204594612122
Validation loss = 0.5957145094871521
Validation loss = 0.5961666703224182
Validation loss = 0.5925561785697937
Validation loss = 0.5984862446784973
Validation loss = 0.608433723449707
Validation loss = 0.5983923077583313
Validation loss = 0.6054708361625671
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5975220203399658
Validation loss = 0.5984264016151428
Validation loss = 0.5985321402549744
Validation loss = 0.6007030010223389
Validation loss = 0.6070008873939514
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6004368662834167
Validation loss = 0.5893381237983704
Validation loss = 0.6011210083961487
Validation loss = 0.6013975739479065
Validation loss = 0.5985444784164429
Validation loss = 0.5957404971122742
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5992735624313354
Validation loss = 0.607649028301239
Validation loss = 0.6015714406967163
Validation loss = 0.5993878245353699
Validation loss = 0.607624351978302
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.72    |
| Iteration     | 56       |
| MaximumReturn | -0.289   |
| MinimumReturn | -101     |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5958648324012756
Validation loss = 0.6024599671363831
Validation loss = 0.601511538028717
Validation loss = 0.5965501666069031
Validation loss = 0.6015586256980896
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5975914001464844
Validation loss = 0.5962142944335938
Validation loss = 0.605908215045929
Validation loss = 0.6025371551513672
Validation loss = 0.6024811863899231
Validation loss = 0.6051422953605652
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5892848372459412
Validation loss = 0.5891888737678528
Validation loss = 0.5938593745231628
Validation loss = 0.5997421145439148
Validation loss = 0.6014657616615295
Validation loss = 0.5972452759742737
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5961551070213318
Validation loss = 0.6026431918144226
Validation loss = 0.596167802810669
Validation loss = 0.5983893275260925
Validation loss = 0.5960593819618225
Validation loss = 0.6050781607627869
Validation loss = 0.6022675633430481
Validation loss = 0.5988556742668152
Validation loss = 0.604896605014801
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5982614159584045
Validation loss = 0.5980919003486633
Validation loss = 0.6050987839698792
Validation loss = 0.5989185571670532
Validation loss = 0.6038839221000671
Validation loss = 0.5975240468978882
Validation loss = 0.6049183011054993
Validation loss = 0.6027540564537048
Validation loss = 0.6099014282226562
Validation loss = 0.6042765974998474
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -51.5    |
| Iteration     | 57       |
| MaximumReturn | -0.311   |
| MinimumReturn | -122     |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5932782888412476
Validation loss = 0.5968749523162842
Validation loss = 0.5961974859237671
Validation loss = 0.5918068289756775
Validation loss = 0.5936076045036316
Validation loss = 0.6000169515609741
Validation loss = 0.5960192680358887
Validation loss = 0.5999733805656433
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5959975123405457
Validation loss = 0.5952267646789551
Validation loss = 0.5946558117866516
Validation loss = 0.6013978719711304
Validation loss = 0.6032474637031555
Validation loss = 0.5952377915382385
Validation loss = 0.6084367632865906
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5892080068588257
Validation loss = 0.5990755558013916
Validation loss = 0.5910371541976929
Validation loss = 0.5913687944412231
Validation loss = 0.599644660949707
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5990415811538696
Validation loss = 0.5975852608680725
Validation loss = 0.5995768904685974
Validation loss = 0.5979293584823608
Validation loss = 0.6000915765762329
Validation loss = 0.5982503890991211
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6047236919403076
Validation loss = 0.6052335500717163
Validation loss = 0.6015368103981018
Validation loss = 0.6021416187286377
Validation loss = 0.5977057814598083
Validation loss = 0.6040634512901306
Validation loss = 0.6072899699211121
Validation loss = 0.6059582233428955
Validation loss = 0.6114134788513184
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.14    |
| Iteration     | 58       |
| MaximumReturn | -0.112   |
| MinimumReturn | -68.6    |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.609428882598877
Validation loss = 0.5966785550117493
Validation loss = 0.5941942930221558
Validation loss = 0.5947334170341492
Validation loss = 0.5930182337760925
Validation loss = 0.6010894775390625
Validation loss = 0.6017953753471375
Validation loss = 0.601746141910553
Validation loss = 0.6038283705711365
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6000198125839233
Validation loss = 0.5932894945144653
Validation loss = 0.6003736257553101
Validation loss = 0.5997003316879272
Validation loss = 0.5944383144378662
Validation loss = 0.6026745438575745
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5952833890914917
Validation loss = 0.5956810116767883
Validation loss = 0.5896207094192505
Validation loss = 0.5879203081130981
Validation loss = 0.5936700701713562
Validation loss = 0.6059231758117676
Validation loss = 0.5918211340904236
Validation loss = 0.6061893701553345
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5975740551948547
Validation loss = 0.6030356884002686
Validation loss = 0.6024759411811829
Validation loss = 0.6016390323638916
Validation loss = 0.6022252440452576
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6025894284248352
Validation loss = 0.6026937961578369
Validation loss = 0.5969582200050354
Validation loss = 0.6043744683265686
Validation loss = 0.607155442237854
Validation loss = 0.6042119264602661
Validation loss = 0.6051705479621887
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.4     |
| Iteration     | 59       |
| MaximumReturn | -0.166   |
| MinimumReturn | -36.5    |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6016546487808228
Validation loss = 0.5987379550933838
Validation loss = 0.5948000550270081
Validation loss = 0.5967251062393188
Validation loss = 0.6011385917663574
Validation loss = 0.6006860136985779
Validation loss = 0.6022586226463318
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6037707328796387
Validation loss = 0.5989083647727966
Validation loss = 0.6005146503448486
Validation loss = 0.5990298986434937
Validation loss = 0.6074278354644775
Validation loss = 0.6000397205352783
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5962393283843994
Validation loss = 0.591091513633728
Validation loss = 0.597413957118988
Validation loss = 0.5933538675308228
Validation loss = 0.6021261811256409
Validation loss = 0.5990906357765198
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5983186960220337
Validation loss = 0.6015663743019104
Validation loss = 0.5922970771789551
Validation loss = 0.5996579527854919
Validation loss = 0.6040613055229187
Validation loss = 0.601266622543335
Validation loss = 0.6014580726623535
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6042795181274414
Validation loss = 0.5984784364700317
Validation loss = 0.6022125482559204
Validation loss = 0.6040334701538086
Validation loss = 0.6073588132858276
Validation loss = 0.6056959629058838
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -54.2    |
| Iteration     | 60       |
| MaximumReturn | -0.177   |
| MinimumReturn | -147     |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5869812965393066
Validation loss = 0.5758913159370422
Validation loss = 0.5876343250274658
Validation loss = 0.5863869786262512
Validation loss = 0.588934063911438
Validation loss = 0.5967399477958679
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5914071202278137
Validation loss = 0.584242045879364
Validation loss = 0.5811429619789124
Validation loss = 0.5884706974029541
Validation loss = 0.5865086913108826
Validation loss = 0.5902989506721497
Validation loss = 0.5898053050041199
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5778928399085999
Validation loss = 0.5763903856277466
Validation loss = 0.5837433934211731
Validation loss = 0.5817127227783203
Validation loss = 0.5840743780136108
Validation loss = 0.5864630937576294
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5777763724327087
Validation loss = 0.5855472683906555
Validation loss = 0.5821747779846191
Validation loss = 0.590009331703186
Validation loss = 0.5895929336547852
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5911201238632202
Validation loss = 0.5931239128112793
Validation loss = 0.5914376378059387
Validation loss = 0.5942879915237427
Validation loss = 0.5906609296798706
Validation loss = 0.6080811619758606
Validation loss = 0.6013814210891724
Validation loss = 0.5951924324035645
Validation loss = 0.5936965346336365
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
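The "Re-initialize init_std." line indicates the policy's exploration noise is reset before each round of TRPO training, after which twenty sample-and-update iterations run. The sketch below shows only that outer pattern with a toy diagonal-Gaussian policy; the sampling source (learned model vs. simulator) and the TRPO update itself are omitted, and all names and default values are illustrative rather than the project's API.

    import numpy as np

    class ToyGaussianPolicy:
        """Linear-mean Gaussian policy with a state-independent log-std (illustration only)."""
        def __init__(self, obs_dim, act_dim, init_logstd=0.0):
            self.W = np.zeros((obs_dim, act_dim))
            self.init_logstd = init_logstd
            self.logstd = np.full(act_dim, init_logstd)

        def reinit_std(self):
            # Corresponds to the "Re-initialize init_std." log line: restore the
            # initial exploration noise before this round of policy training.
            self.logstd = np.full_like(self.logstd, self.init_logstd)

        def act(self, obs, rng):
            mean = obs @ self.W
            return mean + np.exp(self.logstd) * rng.standard_normal(mean.shape)

    def train_policy(policy, iterations=20, seed=0):
        rng = np.random.default_rng(seed)
        print("Re-initialize init_std.")
        policy.reinit_std()
        for k in range(iterations):
            print(f"Obtaining samples for iteration {k}...")
            batch = [policy.act(rng.standard_normal(4), rng) for _ in range(8)]
            # placeholder: a real implementation would run a TRPO step on `batch` here
        print("Done training policy.")

    train_policy(ToyGaussianPolicy(obs_dim=4, act_dim=1))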
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
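"Updating normalization." most plausibly means folding the freshly collected rollout data into the running statistics used to whiten model inputs; exactly which quantities are normalized is not visible in this excerpt. A small running mean/variance sketch under that assumption (Welford-style accumulation, hypothetical class name):

    import numpy as np

    class RunningNormalizer:
        """Tracks running mean and variance so inputs can be whitened consistently."""
        def __init__(self, dim):
            self.count = 0
            self.mean = np.zeros(dim)
            self.m2 = np.zeros(dim)           # sum of squared deviations (Welford)

        def update(self, batch):
            for x in batch:                   # batch: array of shape (N, dim)
                self.count += 1
                delta = x - self.mean
                self.mean += delta / self.count
                self.m2 += delta * (x - self.mean)

        def normalize(self, x, eps=1e-8):
            var = self.m2 / max(self.count - 1, 1)
            return (x - self.mean) / np.sqrt(var + eps)

    # Illustration: fold one iteration's 25 paths x 100 steps of observations in.
    rng = np.random.default_rng(0)
    norm = RunningNormalizer(dim=4)
    observations = rng.standard_normal((25 * 100, 4))
    norm.update(observations)
    print(norm.normalize(observations[:2]))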
----------------------------
| AverageReturn | -0.682   |
| Iteration     | 61       |
| MaximumReturn | -0.118   |
| MinimumReturn | -3.19    |
| TotalSamples  | 104958   |
----------------------------
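Each summary block like the one above reports statistics over the 25 freshly collected on-policy paths: the mean, maximum, and minimum path return, the outer iteration index, and a cumulative sample counter. A short sketch of producing such a table follows; the function name is hypothetical, the formatting only approximates the logger's, what TotalSamples counts exactly is not derivable from this excerpt (it is simply passed through), and the example returns are made up.

    def log_iteration_stats(itr, path_returns, total_samples):
        """Print an iteration summary in the same spirit as the tables in this log."""
        stats = [
            ("AverageReturn", sum(path_returns) / len(path_returns)),
            ("Iteration", itr),
            ("MaximumReturn", max(path_returns)),
            ("MinimumReturn", min(path_returns)),
            ("TotalSamples", total_samples),
        ]
        print("-" * 28)
        for name, value in stats:
            print(f"| {name:<13} | {value:<8.6g} |")
        print("-" * 28)

    # Example with made-up returns for one iteration of 25 paths.
    log_iteration_stats(61, [-0.7, -0.5, -1.2, -0.4, -0.9] * 5, total_samples=104958)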
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5970844030380249
Validation loss = 0.5917435884475708
Validation loss = 0.5891220569610596
Validation loss = 0.5926385521888733
Validation loss = 0.593989372253418
Validation loss = 0.594440221786499
Validation loss = 0.6045611500740051
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.590072512626648
Validation loss = 0.5917918682098389
Validation loss = 0.601421594619751
Validation loss = 0.5909050107002258
Validation loss = 0.6000988483428955
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.597112238407135
Validation loss = 0.5860099792480469
Validation loss = 0.5922900438308716
Validation loss = 0.586181640625
Validation loss = 0.5920493602752686
Validation loss = 0.589024007320404
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5941375494003296
Validation loss = 0.589508056640625
Validation loss = 0.5913665294647217
Validation loss = 0.6000059247016907
Validation loss = 0.5912610292434692
Validation loss = 0.5911574959754944
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5965489745140076
Validation loss = 0.5958195924758911
Validation loss = 0.5991421341896057
Validation loss = 0.6009711027145386
Validation loss = 0.6004787683486938
Validation loss = 0.6033539175987244
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -24.3    |
| Iteration     | 62       |
| MaximumReturn | -0.122   |
| MinimumReturn | -87.4    |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5900391340255737
Validation loss = 0.5898537039756775
Validation loss = 0.588648796081543
Validation loss = 0.592448353767395
Validation loss = 0.5937454104423523
Validation loss = 0.5912684798240662
Validation loss = 0.5938807129859924
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5902136564254761
Validation loss = 0.5909712314605713
Validation loss = 0.5934410095214844
Validation loss = 0.5976591110229492
Validation loss = 0.5968186855316162
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5881586670875549
Validation loss = 0.5863363146781921
Validation loss = 0.5797919034957886
Validation loss = 0.5880031585693359
Validation loss = 0.5876403450965881
Validation loss = 0.5972756147384644
Validation loss = 0.5953177213668823
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5948329567909241
Validation loss = 0.5872600674629211
Validation loss = 0.5869559049606323
Validation loss = 0.5994731783866882
Validation loss = 0.5907906889915466
Validation loss = 0.5918818116188049
Validation loss = 0.5926816463470459
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5969908237457275
Validation loss = 0.5934707522392273
Validation loss = 0.5977891683578491
Validation loss = 0.596629798412323
Validation loss = 0.6036002039909363
Validation loss = 0.5988006591796875
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.7     |
| Iteration     | 63       |
| MaximumReturn | -0.152   |
| MinimumReturn | -23.8    |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5952448844909668
Validation loss = 0.5923110842704773
Validation loss = 0.5908601880073547
Validation loss = 0.5914643406867981
Validation loss = 0.5921593904495239
Validation loss = 0.5956509113311768
Validation loss = 0.5964515209197998
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5934450626373291
Validation loss = 0.5972737669944763
Validation loss = 0.5912365913391113
Validation loss = 0.5901497602462769
Validation loss = 0.5997681617736816
Validation loss = 0.5943670868873596
Validation loss = 0.5972153544425964
Validation loss = 0.5992146134376526
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5889079570770264
Validation loss = 0.5932457447052002
Validation loss = 0.5848462581634521
Validation loss = 0.5942162871360779
Validation loss = 0.5894845128059387
Validation loss = 0.5883641839027405
Validation loss = 0.5933817625045776
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6040717959403992
Validation loss = 0.5933610200881958
Validation loss = 0.5994009375572205
Validation loss = 0.5886396169662476
Validation loss = 0.5932803153991699
Validation loss = 0.5909771919250488
Validation loss = 0.5955004692077637
Validation loss = 0.5955407023429871
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5976044535636902
Validation loss = 0.5916768312454224
Validation loss = 0.5938815474510193
Validation loss = 0.5948628187179565
Validation loss = 0.6017078757286072
Validation loss = 0.5951847434043884
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -59      |
| Iteration     | 64       |
| MaximumReturn | -0.0498  |
| MinimumReturn | -125     |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5929862260818481
Validation loss = 0.5939200520515442
Validation loss = 0.5942935347557068
Validation loss = 0.5891314148902893
Validation loss = 0.5961449146270752
Validation loss = 0.5952183604240417
Validation loss = 0.5951249003410339
Validation loss = 0.5992013812065125
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5934746265411377
Validation loss = 0.5940989255905151
Validation loss = 0.5946254134178162
Validation loss = 0.5967441201210022
Validation loss = 0.5963168144226074
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5859967470169067
Validation loss = 0.5887765884399414
Validation loss = 0.5905864238739014
Validation loss = 0.5900998711585999
Validation loss = 0.5919930338859558
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.592496395111084
Validation loss = 0.5961388945579529
Validation loss = 0.5899369120597839
Validation loss = 0.5960007309913635
Validation loss = 0.5986554026603699
Validation loss = 0.5959774255752563
Validation loss = 0.5967065095901489
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5980591773986816
Validation loss = 0.5980868935585022
Validation loss = 0.5951202511787415
Validation loss = 0.5956699848175049
Validation loss = 0.6001253128051758
Validation loss = 0.6015450954437256
Validation loss = 0.5976012349128723
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -78.9    |
| Iteration     | 65       |
| MaximumReturn | -0.171   |
| MinimumReturn | -136     |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5776383876800537
Validation loss = 0.585623025894165
Validation loss = 0.5875914096832275
Validation loss = 0.5863330364227295
Validation loss = 0.591748833656311
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5837866067886353
Validation loss = 0.5857962369918823
Validation loss = 0.5882085561752319
Validation loss = 0.5935745239257812
Validation loss = 0.5919647812843323
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5785645246505737
Validation loss = 0.5816339254379272
Validation loss = 0.5866348147392273
Validation loss = 0.5857580304145813
Validation loss = 0.5799446702003479
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5920398235321045
Validation loss = 0.5919854640960693
Validation loss = 0.5823816061019897
Validation loss = 0.59105384349823
Validation loss = 0.5874925255775452
Validation loss = 0.58885657787323
Validation loss = 0.5939412713050842
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5876479148864746
Validation loss = 0.5912808775901794
Validation loss = 0.5960009694099426
Validation loss = 0.5979373455047607
Validation loss = 0.5938948392868042
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -77.8    |
| Iteration     | 66       |
| MaximumReturn | -0.165   |
| MinimumReturn | -134     |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5798060297966003
Validation loss = 0.586699366569519
Validation loss = 0.5854458212852478
Validation loss = 0.5883105397224426
Validation loss = 0.5868186950683594
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5781794786453247
Validation loss = 0.585657000541687
Validation loss = 0.5801381468772888
Validation loss = 0.5832744240760803
Validation loss = 0.5893072485923767
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5730400681495667
Validation loss = 0.5765491724014282
Validation loss = 0.5847099423408508
Validation loss = 0.5810349583625793
Validation loss = 0.576781153678894
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5824810266494751
Validation loss = 0.5851131081581116
Validation loss = 0.5895441770553589
Validation loss = 0.5883004069328308
Validation loss = 0.5867862701416016
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.584591805934906
Validation loss = 0.592946469783783
Validation loss = 0.5938202142715454
Validation loss = 0.5876871347427368
Validation loss = 0.5920699238777161
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -90.5    |
| Iteration     | 67       |
| MaximumReturn | -0.398   |
| MinimumReturn | -146     |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5809451937675476
Validation loss = 0.577984631061554
Validation loss = 0.5789371728897095
Validation loss = 0.5806514620780945
Validation loss = 0.5870871543884277
Validation loss = 0.5856211185455322
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5818459987640381
Validation loss = 0.577422559261322
Validation loss = 0.5765619874000549
Validation loss = 0.5788248777389526
Validation loss = 0.5859649777412415
Validation loss = 0.5840846300125122
Validation loss = 0.5815877914428711
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5745823383331299
Validation loss = 0.5794589519500732
Validation loss = 0.5741159319877625
Validation loss = 0.5841593742370605
Validation loss = 0.5806112289428711
Validation loss = 0.5836971402168274
Validation loss = 0.5814908742904663
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.578329861164093
Validation loss = 0.5806800723075867
Validation loss = 0.5843213796615601
Validation loss = 0.5827558636665344
Validation loss = 0.5855226516723633
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5819949507713318
Validation loss = 0.585784375667572
Validation loss = 0.5891852974891663
Validation loss = 0.5820487141609192
Validation loss = 0.5844942331314087
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -105     |
| Iteration     | 68       |
| MaximumReturn | -0.286   |
| MinimumReturn | -160     |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5720697045326233
Validation loss = 0.5770103335380554
Validation loss = 0.5764615535736084
Validation loss = 0.5809009075164795
Validation loss = 0.5825093984603882
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5734934210777283
Validation loss = 0.5823127627372742
Validation loss = 0.58507239818573
Validation loss = 0.5778841376304626
Validation loss = 0.580173909664154
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5769973397254944
Validation loss = 0.5780155658721924
Validation loss = 0.5768579840660095
Validation loss = 0.5838277339935303
Validation loss = 0.5755720734596252
Validation loss = 0.5757465362548828
Validation loss = 0.5787782073020935
Validation loss = 0.5818542242050171
Validation loss = 0.5897717475891113
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5752286314964294
Validation loss = 0.5752302408218384
Validation loss = 0.5841531157493591
Validation loss = 0.5799581408500671
Validation loss = 0.5836511254310608
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5799299478530884
Validation loss = 0.5804639458656311
Validation loss = 0.586422860622406
Validation loss = 0.5864011645317078
Validation loss = 0.5840042233467102
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -49.3    |
| Iteration     | 69       |
| MaximumReturn | -0.649   |
| MinimumReturn | -119     |
| TotalSamples  | 118286   |
----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5753867626190186
Validation loss = 0.5715012550354004
Validation loss = 0.5785539150238037
Validation loss = 0.5738928318023682
Validation loss = 0.5797845125198364
Validation loss = 0.5799803137779236
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5867868065834045
Validation loss = 0.5755175948143005
Validation loss = 0.5800672769546509
Validation loss = 0.5773367881774902
Validation loss = 0.5807940363883972
Validation loss = 0.5804144740104675
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.575709342956543
Validation loss = 0.5746488571166992
Validation loss = 0.5779808759689331
Validation loss = 0.5766425728797913
Validation loss = 0.5780899524688721
Validation loss = 0.5765887498855591
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5718938112258911
Validation loss = 0.577993631362915
Validation loss = 0.5721509456634521
Validation loss = 0.5723108649253845
Validation loss = 0.5788300633430481
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5774124264717102
Validation loss = 0.5888029336929321
Validation loss = 0.577298641204834
Validation loss = 0.5847551226615906
Validation loss = 0.5794244408607483
Validation loss = 0.5860571265220642
Validation loss = 0.579773485660553
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.562   |
| Iteration     | 70       |
| MaximumReturn | -0.11    |
| MinimumReturn | -1.11    |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5795173048973083
Validation loss = 0.5785936713218689
Validation loss = 0.5761014223098755
Validation loss = 0.5781921744346619
Validation loss = 0.5780057311058044
Validation loss = 0.5754621028900146
Validation loss = 0.5761610865592957
Validation loss = 0.5764229893684387
Validation loss = 0.5793125629425049
Validation loss = 0.5799466967582703
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5773760676383972
Validation loss = 0.5797843337059021
Validation loss = 0.5762180685997009
Validation loss = 0.5770522356033325
Validation loss = 0.5812257528305054
Validation loss = 0.5783562064170837
Validation loss = 0.5799080729484558
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5788885354995728
Validation loss = 0.5734812617301941
Validation loss = 0.574039876461029
Validation loss = 0.5786943435668945
Validation loss = 0.5838766098022461
Validation loss = 0.5773683786392212
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5760724544525146
Validation loss = 0.5722207427024841
Validation loss = 0.5775138735771179
Validation loss = 0.5771580934524536
Validation loss = 0.5780830383300781
Validation loss = 0.5792912840843201
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5802884697914124
Validation loss = 0.5844123363494873
Validation loss = 0.5804103016853333
Validation loss = 0.5804722309112549
Validation loss = 0.5825833678245544
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.551   |
| Iteration     | 71       |
| MaximumReturn | -0.085   |
| MinimumReturn | -1.29    |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5765396952629089
Validation loss = 0.5797223448753357
Validation loss = 0.5794001817703247
Validation loss = 0.5843803882598877
Validation loss = 0.5796633958816528
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5784009695053101
Validation loss = 0.5769571661949158
Validation loss = 0.577422559261322
Validation loss = 0.5790351033210754
Validation loss = 0.5829821825027466
Validation loss = 0.583565354347229
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5755450129508972
Validation loss = 0.5737806558609009
Validation loss = 0.5745852589607239
Validation loss = 0.5698565244674683
Validation loss = 0.5767567753791809
Validation loss = 0.5819151401519775
Validation loss = 0.5774494409561157
Validation loss = 0.5785288214683533
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5772565603256226
Validation loss = 0.5766194462776184
Validation loss = 0.5736297369003296
Validation loss = 0.5746010541915894
Validation loss = 0.5767819285392761
Validation loss = 0.5791760683059692
Validation loss = 0.5788987874984741
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5823447108268738
Validation loss = 0.5785524249076843
Validation loss = 0.5779281258583069
Validation loss = 0.5803195238113403
Validation loss = 0.5812236070632935
Validation loss = 0.5860509276390076
Validation loss = 0.5876020789146423
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.548   |
| Iteration     | 72       |
| MaximumReturn | -0.0526  |
| MinimumReturn | -1.53    |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5786715149879456
Validation loss = 0.5762805342674255
Validation loss = 0.5763323903083801
Validation loss = 0.5797753930091858
Validation loss = 0.5800203084945679
Validation loss = 0.5828012824058533
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.578118622303009
Validation loss = 0.5834761261940002
Validation loss = 0.5764777660369873
Validation loss = 0.5819849967956543
Validation loss = 0.5814200639724731
Validation loss = 0.5798733830451965
Validation loss = 0.5819631814956665
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5787310600280762
Validation loss = 0.5731586217880249
Validation loss = 0.5771880745887756
Validation loss = 0.5738445520401001
Validation loss = 0.5777769684791565
Validation loss = 0.580924391746521
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5792063474655151
Validation loss = 0.5811413526535034
Validation loss = 0.571508526802063
Validation loss = 0.5739006400108337
Validation loss = 0.5826538801193237
Validation loss = 0.5804151296615601
Validation loss = 0.5806319713592529
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5827199816703796
Validation loss = 0.5803263187408447
Validation loss = 0.5817234516143799
Validation loss = 0.5840057134628296
Validation loss = 0.5860620737075806
Validation loss = 0.5908759236335754
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.46    |
| Iteration     | 73       |
| MaximumReturn | -0.0674  |
| MinimumReturn | -55.6    |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5777968764305115
Validation loss = 0.579404890537262
Validation loss = 0.5833797454833984
Validation loss = 0.5834366679191589
Validation loss = 0.5802489519119263
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.579708456993103
Validation loss = 0.5800310969352722
Validation loss = 0.5828192830085754
Validation loss = 0.5790807604789734
Validation loss = 0.5806180834770203
Validation loss = 0.5835392475128174
Validation loss = 0.5802403092384338
Validation loss = 0.5802158713340759
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5767592787742615
Validation loss = 0.5758236050605774
Validation loss = 0.5801713466644287
Validation loss = 0.5737678408622742
Validation loss = 0.5865340232849121
Validation loss = 0.5795165300369263
Validation loss = 0.580268383026123
Validation loss = 0.5792416930198669
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5752487778663635
Validation loss = 0.5769438147544861
Validation loss = 0.5742387175559998
Validation loss = 0.5782274007797241
Validation loss = 0.5767934918403625
Validation loss = 0.5772221684455872
Validation loss = 0.5786297917366028
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5833622217178345
Validation loss = 0.5789700746536255
Validation loss = 0.5814915895462036
Validation loss = 0.585063099861145
Validation loss = 0.5836121439933777
Validation loss = 0.5818461179733276
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.343   |
| Iteration     | 74       |
| MaximumReturn | -0.0743  |
| MinimumReturn | -0.639   |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5764750838279724
Validation loss = 0.572158694267273
Validation loss = 0.5784597396850586
Validation loss = 0.5771763920783997
Validation loss = 0.5802306532859802
Validation loss = 0.5774498581886292
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.575386643409729
Validation loss = 0.5749488472938538
Validation loss = 0.5800212621688843
Validation loss = 0.5815995931625366
Validation loss = 0.5806812644004822
Validation loss = 0.5828807950019836
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5791569352149963
Validation loss = 0.5781289339065552
Validation loss = 0.5760759115219116
Validation loss = 0.5774919390678406
Validation loss = 0.5806111097335815
Validation loss = 0.5828344225883484
Validation loss = 0.5821735262870789
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.574184238910675
Validation loss = 0.569718599319458
Validation loss = 0.5813268423080444
Validation loss = 0.5753654837608337
Validation loss = 0.5756697058677673
Validation loss = 0.5814970135688782
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5784302949905396
Validation loss = 0.5804771184921265
Validation loss = 0.576620876789093
Validation loss = 0.5828022956848145
Validation loss = 0.5793041586875916
Validation loss = 0.5839577317237854
Validation loss = 0.5846756100654602
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.32    |
| Iteration     | 75       |
| MaximumReturn | -0.0735  |
| MinimumReturn | -1.05    |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5775238275527954
Validation loss = 0.5781367421150208
Validation loss = 0.5785642862319946
Validation loss = 0.5770573616027832
Validation loss = 0.5838853120803833
Validation loss = 0.5784372687339783
Validation loss = 0.5765409469604492
Validation loss = 0.5766614079475403
Validation loss = 0.5851943492889404
Validation loss = 0.5819018483161926
Validation loss = 0.5814980864524841
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5793765783309937
Validation loss = 0.5779738426208496
Validation loss = 0.5782543420791626
Validation loss = 0.5852779746055603
Validation loss = 0.579878568649292
Validation loss = 0.5834113359451294
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5784040689468384
Validation loss = 0.5805169343948364
Validation loss = 0.5826114416122437
Validation loss = 0.5791202783584595
Validation loss = 0.5786522030830383
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5779625177383423
Validation loss = 0.5783873796463013
Validation loss = 0.5762808322906494
Validation loss = 0.5796573162078857
Validation loss = 0.5789969563484192
Validation loss = 0.5792117118835449
Validation loss = 0.5799505710601807
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5763759613037109
Validation loss = 0.5778196454048157
Validation loss = 0.5850610136985779
Validation loss = 0.579988420009613
Validation loss = 0.5855127573013306
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.78    |
| Iteration     | 76       |
| MaximumReturn | -0.11    |
| MinimumReturn | -86.8    |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5865569114685059
Validation loss = 0.5761638283729553
Validation loss = 0.5792874097824097
Validation loss = 0.5809101462364197
Validation loss = 0.5818917155265808
Validation loss = 0.5793099403381348
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5802256464958191
Validation loss = 0.5778140425682068
Validation loss = 0.5829432010650635
Validation loss = 0.5785062909126282
Validation loss = 0.5779133439064026
Validation loss = 0.5818331837654114
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5784766674041748
Validation loss = 0.5803678631782532
Validation loss = 0.571724534034729
Validation loss = 0.5807709693908691
Validation loss = 0.5796499848365784
Validation loss = 0.5796582698822021
Validation loss = 0.5807361602783203
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5825185775756836
Validation loss = 0.5798540711402893
Validation loss = 0.5802772045135498
Validation loss = 0.5741308331489563
Validation loss = 0.5780916213989258
Validation loss = 0.5782854557037354
Validation loss = 0.5853682160377502
Validation loss = 0.5820331573486328
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5855374336242676
Validation loss = 0.575996994972229
Validation loss = 0.5780683159828186
Validation loss = 0.5808401703834534
Validation loss = 0.5800929069519043
Validation loss = 0.5860071778297424
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.6    |
| Iteration     | 77       |
| MaximumReturn | -0.125   |
| MinimumReturn | -79.5    |
| TotalSamples  | 131614   |
----------------------------
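(Editorial note: the summary table printed at the end of each iteration reports statistics of the per-path returns over the 25 freshly collected paths, the outer-loop iteration index, and a running TotalSamples counter maintained by the trainer. The sketch below shows how such a table could be produced from the collected paths; the real logging utility is not visible in this output, so the layout and helper name simply mimic what is printed.)

```python
# Illustrative logger for the per-iteration summary table; formatting is inferred
# from the printed output, and the function name is an assumption.
import numpy as np

def log_iteration(itr, paths, total_samples):
    returns = np.array([p["rewards"].sum() for p in paths])
    rows = [("AverageReturn", f"{returns.mean():.3g}"),
            ("Iteration", str(itr)),
            ("MaximumReturn", f"{returns.max():.3g}"),
            ("MinimumReturn", f"{returns.min():.3g}"),
            ("TotalSamples", str(total_samples))]
    key_w = max(len(k) for k, _ in rows)
    rule = "-" * (key_w + 15)          # matches the 28-character rule in the log
    print(rule)
    for key, val in rows:
        print(f"| {key:<{key_w}} | {val:<8} |")
    print(rule)
```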
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5750967860221863
Validation loss = 0.5759773850440979
Validation loss = 0.5779138803482056
Validation loss = 0.5794937610626221
Validation loss = 0.5799615383148193
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5714611411094666
Validation loss = 0.5783093571662903
Validation loss = 0.5768957734107971
Validation loss = 0.5772703886032104
Validation loss = 0.5847527384757996
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.571260392665863
Validation loss = 0.572487473487854
Validation loss = 0.5778900384902954
Validation loss = 0.5756957530975342
Validation loss = 0.5784404873847961
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5813776254653931
Validation loss = 0.5719783902168274
Validation loss = 0.5790584683418274
Validation loss = 0.5818488597869873
Validation loss = 0.5815205574035645
Validation loss = 0.5868794322013855
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5790706872940063
Validation loss = 0.5750875473022461
Validation loss = 0.5782147645950317
Validation loss = 0.5791434645652771
Validation loss = 0.5771065950393677
Validation loss = 0.5842095017433167
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -16.7    |
| Iteration     | 78       |
| MaximumReturn | -0.277   |
| MinimumReturn | -66.7    |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5749929547309875
Validation loss = 0.574145495891571
Validation loss = 0.5784499049186707
Validation loss = 0.5811454653739929
Validation loss = 0.5801664590835571
Validation loss = 0.5779181718826294
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5740607976913452
Validation loss = 0.5718889832496643
Validation loss = 0.5745699405670166
Validation loss = 0.5801846385002136
Validation loss = 0.578027069568634
Validation loss = 0.5812473297119141
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5733668804168701
Validation loss = 0.5689131021499634
Validation loss = 0.5767048001289368
Validation loss = 0.5757980942726135
Validation loss = 0.5731663107872009
Validation loss = 0.5813273787498474
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.577441394329071
Validation loss = 0.574059784412384
Validation loss = 0.57551109790802
Validation loss = 0.5764272212982178
Validation loss = 0.5746221542358398
Validation loss = 0.581525444984436
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5767018795013428
Validation loss = 0.5747526288032532
Validation loss = 0.580113410949707
Validation loss = 0.5788706541061401
Validation loss = 0.5800310969352722
Validation loss = 0.5842199921607971
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.4    |
| Iteration     | 79       |
| MaximumReturn | -0.118   |
| MinimumReturn | -80.4    |
| TotalSamples  | 134946   |
----------------------------
