Logging to experiments/invertedPendulum/invertedPendulum/Mon-21-Nov-2022-03-21-48-PM-CST_invertedPendulum_trpo_iteration_20_seed3214
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5209412574768066
Validation loss = 0.2702104151248932
Validation loss = 0.2241651862859726
Validation loss = 0.21239101886749268
Validation loss = 0.19382819533348083
Validation loss = 0.1741570234298706
Validation loss = 0.17105914652347565
Validation loss = 0.15436147153377533
Validation loss = 0.14239801466464996
Validation loss = 0.14314381778240204
Validation loss = 0.12430661171674728
Validation loss = 0.11999814212322235
Validation loss = 0.11534266173839569
Validation loss = 0.09989114850759506
Validation loss = 0.1108015775680542
Validation loss = 0.09982147067785263
Validation loss = 0.11050688475370407
Validation loss = 0.11030161380767822
Validation loss = 0.08288394659757614
Validation loss = 0.08189570158720016
Validation loss = 0.08383820205926895
Validation loss = 0.07014579325914383
Validation loss = 0.07777372002601624
Validation loss = 0.07811053097248077
Validation loss = 0.0621214397251606
Validation loss = 0.06185908988118172
Validation loss = 0.05602426826953888
Validation loss = 0.05909036472439766
Validation loss = 0.07473548501729965
Validation loss = 0.0811195820569992
Validation loss = 0.055650584399700165
Validation loss = 0.060093656182289124
Validation loss = 0.06610273569822311
Validation loss = 0.057279232889413834
Validation loss = 0.048336293548345566
Validation loss = 0.05800410360097885
Validation loss = 0.061362117528915405
Validation loss = 0.052503038197755814
Validation loss = 0.04069804772734642
Validation loss = 0.044697798788547516
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5454102754592896
Validation loss = 0.2751862406730652
Validation loss = 0.22689324617385864
Validation loss = 0.22065991163253784
Validation loss = 0.19921530783176422
Validation loss = 0.18570300936698914
Validation loss = 0.17280298471450806
Validation loss = 0.16656845808029175
Validation loss = 0.14535821974277496
Validation loss = 0.14581558108329773
Validation loss = 0.13923722505569458
Validation loss = 0.12157011777162552
Validation loss = 0.11971617490053177
Validation loss = 0.11215098202228546
Validation loss = 0.120387502014637
Validation loss = 0.09683108329772949
Validation loss = 0.08466749638319016
Validation loss = 0.0826287493109703
Validation loss = 0.07525656372308731
Validation loss = 0.06614977866411209
Validation loss = 0.06899304687976837
Validation loss = 0.07751429080963135
Validation loss = 0.09216528385877609
Validation loss = 0.08460135757923126
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5354605913162231
Validation loss = 0.2744888961315155
Validation loss = 0.22297470271587372
Validation loss = 0.2131502777338028
Validation loss = 0.19640867412090302
Validation loss = 0.17576774954795837
Validation loss = 0.1763255000114441
Validation loss = 0.1642579883337021
Validation loss = 0.14284935593605042
Validation loss = 0.14173397421836853
Validation loss = 0.12026103585958481
Validation loss = 0.11453422904014587
Validation loss = 0.1154506504535675
Validation loss = 0.11597459763288498
Validation loss = 0.1098489910364151
Validation loss = 0.10376280546188354
Validation loss = 0.10283544659614563
Validation loss = 0.10195533186197281
Validation loss = 0.07922331243753433
Validation loss = 0.07856960594654083
Validation loss = 0.07362770289182663
Validation loss = 0.07219556719064713
Validation loss = 0.07648272067308426
Validation loss = 0.08318516612052917
Validation loss = 0.06965592503547668
Validation loss = 0.056339409202337265
Validation loss = 0.056127604097127914
Validation loss = 0.06660860031843185
Validation loss = 0.057534683495759964
Validation loss = 0.05504007637500763
Validation loss = 0.04964033514261246
Validation loss = 0.05540080368518829
Validation loss = 0.05907691642642021
Validation loss = 0.0666530653834343
Validation loss = 0.07872750610113144
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5248486399650574
Validation loss = 0.27905115485191345
Validation loss = 0.2236625850200653
Validation loss = 0.21810688078403473
Validation loss = 0.19697001576423645
Validation loss = 0.1905326396226883
Validation loss = 0.1753864288330078
Validation loss = 0.15543293952941895
Validation loss = 0.14121289551258087
Validation loss = 0.13198325037956238
Validation loss = 0.12560907006263733
Validation loss = 0.11081312596797943
Validation loss = 0.11116959154605865
Validation loss = 0.10428714752197266
Validation loss = 0.11405567079782486
Validation loss = 0.1191646009683609
Validation loss = 0.0953085720539093
Validation loss = 0.08217880874872208
Validation loss = 0.08015226572751999
Validation loss = 0.07299116998910904
Validation loss = 0.06813693791627884
Validation loss = 0.06809325516223907
Validation loss = 0.06258898973464966
Validation loss = 0.08002258837223053
Validation loss = 0.08826342970132828
Validation loss = 0.06260383129119873
Validation loss = 0.06319519132375717
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5412946939468384
Validation loss = 0.2698625922203064
Validation loss = 0.23670268058776855
Validation loss = 0.21611225605010986
Validation loss = 0.2058459371328354
Validation loss = 0.18770381808280945
Validation loss = 0.16940972208976746
Validation loss = 0.15972667932510376
Validation loss = 0.14451639354228973
Validation loss = 0.13539743423461914
Validation loss = 0.1287149041891098
Validation loss = 0.12035822123289108
Validation loss = 0.12075847387313843
Validation loss = 0.11992114037275314
Validation loss = 0.1087268590927124
Validation loss = 0.10581706464290619
Validation loss = 0.09115692973136902
Validation loss = 0.09880289435386658
Validation loss = 0.08028207719326019
Validation loss = 0.07361864298582077
Validation loss = 0.07568959146738052
Validation loss = 0.07115998864173889
Validation loss = 0.06904928386211395
Validation loss = 0.0675680935382843
Validation loss = 0.06196370720863342
Validation loss = 0.06389367580413818
Validation loss = 0.05045424401760101
Validation loss = 0.0652347207069397
Validation loss = 0.07020309567451477
Validation loss = 0.06523150950670242
Validation loss = 0.050423674285411835
Validation loss = 0.0448325090110302
Validation loss = 0.04826616495847702
Validation loss = 0.04951312020421028
Validation loss = 0.06267006695270538
Validation loss = 0.05193173885345459
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.5    |
| Iteration     | 0        |
| MaximumReturn | -0.046   |
| MinimumReturn | -68.7    |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2756941020488739
Validation loss = 0.1640281081199646
Validation loss = 0.13534289598464966
Validation loss = 0.10720790922641754
Validation loss = 0.09180674701929092
Validation loss = 0.08124571293592453
Validation loss = 0.06540093570947647
Validation loss = 0.07973462343215942
Validation loss = 0.05300760641694069
Validation loss = 0.049629323184490204
Validation loss = 0.05176948010921478
Validation loss = 0.05788551643490791
Validation loss = 0.0582105852663517
Validation loss = 0.04885096848011017
Validation loss = 0.04863112419843674
Validation loss = 0.04846693575382233
Validation loss = 0.04631631076335907
Validation loss = 0.043369509279727936
Validation loss = 0.04289302974939346
Validation loss = 0.0341784842312336
Validation loss = 0.031081389635801315
Validation loss = 0.03850950673222542
Validation loss = 0.039104294031858444
Validation loss = 0.036210499703884125
Validation loss = 0.030105406418442726
Validation loss = 0.029995061457157135
Validation loss = 0.03421687334775925
Validation loss = 0.040635667741298676
Validation loss = 0.03407011181116104
Validation loss = 0.032529231160879135
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.21625694632530212
Validation loss = 0.1169532760977745
Validation loss = 0.07925424724817276
Validation loss = 0.060785409063100815
Validation loss = 0.05763094872236252
Validation loss = 0.0830184668302536
Validation loss = 0.05595103278756142
Validation loss = 0.06390710920095444
Validation loss = 0.051331304013729095
Validation loss = 0.07334600389003754
Validation loss = 0.047837842255830765
Validation loss = 0.049140460789203644
Validation loss = 0.044904351234436035
Validation loss = 0.044324006885290146
Validation loss = 0.051223427057266235
Validation loss = 0.03816618025302887
Validation loss = 0.05435774847865105
Validation loss = 0.044746819883584976
Validation loss = 0.036630235612392426
Validation loss = 0.03329864889383316
Validation loss = 0.04201466590166092
Validation loss = 0.06444340199232101
Validation loss = 0.05216117203235626
Validation loss = 0.05215556174516678
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.19991621375083923
Validation loss = 0.11174190789461136
Validation loss = 0.07912983000278473
Validation loss = 0.07436215132474899
Validation loss = 0.05793505907058716
Validation loss = 0.0487113893032074
Validation loss = 0.04989819973707199
Validation loss = 0.0448400117456913
Validation loss = 0.043145857751369476
Validation loss = 0.044866569340229034
Validation loss = 0.0368729829788208
Validation loss = 0.03548221290111542
Validation loss = 0.05138712748885155
Validation loss = 0.0463545061647892
Validation loss = 0.0654103085398674
Validation loss = 0.044858839362859726
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.22032788395881653
Validation loss = 0.1232018694281578
Validation loss = 0.08948387950658798
Validation loss = 0.07125263661146164
Validation loss = 0.06191175431013107
Validation loss = 0.05869824066758156
Validation loss = 0.057589102536439896
Validation loss = 0.04977408051490784
Validation loss = 0.05916844308376312
Validation loss = 0.042587652802467346
Validation loss = 0.04576713591814041
Validation loss = 0.0801515132188797
Validation loss = 0.06558941304683685
Validation loss = 0.04159009829163551
Validation loss = 0.05154486373066902
Validation loss = 0.045955412089824677
Validation loss = 0.03732858598232269
Validation loss = 0.03808213770389557
Validation loss = 0.044050734490156174
Validation loss = 0.038510438054800034
Validation loss = 0.044095173478126526
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.25399473309516907
Validation loss = 0.14883030951023102
Validation loss = 0.11816242337226868
Validation loss = 0.08856154978275299
Validation loss = 0.08495501428842545
Validation loss = 0.06504389643669128
Validation loss = 0.051667239516973495
Validation loss = 0.06929156929254532
Validation loss = 0.05252160131931305
Validation loss = 0.052782874554395676
Validation loss = 0.05418625846505165
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0455  |
| Iteration     | 1        |
| MaximumReturn | -0.0206  |
| MinimumReturn | -0.276   |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.23377063870429993
Validation loss = 0.15702024102210999
Validation loss = 0.12691162526607513
Validation loss = 0.08590970933437347
Validation loss = 0.10174356400966644
Validation loss = 0.11974047124385834
Validation loss = 0.10048410296440125
Validation loss = 0.0695267915725708
Validation loss = 0.0737428143620491
Validation loss = 0.06790190935134888
Validation loss = 0.06600435078144073
Validation loss = 0.06022391468286514
Validation loss = 0.055528976023197174
Validation loss = 0.06073977053165436
Validation loss = 0.05869394913315773
Validation loss = 0.05860275402665138
Validation loss = 0.06873931735754013
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.16006261110305786
Validation loss = 0.11055012792348862
Validation loss = 0.08695542812347412
Validation loss = 0.07992509007453918
Validation loss = 0.0778246521949768
Validation loss = 0.07061682641506195
Validation loss = 0.09107346832752228
Validation loss = 0.07816712558269501
Validation loss = 0.07282993942499161
Validation loss = 0.07511088997125626
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.19597868621349335
Validation loss = 0.11990814656019211
Validation loss = 0.09946019947528839
Validation loss = 0.07998301088809967
Validation loss = 0.07654847204685211
Validation loss = 0.1039675772190094
Validation loss = 0.07577871531248093
Validation loss = 0.10148970782756805
Validation loss = 0.08114587515592575
Validation loss = 0.07866954058408737
Validation loss = 0.094356968998909
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.19161924719810486
Validation loss = 0.1201343983411789
Validation loss = 0.10555432736873627
Validation loss = 0.09321852773427963
Validation loss = 0.0712578296661377
Validation loss = 0.08140496909618378
Validation loss = 0.08265002071857452
Validation loss = 0.0818975567817688
Validation loss = 0.06914055347442627
Validation loss = 0.04960200935602188
Validation loss = 0.0688796192407608
Validation loss = 0.07885157316923141
Validation loss = 0.06701017916202545
Validation loss = 0.0574762299656868
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.21166083216667175
Validation loss = 0.1024375781416893
Validation loss = 0.0862920880317688
Validation loss = 0.0735028088092804
Validation loss = 0.08968742191791534
Validation loss = 0.08252286911010742
Validation loss = 0.07792238891124725
Validation loss = 0.0819183886051178
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -34.9    |
| Iteration     | 2        |
| MaximumReturn | -0.716   |
| MinimumReturn | -68.1    |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.176094651222229
Validation loss = 0.049505818635225296
Validation loss = 0.05069901421666145
Validation loss = 0.03438932076096535
Validation loss = 0.02910281904041767
Validation loss = 0.025333702564239502
Validation loss = 0.02893914096057415
Validation loss = 0.020889846608042717
Validation loss = 0.020316768437623978
Validation loss = 0.01677556522190571
Validation loss = 0.017714213579893112
Validation loss = 0.018292196094989777
Validation loss = 0.019615402445197105
Validation loss = 0.017428385093808174
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14003120362758636
Validation loss = 0.05012231692671776
Validation loss = 0.054047197103500366
Validation loss = 0.04817262291908264
Validation loss = 0.036514636129140854
Validation loss = 0.02495705522596836
Validation loss = 0.02599257230758667
Validation loss = 0.02224763296544552
Validation loss = 0.024287888780236244
Validation loss = 0.029636455699801445
Validation loss = 0.02137289009988308
Validation loss = 0.028742240741848946
Validation loss = 0.024045458063483238
Validation loss = 0.01712806150317192
Validation loss = 0.02433920092880726
Validation loss = 0.031020065769553185
Validation loss = 0.022166749462485313
Validation loss = 0.015898479148745537
Validation loss = 0.013655241578817368
Validation loss = 0.016492372378706932
Validation loss = 0.013651692308485508
Validation loss = 0.014798064716160297
Validation loss = 0.017879778519272804
Validation loss = 0.014702660031616688
Validation loss = 0.01808098517358303
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16106979548931122
Validation loss = 0.0529186874628067
Validation loss = 0.045199837535619736
Validation loss = 0.035152800381183624
Validation loss = 0.0405692420899868
Validation loss = 0.03617293760180473
Validation loss = 0.028212757781147957
Validation loss = 0.022444747388362885
Validation loss = 0.0211933720856905
Validation loss = 0.030816679820418358
Validation loss = 0.02966219000518322
Validation loss = 0.023996984586119652
Validation loss = 0.03208862990140915
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.13293440639972687
Validation loss = 0.07375670969486237
Validation loss = 0.05858202651143074
Validation loss = 0.040624238550662994
Validation loss = 0.032276544719934464
Validation loss = 0.027803586795926094
Validation loss = 0.020826760679483414
Validation loss = 0.030361657962203026
Validation loss = 0.0261495653539896
Validation loss = 0.0359838493168354
Validation loss = 0.02425559051334858
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1589692384004593
Validation loss = 0.059615593403577805
Validation loss = 0.03819752857089043
Validation loss = 0.03177906572818756
Validation loss = 0.038356754928827286
Validation loss = 0.026141511276364326
Validation loss = 0.02449677139520645
Validation loss = 0.02788093499839306
Validation loss = 0.026454010978341103
Validation loss = 0.02402273751795292
Validation loss = 0.021085968241095543
Validation loss = 0.023103438317775726
Validation loss = 0.029879381880164146
Validation loss = 0.020751724019646645
Validation loss = 0.01711718924343586
Validation loss = 0.016210036352276802
Validation loss = 0.019804516807198524
Validation loss = 0.020901689305901527
Validation loss = 0.024331264197826385
Validation loss = 0.0261443629860878
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -32.3    |
| Iteration     | 3        |
| MaximumReturn | -0.196   |
| MinimumReturn | -82.8    |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07247897237539291
Validation loss = 0.025360696017742157
Validation loss = 0.02159125730395317
Validation loss = 0.018440162762999535
Validation loss = 0.012884583324193954
Validation loss = 0.014653882943093777
Validation loss = 0.012840623036026955
Validation loss = 0.019651373848319054
Validation loss = 0.017973359674215317
Validation loss = 0.015049466863274574
Validation loss = 0.013087766245007515
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1043749451637268
Validation loss = 0.02627471089363098
Validation loss = 0.027047550305724144
Validation loss = 0.02058444730937481
Validation loss = 0.019768133759498596
Validation loss = 0.014727681875228882
Validation loss = 0.013197915628552437
Validation loss = 0.015081465244293213
Validation loss = 0.015822645276784897
Validation loss = 0.01518863532692194
Validation loss = 0.012556435540318489
Validation loss = 0.018746893852949142
Validation loss = 0.01482605841010809
Validation loss = 0.015173245221376419
Validation loss = 0.01102151907980442
Validation loss = 0.009042421355843544
Validation loss = 0.018094688653945923
Validation loss = 0.015699615702033043
Validation loss = 0.01145290769636631
Validation loss = 0.01464831456542015
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06997667253017426
Validation loss = 0.026984358206391335
Validation loss = 0.019779331982135773
Validation loss = 0.022012297064065933
Validation loss = 0.013754986226558685
Validation loss = 0.020946795120835304
Validation loss = 0.014424597844481468
Validation loss = 0.016364000737667084
Validation loss = 0.014195789583027363
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08591508865356445
Validation loss = 0.03129829466342926
Validation loss = 0.022761432453989983
Validation loss = 0.019289281219244003
Validation loss = 0.015891902148723602
Validation loss = 0.015783561393618584
Validation loss = 0.019423924386501312
Validation loss = 0.01671966351568699
Validation loss = 0.016907617449760437
Validation loss = 0.015788903459906578
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10106496512889862
Validation loss = 0.031765490770339966
Validation loss = 0.02137879468500614
Validation loss = 0.025974303483963013
Validation loss = 0.026425441727042198
Validation loss = 0.0159064382314682
Validation loss = 0.014560770243406296
Validation loss = 0.015093257650732994
Validation loss = 0.014079999178647995
Validation loss = 0.015024697408080101
Validation loss = 0.015216322615742683
Validation loss = 0.01334181148558855
Validation loss = 0.014978181570768356
Validation loss = 0.010363610461354256
Validation loss = 0.011503981426358223
Validation loss = 0.019622545689344406
Validation loss = 0.014065923169255257
Validation loss = 0.013705646619200706
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.198   |
| Iteration     | 4        |
| MaximumReturn | -0.104   |
| MinimumReturn | -0.297   |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04237600415945053
Validation loss = 0.012959763407707214
Validation loss = 0.009514780715107918
Validation loss = 0.009411021135747433
Validation loss = 0.009465191513299942
Validation loss = 0.009252125397324562
Validation loss = 0.008124111220240593
Validation loss = 0.007538133300840855
Validation loss = 0.006486057303845882
Validation loss = 0.010653188452124596
Validation loss = 0.010486388579010963
Validation loss = 0.007512029260396957
Validation loss = 0.009693704545497894
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07136837393045425
Validation loss = 0.015046495012938976
Validation loss = 0.013494612649083138
Validation loss = 0.02203856036067009
Validation loss = 0.010393857955932617
Validation loss = 0.013464836403727531
Validation loss = 0.006887965835630894
Validation loss = 0.007702816277742386
Validation loss = 0.008630460128188133
Validation loss = 0.007916943170130253
Validation loss = 0.00619188928976655
Validation loss = 0.006776183843612671
Validation loss = 0.007556799799203873
Validation loss = 0.007518076803535223
Validation loss = 0.006336644291877747
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.057413775473833084
Validation loss = 0.014433707110583782
Validation loss = 0.011551004834473133
Validation loss = 0.009162425063550472
Validation loss = 0.009421395137906075
Validation loss = 0.011708630248904228
Validation loss = 0.009331239387392998
Validation loss = 0.008450610563158989
Validation loss = 0.00886957161128521
Validation loss = 0.007232343312352896
Validation loss = 0.011901573278009892
Validation loss = 0.012865731492638588
Validation loss = 0.018772605806589127
Validation loss = 0.01079418882727623
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06764331459999084
Validation loss = 0.016857940703630447
Validation loss = 0.017747825011610985
Validation loss = 0.011744672432541847
Validation loss = 0.011954274028539658
Validation loss = 0.010817608796060085
Validation loss = 0.010126451961696148
Validation loss = 0.008689508773386478
Validation loss = 0.008649593219161034
Validation loss = 0.012177055701613426
Validation loss = 0.01205388642847538
Validation loss = 0.01197646465152502
Validation loss = 0.010479962453246117
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05934014916419983
Validation loss = 0.012328698299825191
Validation loss = 0.008492538705468178
Validation loss = 0.007178317755460739
Validation loss = 0.008368117734789848
Validation loss = 0.011043261736631393
Validation loss = 0.00899435393512249
Validation loss = 0.0095875458791852
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00306 |
| Iteration     | 5        |
| MaximumReturn | -0.00216 |
| MinimumReturn | -0.00401 |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07513155788183212
Validation loss = 0.02729279175400734
Validation loss = 0.0187270175665617
Validation loss = 0.012354381382465363
Validation loss = 0.013535614125430584
Validation loss = 0.0130715761333704
Validation loss = 0.010782476514577866
Validation loss = 0.010991170071065426
Validation loss = 0.013952827081084251
Validation loss = 0.009304576553404331
Validation loss = 0.009931196458637714
Validation loss = 0.011315745301544666
Validation loss = 0.012211563065648079
Validation loss = 0.014346634037792683
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026757344603538513
Validation loss = 0.00950385257601738
Validation loss = 0.020117299631237984
Validation loss = 0.008808811195194721
Validation loss = 0.009829742833971977
Validation loss = 0.007914811372756958
Validation loss = 0.007239270024001598
Validation loss = 0.007113644387573004
Validation loss = 0.012879909947514534
Validation loss = 0.013477685861289501
Validation loss = 0.01842334121465683
Validation loss = 0.00855300109833479
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.029268790036439896
Validation loss = 0.0204855315387249
Validation loss = 0.02783941850066185
Validation loss = 0.016366368159651756
Validation loss = 0.010153429582715034
Validation loss = 0.011379962787032127
Validation loss = 0.015483567491173744
Validation loss = 0.011775383725762367
Validation loss = 0.012967893853783607
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02253098413348198
Validation loss = 0.011342163197696209
Validation loss = 0.009872457012534142
Validation loss = 0.013050176203250885
Validation loss = 0.011998457834124565
Validation loss = 0.012670676223933697
Validation loss = 0.010392987169325352
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03993820399045944
Validation loss = 0.011995633132755756
Validation loss = 0.010396919213235378
Validation loss = 0.010993773117661476
Validation loss = 0.014366557821631432
Validation loss = 0.009259100072085857
Validation loss = 0.012019471265375614
Validation loss = 0.013338792137801647
Validation loss = 0.014997621066868305
Validation loss = 0.009745640680193901
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -49.9    |
| Iteration     | 6        |
| MaximumReturn | -20.4    |
| MinimumReturn | -77.4    |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04034249484539032
Validation loss = 0.010644841007888317
Validation loss = 0.009911012835800648
Validation loss = 0.009039080701768398
Validation loss = 0.006853299681097269
Validation loss = 0.004904871340841055
Validation loss = 0.0060315742157399654
Validation loss = 0.006406668573617935
Validation loss = 0.005031744483858347
Validation loss = 0.006178116425871849
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03663840889930725
Validation loss = 0.009647977538406849
Validation loss = 0.008653386496007442
Validation loss = 0.00630485825240612
Validation loss = 0.005218541715294123
Validation loss = 0.00491314148530364
Validation loss = 0.007314989808946848
Validation loss = 0.007046188693493605
Validation loss = 0.006571227218955755
Validation loss = 0.006118821445852518
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.033302728086709976
Validation loss = 0.008908460848033428
Validation loss = 0.005937508773058653
Validation loss = 0.008268080651760101
Validation loss = 0.005298927426338196
Validation loss = 0.005163476802408695
Validation loss = 0.00649205083027482
Validation loss = 0.005207719746977091
Validation loss = 0.005005382001399994
Validation loss = 0.005301614757627249
Validation loss = 0.008409247733652592
Validation loss = 0.006624904926866293
Validation loss = 0.005739898886531591
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03816039487719536
Validation loss = 0.009238884784281254
Validation loss = 0.008655487559735775
Validation loss = 0.006083237007260323
Validation loss = 0.007949222810566425
Validation loss = 0.007504998240619898
Validation loss = 0.007627025246620178
Validation loss = 0.006093525793403387
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.052836570888757706
Validation loss = 0.012192249298095703
Validation loss = 0.0076575507409870625
Validation loss = 0.007505252491682768
Validation loss = 0.006361185107380152
Validation loss = 0.0077307503670454025
Validation loss = 0.006719933822751045
Validation loss = 0.008236239664256573
Validation loss = 0.007624579127877951
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.6    |
| Iteration     | 7        |
| MaximumReturn | -0.186   |
| MinimumReturn | -60.7    |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015575779601931572
Validation loss = 0.006861989852041006
Validation loss = 0.003870694199576974
Validation loss = 0.004279094282537699
Validation loss = 0.004033402539789677
Validation loss = 0.0032286769710481167
Validation loss = 0.00382733391597867
Validation loss = 0.004297930281609297
Validation loss = 0.004044235218316317
Validation loss = 0.004401740618050098
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019545968621969223
Validation loss = 0.006120858248323202
Validation loss = 0.004593808203935623
Validation loss = 0.005328045226633549
Validation loss = 0.0036218140739947557
Validation loss = 0.0038959321100264788
Validation loss = 0.004538343288004398
Validation loss = 0.003913433291018009
Validation loss = 0.00512643251568079
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01996278204023838
Validation loss = 0.009306726977229118
Validation loss = 0.004697211552411318
Validation loss = 0.00584461959078908
Validation loss = 0.004063916392624378
Validation loss = 0.007149753160774708
Validation loss = 0.004412462469190359
Validation loss = 0.004179076757282019
Validation loss = 0.004529170226305723
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013551177456974983
Validation loss = 0.004749556537717581
Validation loss = 0.005026867147535086
Validation loss = 0.004145151469856501
Validation loss = 0.005219563841819763
Validation loss = 0.004224945791065693
Validation loss = 0.004382642451673746
Validation loss = 0.005064026452600956
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02464914694428444
Validation loss = 0.007355710957199335
Validation loss = 0.005433687008917332
Validation loss = 0.006617204286158085
Validation loss = 0.005087147932499647
Validation loss = 0.00435158284381032
Validation loss = 0.005213264375925064
Validation loss = 0.006251624785363674
Validation loss = 0.005490154959261417
Validation loss = 0.005923706106841564
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -71      |
| Iteration     | 8        |
| MaximumReturn | -21.5    |
| MinimumReturn | -96.5    |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03406001254916191
Validation loss = 0.008040370419621468
Validation loss = 0.006812151521444321
Validation loss = 0.004587424919009209
Validation loss = 0.003991193603724241
Validation loss = 0.004871313460171223
Validation loss = 0.004888187628239393
Validation loss = 0.003907707519829273
Validation loss = 0.003343193558976054
Validation loss = 0.00392391812056303
Validation loss = 0.0036739313509315252
Validation loss = 0.003645427990704775
Validation loss = 0.00513392873108387
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.038760267198085785
Validation loss = 0.009453055448830128
Validation loss = 0.007174259051680565
Validation loss = 0.005493508651852608
Validation loss = 0.003986754454672337
Validation loss = 0.0036790824960917234
Validation loss = 0.0035831909626722336
Validation loss = 0.0032676029950380325
Validation loss = 0.004078903701156378
Validation loss = 0.004394846502691507
Validation loss = 0.003893063636496663
Validation loss = 0.003294844413176179
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0881207212805748
Validation loss = 0.010966562665998936
Validation loss = 0.008436929434537888
Validation loss = 0.004986554384231567
Validation loss = 0.0042680902406573296
Validation loss = 0.005203276872634888
Validation loss = 0.004349458962678909
Validation loss = 0.0036859926767647266
Validation loss = 0.003602770157158375
Validation loss = 0.0057240622118115425
Validation loss = 0.00428449921309948
Validation loss = 0.004650854505598545
Validation loss = 0.003533024340867996
Validation loss = 0.004927113652229309
Validation loss = 0.0051287091337144375
Validation loss = 0.003507846500724554
Validation loss = 0.008295832201838493
Validation loss = 0.0035533832851797342
Validation loss = 0.0037925229407846928
Validation loss = 0.003585051279515028
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04403999447822571
Validation loss = 0.009914517402648926
Validation loss = 0.007101076655089855
Validation loss = 0.0052171433344483376
Validation loss = 0.004667636938393116
Validation loss = 0.003934459760785103
Validation loss = 0.003941510338336229
Validation loss = 0.005234782118350267
Validation loss = 0.005104648414999247
Validation loss = 0.004398435819894075
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04813896119594574
Validation loss = 0.014058472588658333
Validation loss = 0.005891143344342709
Validation loss = 0.004867129027843475
Validation loss = 0.00491253100335598
Validation loss = 0.003958055749535561
Validation loss = 0.005173013545572758
Validation loss = 0.0038806835655122995
Validation loss = 0.004161305259913206
Validation loss = 0.0035464658867567778
Validation loss = 0.0033782958053052425
Validation loss = 0.0032716337591409683
Validation loss = 0.003914262168109417
Validation loss = 0.004010261967778206
Validation loss = 0.0037601818330585957
Validation loss = 0.0044701844453811646
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -24.9    |
| Iteration     | 9        |
| MaximumReturn | -0.686   |
| MinimumReturn | -52.6    |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010058026760816574
Validation loss = 0.0045602163299918175
Validation loss = 0.004463675897568464
Validation loss = 0.0031655621714890003
Validation loss = 0.0032444046810269356
Validation loss = 0.0031918862368911505
Validation loss = 0.004629282280802727
Validation loss = 0.0031980013009160757
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014057342894375324
Validation loss = 0.0034937411546707153
Validation loss = 0.0034919518511742353
Validation loss = 0.0032136898953467607
Validation loss = 0.0031394511461257935
Validation loss = 0.00661596329882741
Validation loss = 0.0029073122423142195
Validation loss = 0.0031786737963557243
Validation loss = 0.0032809353433549404
Validation loss = 0.0035996241495013237
Validation loss = 0.003273311536759138
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023783963173627853
Validation loss = 0.005551679525524378
Validation loss = 0.0034754574298858643
Validation loss = 0.004491591826081276
Validation loss = 0.00365662039257586
Validation loss = 0.0028521644417196512
Validation loss = 0.0026577701792120934
Validation loss = 0.005641555413603783
Validation loss = 0.0034401663579046726
Validation loss = 0.004414341878145933
Validation loss = 0.0027157387230545282
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008729003369808197
Validation loss = 0.0038856002502143383
Validation loss = 0.003439266700297594
Validation loss = 0.0036757236812263727
Validation loss = 0.0030488017946481705
Validation loss = 0.004062898922711611
Validation loss = 0.0031275060027837753
Validation loss = 0.0045762937515974045
Validation loss = 0.0035539306700229645
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010741474106907845
Validation loss = 0.005836010444909334
Validation loss = 0.003011617809534073
Validation loss = 0.0027815410867333412
Validation loss = 0.003773466218262911
Validation loss = 0.0030161638278514147
Validation loss = 0.00346113508567214
Validation loss = 0.003988559823483229
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.43    |
| Iteration     | 10       |
| MaximumReturn | -0.0409  |
| MinimumReturn | -58      |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006565137766301632
Validation loss = 0.0026051411405205727
Validation loss = 0.002705069025978446
Validation loss = 0.003013703040778637
Validation loss = 0.004856825340539217
Validation loss = 0.0031161203514784575
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011199600994586945
Validation loss = 0.0028521150816231966
Validation loss = 0.0032744898926466703
Validation loss = 0.00310185132548213
Validation loss = 0.0024649477563798428
Validation loss = 0.0025688677560538054
Validation loss = 0.003259671851992607
Validation loss = 0.0026435325853526592
Validation loss = 0.0032715119887143373
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0071404725313186646
Validation loss = 0.0025444291532039642
Validation loss = 0.0032730791717767715
Validation loss = 0.0022323760204017162
Validation loss = 0.005516230594366789
Validation loss = 0.0030100583098828793
Validation loss = 0.0022800914011895657
Validation loss = 0.003004902508109808
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010196822695434093
Validation loss = 0.0032826694659888744
Validation loss = 0.0030417493544518948
Validation loss = 0.002698152093216777
Validation loss = 0.002495244611054659
Validation loss = 0.0033252735156565905
Validation loss = 0.0030974873807281256
Validation loss = 0.0034366294275969267
Validation loss = 0.003922130446881056
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006317172199487686
Validation loss = 0.003318235743790865
Validation loss = 0.004600139334797859
Validation loss = 0.0025806729681789875
Validation loss = 0.0025304662995040417
Validation loss = 0.003312390996143222
Validation loss = 0.0028940418269485235
Validation loss = 0.0025546366814523935
Validation loss = 0.0030185040086507797
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0181   |
| Iteration     | 11        |
| MaximumReturn | -0.000957 |
| MinimumReturn | -0.384    |
| TotalSamples  | 21658     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004444823134690523
Validation loss = 0.002838973654434085
Validation loss = 0.004316210746765137
Validation loss = 0.002885561902076006
Validation loss = 0.0028266343288123608
Validation loss = 0.0044350107200443745
Validation loss = 0.0029629243072122335
Validation loss = 0.0024459301494061947
Validation loss = 0.003811808302998543
Validation loss = 0.005079983733594418
Validation loss = 0.005035310983657837
Validation loss = 0.003360551316291094
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00883392058312893
Validation loss = 0.0039781564846634865
Validation loss = 0.0038386820815503597
Validation loss = 0.002572676632553339
Validation loss = 0.0030741337686777115
Validation loss = 0.002492599654942751
Validation loss = 0.0024871269706636667
Validation loss = 0.005923992022871971
Validation loss = 0.002871376695111394
Validation loss = 0.0035143308341503143
Validation loss = 0.003197376849129796
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004113151226192713
Validation loss = 0.0028740023262798786
Validation loss = 0.003181752283126116
Validation loss = 0.005853400565683842
Validation loss = 0.0033076382242143154
Validation loss = 0.0019803298637270927
Validation loss = 0.002922462997958064
Validation loss = 0.0028061815537512302
Validation loss = 0.0035779322497546673
Validation loss = 0.0022048931568861008
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005645190365612507
Validation loss = 0.005500212777405977
Validation loss = 0.004131167195737362
Validation loss = 0.004394214134663343
Validation loss = 0.005021263845264912
Validation loss = 0.003466057823970914
Validation loss = 0.0031284845899790525
Validation loss = 0.004801525734364986
Validation loss = 0.0041701700538396835
Validation loss = 0.0029407120309770107
Validation loss = 0.006260578986257315
Validation loss = 0.004331432282924652
Validation loss = 0.003249270375818014
Validation loss = 0.0025888984091579914
Validation loss = 0.003157247556373477
Validation loss = 0.006154896225780249
Validation loss = 0.0033503002487123013
Validation loss = 0.002338535850867629
Validation loss = 0.0034437868744134903
Validation loss = 0.004076779820024967
Validation loss = 0.005397486500442028
Validation loss = 0.0029014430474489927
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012381654232740402
Validation loss = 0.004656297154724598
Validation loss = 0.0027795485220849514
Validation loss = 0.004274784587323666
Validation loss = 0.00449447613209486
Validation loss = 0.0042122965678572655
Validation loss = 0.00447703106328845
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.668   |
| Iteration     | 12       |
| MaximumReturn | -0.00125 |
| MinimumReturn | -15.5    |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004979074466973543
Validation loss = 0.0024790570605546236
Validation loss = 0.0023272691760212183
Validation loss = 0.002980961464345455
Validation loss = 0.0027211590204387903
Validation loss = 0.002524067647755146
Validation loss = 0.0027333723846822977
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009218873456120491
Validation loss = 0.0027067805640399456
Validation loss = 0.002913612639531493
Validation loss = 0.0023909227456897497
Validation loss = 0.002148628933355212
Validation loss = 0.0027266351971775293
Validation loss = 0.0020371621940284967
Validation loss = 0.0026125581935048103
Validation loss = 0.004546779673546553
Validation loss = 0.0022485724184662104
Validation loss = 0.003962681628763676
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005640659015625715
Validation loss = 0.002268526004627347
Validation loss = 0.0037253794725984335
Validation loss = 0.003108683042228222
Validation loss = 0.0019742813892662525
Validation loss = 0.004552287515252829
Validation loss = 0.0043265060521662235
Validation loss = 0.004281128291040659
Validation loss = 0.0027626589871942997
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003546586027368903
Validation loss = 0.002614950994029641
Validation loss = 0.002820322522893548
Validation loss = 0.0027712020091712475
Validation loss = 0.0019859534222632647
Validation loss = 0.003625469747930765
Validation loss = 0.00198721862398088
Validation loss = 0.003982901573181152
Validation loss = 0.0033065699972212315
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005731835961341858
Validation loss = 0.004008544143289328
Validation loss = 0.002341253450140357
Validation loss = 0.003295040922239423
Validation loss = 0.0022137383930385113
Validation loss = 0.003331032581627369
Validation loss = 0.0037985064554959536
Validation loss = 0.00297458004206419
Validation loss = 0.00392182869836688
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -56.1    |
| Iteration     | 13       |
| MaximumReturn | -0.223   |
| MinimumReturn | -96.6    |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005529371555894613
Validation loss = 0.002503713360056281
Validation loss = 0.0018093069083988667
Validation loss = 0.0018745047273114324
Validation loss = 0.0018673869781196117
Validation loss = 0.00179387629032135
Validation loss = 0.001826456398703158
Validation loss = 0.0020765140652656555
Validation loss = 0.002128282794728875
Validation loss = 0.0018067654455080628
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0052952575497329235
Validation loss = 0.002250781049951911
Validation loss = 0.0014581014402210712
Validation loss = 0.001816138974390924
Validation loss = 0.0018583722412586212
Validation loss = 0.001685426919721067
Validation loss = 0.002263053320348263
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005426373332738876
Validation loss = 0.001670900615863502
Validation loss = 0.0015051487134769559
Validation loss = 0.002474389737471938
Validation loss = 0.0019206932047381997
Validation loss = 0.0024734896142035723
Validation loss = 0.0022809288930147886
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007389850914478302
Validation loss = 0.002740337746217847
Validation loss = 0.0020119587425142527
Validation loss = 0.0015804521972313523
Validation loss = 0.002975560026243329
Validation loss = 0.002082585822790861
Validation loss = 0.0038998315576463938
Validation loss = 0.0028367082122713327
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012561826966702938
Validation loss = 0.003853489877656102
Validation loss = 0.002284456044435501
Validation loss = 0.002138960873708129
Validation loss = 0.002127612242475152
Validation loss = 0.0025266611482948065
Validation loss = 0.0017836656188592315
Validation loss = 0.0040923841297626495
Validation loss = 0.0018400239059701562
Validation loss = 0.0016270913183689117
Validation loss = 0.003137847175821662
Validation loss = 0.0023264565970748663
Validation loss = 0.002113598631694913
Validation loss = 0.002334585878998041
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -15.1    |
| Iteration     | 14       |
| MaximumReturn | -0.0293  |
| MinimumReturn | -76.5    |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005258291494101286
Validation loss = 0.001825271057896316
Validation loss = 0.0019150443840771914
Validation loss = 0.0014011859893798828
Validation loss = 0.0019247711170464754
Validation loss = 0.0031250938773155212
Validation loss = 0.0037567182444036007
Validation loss = 0.0016408207593485713
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01166840735822916
Validation loss = 0.003192074364051223
Validation loss = 0.0023003907408565283
Validation loss = 0.0024957656860351562
Validation loss = 0.0018656665924936533
Validation loss = 0.0013519634958356619
Validation loss = 0.001359872636385262
Validation loss = 0.0018255624454468489
Validation loss = 0.0019485936500132084
Validation loss = 0.0017993650399148464
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00581526942551136
Validation loss = 0.001788727706298232
Validation loss = 0.0014700471656396985
Validation loss = 0.0020158018451184034
Validation loss = 0.0016392827965319157
Validation loss = 0.001906857592985034
Validation loss = 0.002353725954890251
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0038183131255209446
Validation loss = 0.0015672326553612947
Validation loss = 0.0033541866578161716
Validation loss = 0.0016423818888142705
Validation loss = 0.0016874410212039948
Validation loss = 0.0014222098980098963
Validation loss = 0.0022839431185275316
Validation loss = 0.0021864818409085274
Validation loss = 0.0034715032670646906
Validation loss = 0.0018146880902349949
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009202883578836918
Validation loss = 0.002129723783582449
Validation loss = 0.001875873189419508
Validation loss = 0.003166273469105363
Validation loss = 0.0022268330212682486
Validation loss = 0.0020910347811877728
Validation loss = 0.001815277268178761
Validation loss = 0.0022771446965634823
Validation loss = 0.0029845668468624353
Validation loss = 0.0014753264840692282
Validation loss = 0.0014788266271352768
Validation loss = 0.002979233395308256
Validation loss = 0.0020003560930490494
Validation loss = 0.003936904016882181
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.7     |
| Iteration     | 15       |
| MaximumReturn | -0.00207 |
| MinimumReturn | -46.2    |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0030444860458374023
Validation loss = 0.002105377148836851
Validation loss = 0.0027399964164942503
Validation loss = 0.003548635868355632
Validation loss = 0.0016433362616226077
Validation loss = 0.0018688760465011
Validation loss = 0.002072618342936039
Validation loss = 0.0015565973008051515
Validation loss = 0.0022881380282342434
Validation loss = 0.002162244636565447
Validation loss = 0.0013872964773327112
Validation loss = 0.001491246628575027
Validation loss = 0.002639856655150652
Validation loss = 0.002363121137022972
Validation loss = 0.001660577254369855
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0026868446730077267
Validation loss = 0.002966203959658742
Validation loss = 0.002238439628854394
Validation loss = 0.0013340336736291647
Validation loss = 0.0015087632928043604
Validation loss = 0.0018819934921339154
Validation loss = 0.001942977076396346
Validation loss = 0.0018899660790339112
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006563584785908461
Validation loss = 0.0016730620991438627
Validation loss = 0.002007502131164074
Validation loss = 0.00127596955280751
Validation loss = 0.002746573416516185
Validation loss = 0.0014904516283422709
Validation loss = 0.0016333602834492922
Validation loss = 0.0016799001023173332
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006481859367340803
Validation loss = 0.0014789708657190204
Validation loss = 0.0018657802138477564
Validation loss = 0.0021543139591813087
Validation loss = 0.002164084929972887
Validation loss = 0.002977808704599738
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005226781126111746
Validation loss = 0.0017675564158707857
Validation loss = 0.0014571830397471786
Validation loss = 0.0026038901414722204
Validation loss = 0.0015588253736495972
Validation loss = 0.0017994206864386797
Validation loss = 0.003973017912358046
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -36.6    |
| Iteration     | 16       |
| MaximumReturn | -0.0366  |
| MinimumReturn | -100     |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002885884139686823
Validation loss = 0.002296722959727049
Validation loss = 0.002798577304929495
Validation loss = 0.001365141593851149
Validation loss = 0.0016478764591738582
Validation loss = 0.002425426384434104
Validation loss = 0.002195953391492367
Validation loss = 0.0015139696188271046
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0021357552614063025
Validation loss = 0.002472377847880125
Validation loss = 0.003071677638217807
Validation loss = 0.001565346261486411
Validation loss = 0.0019995325710624456
Validation loss = 0.0022034882567822933
Validation loss = 0.0015660527860745788
Validation loss = 0.00213146791793406
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015787986340001225
Validation loss = 0.0017621994484215975
Validation loss = 0.0027522339951246977
Validation loss = 0.0022447884548455477
Validation loss = 0.0019256207160651684
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002657967619597912
Validation loss = 0.0013600070960819721
Validation loss = 0.0015880066202953458
Validation loss = 0.00557319913059473
Validation loss = 0.0013432590058073401
Validation loss = 0.001373992650769651
Validation loss = 0.001755259814672172
Validation loss = 0.0020329540129750967
Validation loss = 0.0011735368752852082
Validation loss = 0.0013880463084205985
Validation loss = 0.0015059413854032755
Validation loss = 0.0018082914175465703
Validation loss = 0.0022968559060245752
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0022729779593646526
Validation loss = 0.0017319655744358897
Validation loss = 0.0013513751327991486
Validation loss = 0.0014020906528458
Validation loss = 0.0025285512674599886
Validation loss = 0.0013577205827459693
Validation loss = 0.0014174168463796377
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -7.36     |
| Iteration     | 17        |
| MaximumReturn | -0.000571 |
| MinimumReturn | -94.4     |
| TotalSamples  | 31654     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002974536269903183
Validation loss = 0.0013976189075037837
Validation loss = 0.0020746481604874134
Validation loss = 0.0031039456371217966
Validation loss = 0.0014589396305382252
Validation loss = 0.0023153487127274275
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004064579494297504
Validation loss = 0.0023293939884752035
Validation loss = 0.002275817096233368
Validation loss = 0.0014337528264150023
Validation loss = 0.0014161992585286498
Validation loss = 0.001326976460404694
Validation loss = 0.001114759361371398
Validation loss = 0.0014343991642817855
Validation loss = 0.001764400047250092
Validation loss = 0.001654934138059616
Validation loss = 0.001730209682136774
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001888564438559115
Validation loss = 0.0027030420023947954
Validation loss = 0.0017957402160391212
Validation loss = 0.0019724788144230843
Validation loss = 0.0012939549051225185
Validation loss = 0.001534061273559928
Validation loss = 0.0023293939884752035
Validation loss = 0.004077133256942034
Validation loss = 0.0014126301975920796
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005420096218585968
Validation loss = 0.0018475967226549983
Validation loss = 0.001540432102046907
Validation loss = 0.003466924885287881
Validation loss = 0.0013303261948749423
Validation loss = 0.0018387645250186324
Validation loss = 0.002301528351381421
Validation loss = 0.004155183210968971
Validation loss = 0.003081841627135873
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002501538721844554
Validation loss = 0.002237630309537053
Validation loss = 0.0014663393376395106
Validation loss = 0.001597881899215281
Validation loss = 0.0013503071386367083
Validation loss = 0.0013200327521190047
Validation loss = 0.0013130113948136568
Validation loss = 0.0019897299353033304
Validation loss = 0.002440213691443205
Validation loss = 0.0027243210934102535
Validation loss = 0.0017082631820812821
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.1      |
| Iteration     | 18        |
| MaximumReturn | -0.000592 |
| MinimumReturn | -43.1     |
| TotalSamples  | 33320     |
-----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008001895621418953
Validation loss = 0.0014318965841084719
Validation loss = 0.0020344736985862255
Validation loss = 0.001770616858266294
Validation loss = 0.0016176035860553384
Validation loss = 0.0015454282984137535
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0023896812926977873
Validation loss = 0.0019020303152501583
Validation loss = 0.0023628617636859417
Validation loss = 0.002262758556753397
Validation loss = 0.0011987914331257343
Validation loss = 0.0018321210518479347
Validation loss = 0.002118223812431097
Validation loss = 0.0014851267915219069
Validation loss = 0.0022132049780339003
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003420830238610506
Validation loss = 0.0017164736054837704
Validation loss = 0.004806295037269592
Validation loss = 0.002858369145542383
Validation loss = 0.0014347918331623077
Validation loss = 0.0015508588403463364
Validation loss = 0.0026966482400894165
Validation loss = 0.0034082061611115932
Validation loss = 0.0012858295813202858
Validation loss = 0.0017626387998461723
Validation loss = 0.001258997479453683
Validation loss = 0.002415845636278391
Validation loss = 0.001244042068719864
Validation loss = 0.0014161865692585707
Validation loss = 0.001881029224023223
Validation loss = 0.0024904327001422644
Validation loss = 0.0013425687793642282
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0024184826761484146
Validation loss = 0.002830076264217496
Validation loss = 0.0015954194823279977
Validation loss = 0.001382149988785386
Validation loss = 0.0026742671616375446
Validation loss = 0.0015782341361045837
Validation loss = 0.0014000472147017717
Validation loss = 0.0021680798381567
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002902660518884659
Validation loss = 0.0018465667963027954
Validation loss = 0.001402551308274269
Validation loss = 0.0015021197032183409
Validation loss = 0.0024172943085432053
Validation loss = 0.0022316891700029373
Validation loss = 0.0021411990746855736
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -72.2    |
| Iteration     | 19       |
| MaximumReturn | -0.0769  |
| MinimumReturn | -104     |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011879072524607182
Validation loss = 0.0036488694604486227
Validation loss = 0.0025267978198826313
Validation loss = 0.00179890391882509
Validation loss = 0.0015342390397563577
Validation loss = 0.001310391933657229
Validation loss = 0.0012040503788739443
Validation loss = 0.0011439926456660032
Validation loss = 0.0014247111976146698
Validation loss = 0.00264578964561224
Validation loss = 0.002464951016008854
Validation loss = 0.002403212944045663
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013178061693906784
Validation loss = 0.0039238641038537025
Validation loss = 0.0032634586095809937
Validation loss = 0.0020105047151446342
Validation loss = 0.0026630607899278402
Validation loss = 0.001520823803730309
Validation loss = 0.0012913334649056196
Validation loss = 0.0013692269567400217
Validation loss = 0.0013148338766768575
Validation loss = 0.0023043323308229446
Validation loss = 0.0014030314050614834
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01628565974533558
Validation loss = 0.003883274272084236
Validation loss = 0.002365381456911564
Validation loss = 0.002010197378695011
Validation loss = 0.002182898810133338
Validation loss = 0.0013226968003436923
Validation loss = 0.001759835286065936
Validation loss = 0.0017747567035257816
Validation loss = 0.0013866074150428176
Validation loss = 0.0020437741186469793
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010715925134718418
Validation loss = 0.0031214954797178507
Validation loss = 0.0019858423620462418
Validation loss = 0.0014157950645312667
Validation loss = 0.0014946612063795328
Validation loss = 0.0012401376152411103
Validation loss = 0.0014280832838267088
Validation loss = 0.0012093939585611224
Validation loss = 0.0015906923217698932
Validation loss = 0.0016047003446146846
Validation loss = 0.0013307600747793913
Validation loss = 0.0015493333339691162
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011747044511139393
Validation loss = 0.003837311640381813
Validation loss = 0.0021711403969675303
Validation loss = 0.0034112483263015747
Validation loss = 0.0024932068772614002
Validation loss = 0.001857919036410749
Validation loss = 0.0015283615794032812
Validation loss = 0.0012243816163390875
Validation loss = 0.0017561039421707392
Validation loss = 0.0020891844760626554
Validation loss = 0.0016138437204062939
Validation loss = 0.001699203159660101
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.76    |
| Iteration     | 20       |
| MaximumReturn | -0.00062 |
| MinimumReturn | -84.7    |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002558839274570346
Validation loss = 0.0018548959633335471
Validation loss = 0.0015366983134299517
Validation loss = 0.001570256194099784
Validation loss = 0.0016624064883217216
Validation loss = 0.0013275300152599812
Validation loss = 0.002263291273266077
Validation loss = 0.0030768485739827156
Validation loss = 0.0014347615651786327
Validation loss = 0.0013657809467986226
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014415835030376911
Validation loss = 0.0014097560197114944
Validation loss = 0.0016391766257584095
Validation loss = 0.0011245609493926167
Validation loss = 0.0018755093915387988
Validation loss = 0.0023220868315547705
Validation loss = 0.002411040011793375
Validation loss = 0.0027615143917500973
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015179249458014965
Validation loss = 0.0024227872490882874
Validation loss = 0.00117805739864707
Validation loss = 0.0013245189329609275
Validation loss = 0.002859709318727255
Validation loss = 0.0015736999921500683
Validation loss = 0.0016121133230626583
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002393905306234956
Validation loss = 0.0016409417148679495
Validation loss = 0.0012453777017071843
Validation loss = 0.0012037769192829728
Validation loss = 0.001215590164065361
Validation loss = 0.001136891427449882
Validation loss = 0.0011383186792954803
Validation loss = 0.0017993191722780466
Validation loss = 0.0010638799285516143
Validation loss = 0.0025051822885870934
Validation loss = 0.0015559925232082605
Validation loss = 0.0018837903626263142
Validation loss = 0.001331852632574737
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016128438292071223
Validation loss = 0.001420026645064354
Validation loss = 0.0012895312393084168
Validation loss = 0.0015313905896618962
Validation loss = 0.001682956237345934
Validation loss = 0.0010839608730748296
Validation loss = 0.001213579555042088
Validation loss = 0.0023737801238894463
Validation loss = 0.0011437935754656792
Validation loss = 0.0011615422554314137
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.86    |
| Iteration     | 21       |
| MaximumReturn | -0.00094 |
| MinimumReturn | -48.5    |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002213060623034835
Validation loss = 0.0015872298972681165
Validation loss = 0.004851481411606073
Validation loss = 0.001639936468563974
Validation loss = 0.0011339879129081964
Validation loss = 0.0015711948508396745
Validation loss = 0.0017590532079339027
Validation loss = 0.0033283038064837456
Validation loss = 0.0010860180482268333
Validation loss = 0.0017662481404840946
Validation loss = 0.001671873265877366
Validation loss = 0.0019750429783016443
Validation loss = 0.001037156325764954
Validation loss = 0.0013603167608380318
Validation loss = 0.002359684556722641
Validation loss = 0.0011472163023427129
Validation loss = 0.001272098976187408
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0019987723790109158
Validation loss = 0.0011926477309316397
Validation loss = 0.0012899992289021611
Validation loss = 0.0011658086441457272
Validation loss = 0.0019473398569971323
Validation loss = 0.0016359974397346377
Validation loss = 0.0012134857242926955
Validation loss = 0.0012832843931391835
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021536683198064566
Validation loss = 0.001389266923069954
Validation loss = 0.0013975879410281777
Validation loss = 0.0011782150249928236
Validation loss = 0.00117871246766299
Validation loss = 0.001777993398718536
Validation loss = 0.00131873763166368
Validation loss = 0.0023748893290758133
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002220060443505645
Validation loss = 0.001738626859150827
Validation loss = 0.001117021543905139
Validation loss = 0.0013500897912308574
Validation loss = 0.0015483328606933355
Validation loss = 0.0013451260747388005
Validation loss = 0.00103092473000288
Validation loss = 0.001185595290735364
Validation loss = 0.0012265729019418359
Validation loss = 0.0014177458360791206
Validation loss = 0.001451702555641532
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002376893302425742
Validation loss = 0.0012721464736387134
Validation loss = 0.0025075136218219995
Validation loss = 0.0013355041155591607
Validation loss = 0.002265619346871972
Validation loss = 0.0017893334152176976
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -18.9    |
| Iteration     | 22       |
| MaximumReturn | -0.00173 |
| MinimumReturn | -65.6    |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003971635363996029
Validation loss = 0.0009569080430082977
Validation loss = 0.0015204991213977337
Validation loss = 0.0010261882562190294
Validation loss = 0.0013714005472138524
Validation loss = 0.002094503026455641
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003218598198145628
Validation loss = 0.0014803650556132197
Validation loss = 0.002184272976592183
Validation loss = 0.0012618096079677343
Validation loss = 0.001183799933642149
Validation loss = 0.0015171607956290245
Validation loss = 0.0011982509167864919
Validation loss = 0.0009634211892262101
Validation loss = 0.0021784876007586718
Validation loss = 0.0010595842031762004
Validation loss = 0.0011696114670485258
Validation loss = 0.0011713003041222692
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017205991316586733
Validation loss = 0.0017647964414209127
Validation loss = 0.001777899800799787
Validation loss = 0.0013045244850218296
Validation loss = 0.0009869236964732409
Validation loss = 0.0013308359775692225
Validation loss = 0.001608220860362053
Validation loss = 0.003772258758544922
Validation loss = 0.0014415160985663533
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002982854377478361
Validation loss = 0.0016214472707360983
Validation loss = 0.0011816986370831728
Validation loss = 0.0017000701045617461
Validation loss = 0.0011001208331435919
Validation loss = 0.0017728430684655905
Validation loss = 0.001250061672180891
Validation loss = 0.0012792638735845685
Validation loss = 0.0015740633243694901
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00495570432394743
Validation loss = 0.0019165599951520562
Validation loss = 0.001057030400261283
Validation loss = 0.002020737621933222
Validation loss = 0.0011242583859711885
Validation loss = 0.0014327879762277007
Validation loss = 0.0018306139390915632
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00491  |
| Iteration     | 23        |
| MaximumReturn | -0.000856 |
| MinimumReturn | -0.0169   |
| TotalSamples  | 41650     |
-----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002065707463771105
Validation loss = 0.001202810206450522
Validation loss = 0.0025727436877787113
Validation loss = 0.001356215332634747
Validation loss = 0.001046349061653018
Validation loss = 0.0015019132988527417
Validation loss = 0.0015195943415164948
Validation loss = 0.003285537241026759
Validation loss = 0.0015189333353191614
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00416935607790947
Validation loss = 0.0013468165416270494
Validation loss = 0.0011603615712374449
Validation loss = 0.0015730023151263595
Validation loss = 0.0011426475830376148
Validation loss = 0.0023780835326761007
Validation loss = 0.0012685602996498346
Validation loss = 0.0016326650511473417
Validation loss = 0.0017226797062903643
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002819410990923643
Validation loss = 0.0014531384222209454
Validation loss = 0.0018050484359264374
Validation loss = 0.0013493483420461416
Validation loss = 0.0015163602074608207
Validation loss = 0.0014706500805914402
Validation loss = 0.0009516783175058663
Validation loss = 0.0016060627531260252
Validation loss = 0.0028304881416261196
Validation loss = 0.001639597350731492
Validation loss = 0.0008981376886367798
Validation loss = 0.002195235574617982
Validation loss = 0.0038306706119328737
Validation loss = 0.001017291913740337
Validation loss = 0.0016585057601332664
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017361210193485022
Validation loss = 0.0016804278129711747
Validation loss = 0.0019455347210168839
Validation loss = 0.0014879294903948903
Validation loss = 0.001117146573960781
Validation loss = 0.001362451002933085
Validation loss = 0.0009383679134771228
Validation loss = 0.013076980598270893
Validation loss = 0.0013699010014533997
Validation loss = 0.0011783152585849166
Validation loss = 0.0013364226324483752
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002471319865435362
Validation loss = 0.001384318107739091
Validation loss = 0.0009187961695715785
Validation loss = 0.0013089932035654783
Validation loss = 0.0014779984485358
Validation loss = 0.0015838949475437403
Validation loss = 0.0029680903535336256
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00663  |
| Iteration     | 24        |
| MaximumReturn | -0.000791 |
| MinimumReturn | -0.0145   |
| TotalSamples  | 43316     |
-----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001735896454192698
Validation loss = 0.0011197731364518404
Validation loss = 0.0009782303823158145
Validation loss = 0.0009495898266322911
Validation loss = 0.0012106862850487232
Validation loss = 0.0013810924720019102
Validation loss = 0.0010150254238396883
Validation loss = 0.0011246453505009413
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018091702368110418
Validation loss = 0.001202173181809485
Validation loss = 0.0021043058950453997
Validation loss = 0.0014088056050240993
Validation loss = 0.0011858524521812797
Validation loss = 0.0009944865014404058
Validation loss = 0.0016124562826007605
Validation loss = 0.0013612809125334024
Validation loss = 0.002626067725941539
Validation loss = 0.0010424188803881407
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011998837580904365
Validation loss = 0.0009278704528696835
Validation loss = 0.0008457548101432621
Validation loss = 0.001079590292647481
Validation loss = 0.0013701262651011348
Validation loss = 0.0011425433913245797
Validation loss = 0.001256956486031413
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017439069924876094
Validation loss = 0.001494721625931561
Validation loss = 0.0013057549949735403
Validation loss = 0.0010496265022084117
Validation loss = 0.0011395219480618834
Validation loss = 0.0019901206251233816
Validation loss = 0.0011075129732489586
Validation loss = 0.0009126898366957903
Validation loss = 0.0015402579447254539
Validation loss = 0.0028702288400381804
Validation loss = 0.001753190066665411
Validation loss = 0.0015001039719209075
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0017933421768248081
Validation loss = 0.0010327458148822188
Validation loss = 0.0011470781173557043
Validation loss = 0.0022136878687888384
Validation loss = 0.0012354647042229772
Validation loss = 0.0011651444947347045
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00257  |
| Iteration     | 25        |
| MaximumReturn | -0.000657 |
| MinimumReturn | -0.00751  |
| TotalSamples  | 44982     |
-----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021404584404081106
Validation loss = 0.001089969533495605
Validation loss = 0.0012010064674541354
Validation loss = 0.0024028050247579813
Validation loss = 0.001123705762438476
Validation loss = 0.0013777270214632154
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011217707069590688
Validation loss = 0.00327433948405087
Validation loss = 0.0009468396892771125
Validation loss = 0.0013179336674511433
Validation loss = 0.0009576277225278318
Validation loss = 0.0011329679982736707
Validation loss = 0.0011917148949578404
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011957184178754687
Validation loss = 0.00218287599273026
Validation loss = 0.0014601656002923846
Validation loss = 0.0011694772401824594
Validation loss = 0.001309509971179068
Validation loss = 0.0011393370805308223
Validation loss = 0.0012830161722376943
Validation loss = 0.0016645179130136967
Validation loss = 0.001506483182311058
Validation loss = 0.0016763348830863833
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0026034326292574406
Validation loss = 0.0013055038871243596
Validation loss = 0.0010335324332118034
Validation loss = 0.0017280380707234144
Validation loss = 0.0014938681852072477
Validation loss = 0.001055569271557033
Validation loss = 0.0011898226803168654
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0019932687282562256
Validation loss = 0.001203453866764903
Validation loss = 0.0015886054607108235
Validation loss = 0.0012542681070044637
Validation loss = 0.0015210754936560988
Validation loss = 0.001080467482097447
Validation loss = 0.0010184829588979483
Validation loss = 0.001761484774760902
Validation loss = 0.0014081965200603008
Validation loss = 0.0010836375877261162
Validation loss = 0.001648999867029488
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.83    |
| Iteration     | 26       |
| MaximumReturn | -0.0727  |
| MinimumReturn | -20      |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012034015962854028
Validation loss = 0.000979732722043991
Validation loss = 0.0013702197466045618
Validation loss = 0.0013351502129808068
Validation loss = 0.0009298144723288715
Validation loss = 0.0011378020280972123
Validation loss = 0.0008776531904004514
Validation loss = 0.001267343177460134
Validation loss = 0.0010588476434350014
Validation loss = 0.001310247229412198
Validation loss = 0.0011034118942916393
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002250004792585969
Validation loss = 0.00100617459975183
Validation loss = 0.0018028259510174394
Validation loss = 0.0012159961042925715
Validation loss = 0.0010723666055127978
Validation loss = 0.0026330642867833376
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001304249046370387
Validation loss = 0.0019494402222335339
Validation loss = 0.0013611390022560954
Validation loss = 0.0013156255008652806
Validation loss = 0.0008817393099889159
Validation loss = 0.0006785006844438612
Validation loss = 0.0007560345693491399
Validation loss = 0.002032486256211996
Validation loss = 0.0007419641478918493
Validation loss = 0.0011434266343712807
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014710983959957957
Validation loss = 0.000803031784016639
Validation loss = 0.001638412824831903
Validation loss = 0.0007605351274833083
Validation loss = 0.0011330075794830918
Validation loss = 0.0010919130872935057
Validation loss = 0.0008584997849538922
Validation loss = 0.0015403939178213477
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020162523724138737
Validation loss = 0.0011161130387336016
Validation loss = 0.0009686592384241521
Validation loss = 0.003081005299463868
Validation loss = 0.0015770109603181481
Validation loss = 0.0009223150555044413
Validation loss = 0.0009237006306648254
Validation loss = 0.0029091129545122385
Validation loss = 0.0015163596253842115
Validation loss = 0.0012800005497410893
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.2    |
| Iteration     | 27       |
| MaximumReturn | -0.0794  |
| MinimumReturn | -73.9    |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0037910297978669405
Validation loss = 0.0015752551844343543
Validation loss = 0.0014052767073735595
Validation loss = 0.00125194585416466
Validation loss = 0.0012372297933325171
Validation loss = 0.0011639442527666688
Validation loss = 0.0009271656163036823
Validation loss = 0.0014277760637924075
Validation loss = 0.0011827450944110751
Validation loss = 0.00144281389657408
Validation loss = 0.0009433483355678618
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002161836251616478
Validation loss = 0.0011702897027134895
Validation loss = 0.001197674311697483
Validation loss = 0.00088521494762972
Validation loss = 0.0012596430024132133
Validation loss = 0.000984703772701323
Validation loss = 0.0012313079787418246
Validation loss = 0.0010842726333066821
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0031219006050378084
Validation loss = 0.0017678812146186829
Validation loss = 0.001363007933832705
Validation loss = 0.0011826502159237862
Validation loss = 0.0009116145665757358
Validation loss = 0.0010554202599450946
Validation loss = 0.0016601462848484516
Validation loss = 0.000815133098512888
Validation loss = 0.0007575224153697491
Validation loss = 0.00135336525272578
Validation loss = 0.0023075612261891365
Validation loss = 0.001000197953544557
Validation loss = 0.001286700600758195
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010123945539817214
Validation loss = 0.0011207268107682467
Validation loss = 0.0012074746191501617
Validation loss = 0.0010972003219649196
Validation loss = 0.0017038873629644513
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011782089713960886
Validation loss = 0.0011331335408613086
Validation loss = 0.001282492303289473
Validation loss = 0.0009110674145631492
Validation loss = 0.001285759499296546
Validation loss = 0.001537100994028151
Validation loss = 0.00129032414406538
Validation loss = 0.0009372326894663274
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.87    |
| Iteration     | 28       |
| MaximumReturn | -0.0454  |
| MinimumReturn | -37.7    |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016649161698296666
Validation loss = 0.0011270403629168868
Validation loss = 0.0008909250609576702
Validation loss = 0.001018132665194571
Validation loss = 0.0012652748264372349
Validation loss = 0.0009872951777651906
Validation loss = 0.001814494258724153
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015893875388428569
Validation loss = 0.0008868463919498026
Validation loss = 0.0011558477999642491
Validation loss = 0.0010380198946222663
Validation loss = 0.0009552616393193603
Validation loss = 0.00406901678070426
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00087994389468804
Validation loss = 0.0008071644697338343
Validation loss = 0.0015000569401308894
Validation loss = 0.0011578722624108195
Validation loss = 0.0009686819394119084
Validation loss = 0.0007444143411703408
Validation loss = 0.0011903051054105163
Validation loss = 0.0008521716808900237
Validation loss = 0.0012322982074692845
Validation loss = 0.0007189379539340734
Validation loss = 0.0010902669746428728
Validation loss = 0.000675131450407207
Validation loss = 0.0009919509757310152
Validation loss = 0.0014389801071956754
Validation loss = 0.0013479398330673575
Validation loss = 0.0010834132554009557
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0028593128081411123
Validation loss = 0.000981479766778648
Validation loss = 0.0007426522788591683
Validation loss = 0.0012909844517707825
Validation loss = 0.0006689975271001458
Validation loss = 0.0023276477586477995
Validation loss = 0.0007711156504228711
Validation loss = 0.001828277949243784
Validation loss = 0.0009879311546683311
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00109495606739074
Validation loss = 0.0011070359032601118
Validation loss = 0.0011350364657118917
Validation loss = 0.0008824266260489821
Validation loss = 0.0021441038697957993
Validation loss = 0.0025202129036188126
Validation loss = 0.0008132950752042234
Validation loss = 0.001423028064891696
Validation loss = 0.00100921920966357
Validation loss = 0.0015042193699628115
Validation loss = 0.0015476122498512268
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -34.8    |
| Iteration     | 29       |
| MaximumReturn | -0.0718  |
| MinimumReturn | -67.1    |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011530802585184574
Validation loss = 0.0015375894727185369
Validation loss = 0.000895556528121233
Validation loss = 0.0008377262274734676
Validation loss = 0.0012344990391284227
Validation loss = 0.00099458871409297
Validation loss = 0.0011338075855746865
Validation loss = 0.0007795272977091372
Validation loss = 0.0008057028753682971
Validation loss = 0.0018505926709622145
Validation loss = 0.0011117336107417941
Validation loss = 0.0010153891053050756
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013821171596646309
Validation loss = 0.002179454080760479
Validation loss = 0.000824973511043936
Validation loss = 0.0009968471713364124
Validation loss = 0.0014819784555584192
Validation loss = 0.0016240125987678766
Validation loss = 0.0007736494881100953
Validation loss = 0.0016297731781378388
Validation loss = 0.0008480277610942721
Validation loss = 0.001034639892168343
Validation loss = 0.0009586569503881037
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00159441027790308
Validation loss = 0.000726566999219358
Validation loss = 0.0009330164175480604
Validation loss = 0.0009965900098904967
Validation loss = 0.0009406409226357937
Validation loss = 0.0015029937494546175
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0032139820978045464
Validation loss = 0.0011787916300818324
Validation loss = 0.000679263670463115
Validation loss = 0.001082492177374661
Validation loss = 0.001018875278532505
Validation loss = 0.0008907817536965013
Validation loss = 0.0011227884097024798
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010119135258719325
Validation loss = 0.00068747962359339
Validation loss = 0.0008990056230686605
Validation loss = 0.0005923524731770158
Validation loss = 0.0011257365113124251
Validation loss = 0.001628801808692515
Validation loss = 0.001293095527216792
Validation loss = 0.0007646236335858703
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00339  |
| Iteration     | 30        |
| MaximumReturn | -0.000605 |
| MinimumReturn | -0.0173   |
| TotalSamples  | 53312     |
-----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0023550675250589848
Validation loss = 0.0010284669697284698
Validation loss = 0.000818262284155935
Validation loss = 0.0020952976774424314
Validation loss = 0.0015811861958354712
Validation loss = 0.0015492496313527226
Validation loss = 0.0008717569289728999
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0030466518364846706
Validation loss = 0.0012500135926529765
Validation loss = 0.0009555462747812271
Validation loss = 0.0007649235194548965
Validation loss = 0.0010921197244897485
Validation loss = 0.0021361883264034986
Validation loss = 0.0011038297088816762
Validation loss = 0.0008063508430495858
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016368150245398283
Validation loss = 0.0014416711637750268
Validation loss = 0.0007155657513067126
Validation loss = 0.0009957542642951012
Validation loss = 0.0008780730422586203
Validation loss = 0.001190293813124299
Validation loss = 0.0009633666486479342
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011280867038294673
Validation loss = 0.0010629177559167147
Validation loss = 0.0008661583415232599
Validation loss = 0.0008666860521771014
Validation loss = 0.000939054531045258
Validation loss = 0.0008966663153842092
Validation loss = 0.0010063308291137218
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015092930989339948
Validation loss = 0.0008707818924449384
Validation loss = 0.0015662709483876824
Validation loss = 0.0011350498534739017
Validation loss = 0.001561922486871481
Validation loss = 0.001155383069999516
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0162  |
| Iteration     | 31       |
| MaximumReturn | -0.0007  |
| MinimumReturn | -0.11    |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0027554514817893505
Validation loss = 0.0017603484448045492
Validation loss = 0.00087959278607741
Validation loss = 0.0015089246444404125
Validation loss = 0.0009424600284546614
Validation loss = 0.0012413226068019867
Validation loss = 0.0008132032817229629
Validation loss = 0.001018896815367043
Validation loss = 0.003235087962821126
Validation loss = 0.0008552910294383764
Validation loss = 0.0007779989973641932
Validation loss = 0.000858462299220264
Validation loss = 0.0027886719908565283
Validation loss = 0.001219912082888186
Validation loss = 0.0011434624902904034
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017260395688936114
Validation loss = 0.0010028231190517545
Validation loss = 0.0017235885607078671
Validation loss = 0.0009524511988274753
Validation loss = 0.001210520975291729
Validation loss = 0.0016965506365522742
Validation loss = 0.0016247652238234878
Validation loss = 0.0008617247804068029
Validation loss = 0.0009595082374289632
Validation loss = 0.0008704953361302614
Validation loss = 0.0006762701668776572
Validation loss = 0.0013554280158132315
Validation loss = 0.001239633304066956
Validation loss = 0.0010439635952934623
Validation loss = 0.0006775829242542386
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010922705987468362
Validation loss = 0.0006140233599580824
Validation loss = 0.0013561242958530784
Validation loss = 0.0009420730057172477
Validation loss = 0.0007894366281107068
Validation loss = 0.0016698059625923634
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008447428699582815
Validation loss = 0.0026200381107628345
Validation loss = 0.0007304560858756304
Validation loss = 0.001477951998822391
Validation loss = 0.000883080589119345
Validation loss = 0.0010284383315593004
Validation loss = 0.0010388506343588233
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010971539886668324
Validation loss = 0.0009525739587843418
Validation loss = 0.0009035735274665058
Validation loss = 0.0010234345681965351
Validation loss = 0.0013893498107790947
Validation loss = 0.0012119263410568237
Validation loss = 0.0009682110976427794
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.13    |
| Iteration     | 32       |
| MaximumReturn | -0.00106 |
| MinimumReturn | -52.4    |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0037253028713166714
Validation loss = 0.0008570458157919347
Validation loss = 0.0009367006132379174
Validation loss = 0.000953581475187093
Validation loss = 0.0007921820506453514
Validation loss = 0.0008311670972034335
Validation loss = 0.002272279467433691
Validation loss = 0.0011069747852161527
Validation loss = 0.0010614375350996852
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010875638108700514
Validation loss = 0.0012754300842061639
Validation loss = 0.0011477345833554864
Validation loss = 0.0008604463073424995
Validation loss = 0.0008734269067645073
Validation loss = 0.0009319016826339066
Validation loss = 0.001048596459440887
Validation loss = 0.0011493576457723975
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009951276006177068
Validation loss = 0.001299353432841599
Validation loss = 0.001503845676779747
Validation loss = 0.0012555515859276056
Validation loss = 0.0013374334666877985
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013215148355811834
Validation loss = 0.0013173333136364818
Validation loss = 0.0010006988886743784
Validation loss = 0.0005705659277737141
Validation loss = 0.002813443075865507
Validation loss = 0.0011335680028423667
Validation loss = 0.0015877691330388188
Validation loss = 0.0014053955674171448
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001113821635954082
Validation loss = 0.0024257712066173553
Validation loss = 0.0007470928248949349
Validation loss = 0.001886666752398014
Validation loss = 0.0008643832989037037
Validation loss = 0.0011954809306189418
Validation loss = 0.0010291223879903555
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -6.87     |
| Iteration     | 33        |
| MaximumReturn | -0.000592 |
| MinimumReturn | -74.6     |
| TotalSamples  | 58310     |
-----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014101638225838542
Validation loss = 0.000939787772949785
Validation loss = 0.0010711055947467685
Validation loss = 0.0010248080361634493
Validation loss = 0.0008505761506967247
Validation loss = 0.0008252122788690031
Validation loss = 0.0008376054465770721
Validation loss = 0.0008407450513914227
Validation loss = 0.0010745776817202568
Validation loss = 0.0007066673133522272
Validation loss = 0.0009250490693375468
Validation loss = 0.0018961395835503936
Validation loss = 0.001131214085035026
Validation loss = 0.0017409960273653269
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013645114377140999
Validation loss = 0.0010520261712372303
Validation loss = 0.0015928171342238784
Validation loss = 0.0007470637792721391
Validation loss = 0.0021491709630936384
Validation loss = 0.0008324550581164658
Validation loss = 0.0007413700222969055
Validation loss = 0.0017041423125192523
Validation loss = 0.0007393410778604448
Validation loss = 0.0009653021697886288
Validation loss = 0.0009562174091115594
Validation loss = 0.0009940891759470105
Validation loss = 0.0008944000583142042
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017755938461050391
Validation loss = 0.000685199280269444
Validation loss = 0.0019100697245448828
Validation loss = 0.0008308296673931181
Validation loss = 0.0012907861964777112
Validation loss = 0.00106612010858953
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0022388030774891376
Validation loss = 0.0014000446535646915
Validation loss = 0.0005789780407212675
Validation loss = 0.0010657764505594969
Validation loss = 0.0007761488086543977
Validation loss = 0.0010866160737350583
Validation loss = 0.0013798587024211884
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011547539616003633
Validation loss = 0.0011166317854076624
Validation loss = 0.0020994222722947598
Validation loss = 0.001015408430248499
Validation loss = 0.0008073111530393362
Validation loss = 0.0011991484789177775
Validation loss = 0.0009177098982036114
Validation loss = 0.0009261303930543363
Validation loss = 0.0012136814184486866
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -43.6    |
| Iteration     | 34       |
| MaximumReturn | -0.00221 |
| MinimumReturn | -106     |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013439641334116459
Validation loss = 0.0008440898382104933
Validation loss = 0.0006837606779299676
Validation loss = 0.0007126406999304891
Validation loss = 0.0013135566841810942
Validation loss = 0.0010261483257636428
Validation loss = 0.0006556585431098938
Validation loss = 0.0009058724972419441
Validation loss = 0.000809736258815974
Validation loss = 0.0012182226637378335
Validation loss = 0.0011216668644919991
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015575885772705078
Validation loss = 0.0008324141381308436
Validation loss = 0.0011740339687094092
Validation loss = 0.0010503078810870647
Validation loss = 0.0012162349885329604
Validation loss = 0.0008375663310289383
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008448176085948944
Validation loss = 0.0007517990889027715
Validation loss = 0.0008710498223081231
Validation loss = 0.0006981449550949037
Validation loss = 0.0009403674048371613
Validation loss = 0.002407423686236143
Validation loss = 0.0008050434989854693
Validation loss = 0.001077611348591745
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009115915163420141
Validation loss = 0.0009997215820476413
Validation loss = 0.0007869954570196569
Validation loss = 0.0007788097718730569
Validation loss = 0.0007346972706727684
Validation loss = 0.0019489346304908395
Validation loss = 0.000992423971183598
Validation loss = 0.002122953301295638
Validation loss = 0.0010788734070956707
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012503680773079395
Validation loss = 0.0009607605752535164
Validation loss = 0.0014066131552681327
Validation loss = 0.0008852829923853278
Validation loss = 0.0007578367949463427
Validation loss = 0.0009497955907136202
Validation loss = 0.000771805236581713
Validation loss = 0.0007803188054822385
Validation loss = 0.0009891947265714407
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -61.2    |
| Iteration     | 35       |
| MaximumReturn | -0.0937  |
| MinimumReturn | -102     |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009713808074593544
Validation loss = 0.0018789025489240885
Validation loss = 0.0010781415039673448
Validation loss = 0.0012259812792763114
Validation loss = 0.0012605853844434023
Validation loss = 0.0011487809242680669
Validation loss = 0.0007891348795965314
Validation loss = 0.0010623729322105646
Validation loss = 0.0010433685965836048
Validation loss = 0.0009496823186054826
Validation loss = 0.000856641388963908
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004254649858921766
Validation loss = 0.002065007807686925
Validation loss = 0.001106114243157208
Validation loss = 0.0010907967807725072
Validation loss = 0.0007938478956930339
Validation loss = 0.0010632536141201854
Validation loss = 0.0008164083119481802
Validation loss = 0.0012690362054854631
Validation loss = 0.0008037730003707111
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007261906284838915
Validation loss = 0.0015977082075551152
Validation loss = 0.0011677355505526066
Validation loss = 0.001086393021978438
Validation loss = 0.0010478516342118382
Validation loss = 0.0008571362122893333
Validation loss = 0.0013316369149833918
Validation loss = 0.0012702025705948472
Validation loss = 0.0009003024897538126
Validation loss = 0.0008576080435886979
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0022654051426798105
Validation loss = 0.0010280978167429566
Validation loss = 0.0009981507901102304
Validation loss = 0.0008876865613274276
Validation loss = 0.0005923438002355397
Validation loss = 0.0006412413204088807
Validation loss = 0.0008366319234482944
Validation loss = 0.0007182458066381514
Validation loss = 0.0011691567488014698
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0040945191867649555
Validation loss = 0.002308661350980401
Validation loss = 0.0008506064768880606
Validation loss = 0.0009552736300975084
Validation loss = 0.0007492660661228001
Validation loss = 0.000760345661547035
Validation loss = 0.0008264087373390794
Validation loss = 0.0008298320462927222
Validation loss = 0.0006027118070051074
Validation loss = 0.0011752807768061757
Validation loss = 0.0006847453769296408
Validation loss = 0.00081114680506289
Validation loss = 0.0009567009983584285
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -73.2    |
| Iteration     | 36       |
| MaximumReturn | -0.598   |
| MinimumReturn | -101     |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0019769330974668264
Validation loss = 0.0009161219350062311
Validation loss = 0.0008072072523646057
Validation loss = 0.0008944955188781023
Validation loss = 0.0008786959224380553
Validation loss = 0.0010288195917382836
Validation loss = 0.0014587653568014503
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013437271118164062
Validation loss = 0.0008204554324038327
Validation loss = 0.0020923102274537086
Validation loss = 0.0008459276868961751
Validation loss = 0.0009329526801593602
Validation loss = 0.0008769475971348584
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0025296052917838097
Validation loss = 0.000942402461078018
Validation loss = 0.0009252838790416718
Validation loss = 0.0008182001183740795
Validation loss = 0.0007285142783075571
Validation loss = 0.0008198703872039914
Validation loss = 0.0011655846610665321
Validation loss = 0.0010652488563209772
Validation loss = 0.000751647399738431
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0019049197435379028
Validation loss = 0.0007565669366158545
Validation loss = 0.0006712623289786279
Validation loss = 0.0006605535745620728
Validation loss = 0.0006711656460538507
Validation loss = 0.0008396095363423228
Validation loss = 0.0008247037185356021
Validation loss = 0.0005394102772697806
Validation loss = 0.0008692324627190828
Validation loss = 0.0008154456154443324
Validation loss = 0.0005735852755606174
Validation loss = 0.0012713030446320772
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006441205274313688
Validation loss = 0.0012559123570099473
Validation loss = 0.0010024680523201823
Validation loss = 0.0008383816457353532
Validation loss = 0.0008303861832246184
Validation loss = 0.0009516490390524268
Validation loss = 0.0013083481462672353
Validation loss = 0.0008099412079900503
Validation loss = 0.0006699736113660038
Validation loss = 0.0009578773169778287
Validation loss = 0.000979050062596798
Validation loss = 0.0008651625248603523
Validation loss = 0.0008339881314896047
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -69.2    |
| Iteration     | 37       |
| MaximumReturn | -0.164   |
| MinimumReturn | -119     |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003342944663017988
Validation loss = 0.001090472680516541
Validation loss = 0.0011537342797964811
Validation loss = 0.0008101703715510666
Validation loss = 0.001129314536228776
Validation loss = 0.0008346908143721521
Validation loss = 0.0010865966323763132
Validation loss = 0.0008310511475428939
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008558687986806035
Validation loss = 0.0009260495426133275
Validation loss = 0.0006193653680384159
Validation loss = 0.0010480015771463513
Validation loss = 0.0007283466402441263
Validation loss = 0.001888869097456336
Validation loss = 0.0006679874495603144
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0039139618165791035
Validation loss = 0.0012028391938656569
Validation loss = 0.0009165219962596893
Validation loss = 0.0017586443573236465
Validation loss = 0.0008932859636843204
Validation loss = 0.0009386101737618446
Validation loss = 0.0010215477086603642
Validation loss = 0.0006705927662551403
Validation loss = 0.0009240889339707792
Validation loss = 0.001188474241644144
Validation loss = 0.0006283879047259688
Validation loss = 0.0007206141017377377
Validation loss = 0.0014853640459477901
Validation loss = 0.0008370099239982665
Validation loss = 0.001062762108631432
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0020806542597711086
Validation loss = 0.0008472059853374958
Validation loss = 0.0008472975459881127
Validation loss = 0.0007206869777292013
Validation loss = 0.0008476651273667812
Validation loss = 0.0009609493426978588
Validation loss = 0.0007699503330513835
Validation loss = 0.000914328615181148
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012173018185421824
Validation loss = 0.0006843957817181945
Validation loss = 0.0008993580704554915
Validation loss = 0.0006845712196081877
Validation loss = 0.0007205818546935916
Validation loss = 0.0008081194246187806
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -16.6     |
| Iteration     | 38        |
| MaximumReturn | -0.000619 |
| MinimumReturn | -130      |
| TotalSamples  | 66640     |
-----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010601505637168884
Validation loss = 0.0008819806389510632
Validation loss = 0.0008237215806730092
Validation loss = 0.0007934244349598885
Validation loss = 0.000805561663582921
Validation loss = 0.0017569228075444698
Validation loss = 0.000878811813890934
Validation loss = 0.000763535441365093
Validation loss = 0.0006429028580896556
Validation loss = 0.0007102926610969007
Validation loss = 0.0007434398285113275
Validation loss = 0.0010178153170272708
Validation loss = 0.000979451579041779
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009517656289972365
Validation loss = 0.0010646553710103035
Validation loss = 0.001274071866646409
Validation loss = 0.0007498981431126595
Validation loss = 0.0007385621429421008
Validation loss = 0.0005516435485333204
Validation loss = 0.0013482922222465277
Validation loss = 0.0006598833133466542
Validation loss = 0.0013312065275385976
Validation loss = 0.0008888466982170939
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007915669120848179
Validation loss = 0.0007054670131765306
Validation loss = 0.0014449508162215352
Validation loss = 0.00120528694242239
Validation loss = 0.0008925300207920372
Validation loss = 0.0010156985372304916
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005534204537980258
Validation loss = 0.0008590099751017988
Validation loss = 0.0013498113257810473
Validation loss = 0.000697416253387928
Validation loss = 0.0005413004546426237
Validation loss = 0.0012031374499201775
Validation loss = 0.0007988542784005404
Validation loss = 0.001277924282476306
Validation loss = 0.0010463603539392352
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000694046844728291
Validation loss = 0.0006492760148830712
Validation loss = 0.0012129036476835608
Validation loss = 0.002431177766993642
Validation loss = 0.0006342886481434107
Validation loss = 0.0007610946777276695
Validation loss = 0.000990243861451745
Validation loss = 0.0006777987582609057
Validation loss = 0.00124243157915771
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -6.28     |
| Iteration     | 39        |
| MaximumReturn | -0.000543 |
| MinimumReturn | -88.8     |
| TotalSamples  | 68306     |
-----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008666125941090286
Validation loss = 0.0011531367199495435
Validation loss = 0.0010503382654860616
Validation loss = 0.0008150090579874814
Validation loss = 0.000889020855538547
Validation loss = 0.0011545515153557062
Validation loss = 0.0008930462063290179
Validation loss = 0.0006959413876757026
Validation loss = 0.0008459310047328472
Validation loss = 0.0012418924598023295
Validation loss = 0.0008202777244150639
Validation loss = 0.0007046840037219226
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007543339161202312
Validation loss = 0.000747745216358453
Validation loss = 0.0011019115336239338
Validation loss = 0.0008042633417062461
Validation loss = 0.0009407870238646865
Validation loss = 0.0026450424920767546
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001362315728329122
Validation loss = 0.0008284413488581777
Validation loss = 0.0010038611944764853
Validation loss = 0.0015677318442612886
Validation loss = 0.001358482288196683
Validation loss = 0.0006712305475957692
Validation loss = 0.00097970652859658
Validation loss = 0.0007819950697012246
Validation loss = 0.0005454702768474817
Validation loss = 0.0006519532762467861
Validation loss = 0.0007752949022687972
Validation loss = 0.0008106061723083258
Validation loss = 0.0010495946044102311
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000884678156580776
Validation loss = 0.0006773681379854679
Validation loss = 0.0012290808372199535
Validation loss = 0.001263615326024592
Validation loss = 0.0007615798967890441
Validation loss = 0.0007127878488972783
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006736986106261611
Validation loss = 0.0012842246796935797
Validation loss = 0.0010216357186436653
Validation loss = 0.0008914650534279644
Validation loss = 0.0008100453997030854
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -6.45     |
| Iteration     | 40        |
| MaximumReturn | -0.000707 |
| MinimumReturn | -79.4     |
| TotalSamples  | 69972     |
-----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006016719271428883
Validation loss = 0.0006903636967763305
Validation loss = 0.0006558586028404534
Validation loss = 0.0007396832224912941
Validation loss = 0.0007383237825706601
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001134136226028204
Validation loss = 0.0007601666729897261
Validation loss = 0.000645313470158726
Validation loss = 0.0008254502317868173
Validation loss = 0.0008081341511569917
Validation loss = 0.0008896663784980774
Validation loss = 0.0006615566671825945
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010819920571520925
Validation loss = 0.0006997259915806353
Validation loss = 0.000978562398813665
Validation loss = 0.0009059601579792798
Validation loss = 0.0006239091162569821
Validation loss = 0.0005794934695586562
Validation loss = 0.000581660307943821
Validation loss = 0.0005881852703168988
Validation loss = 0.0008738483884371817
Validation loss = 0.000665262050461024
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006578861502930522
Validation loss = 0.0009575505391694605
Validation loss = 0.000829596072435379
Validation loss = 0.0009209512500092387
Validation loss = 0.0011000256054103374
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008157186675816774
Validation loss = 0.0010426511289551854
Validation loss = 0.0013377536088228226
Validation loss = 0.0008268904057331383
Validation loss = 0.000869494047947228
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -56.9    |
| Iteration     | 41       |
| MaximumReturn | -0.00212 |
| MinimumReturn | -135     |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007174373022280633
Validation loss = 0.0005844415281899273
Validation loss = 0.0006881893496029079
Validation loss = 0.0005450891330838203
Validation loss = 0.0011626174673438072
Validation loss = 0.001296995673328638
Validation loss = 0.0007466503302566707
Validation loss = 0.0007290783687494695
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009487895877100527
Validation loss = 0.0006122971535660326
Validation loss = 0.000794728402979672
Validation loss = 0.0008132173679769039
Validation loss = 0.0005612295935861766
Validation loss = 0.0008454826893284917
Validation loss = 0.000667073589283973
Validation loss = 0.0005248390370979905
Validation loss = 0.0010561685776337981
Validation loss = 0.0006890242802910507
Validation loss = 0.0006349904579110444
Validation loss = 0.0007581479148939252
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007433384889736772
Validation loss = 0.000545854156371206
Validation loss = 0.0007500868523493409
Validation loss = 0.0006310792523436248
Validation loss = 0.002518838969990611
Validation loss = 0.0009374244837090373
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007759944419376552
Validation loss = 0.0011667496291920543
Validation loss = 0.0008383448584936559
Validation loss = 0.0007166263530962169
Validation loss = 0.0005861493409611285
Validation loss = 0.0009539791499264538
Validation loss = 0.0011590748326852918
Validation loss = 0.0007127600838430226
Validation loss = 0.0008525484590791166
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010184275452047586
Validation loss = 0.0008415572810918093
Validation loss = 0.0005187540082260966
Validation loss = 0.0008093751966953278
Validation loss = 0.0006746954168193042
Validation loss = 0.0011353507870808244
Validation loss = 0.0011371361324563622
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -16.5     |
| Iteration     | 42        |
| MaximumReturn | -0.000499 |
| MinimumReturn | -125      |
| TotalSamples  | 73304     |
-----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005414163460955024
Validation loss = 0.0010450194822624326
Validation loss = 0.0008471626206301153
Validation loss = 0.0009989612735807896
Validation loss = 0.0006150319240987301
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006912215612828732
Validation loss = 0.0006070302333682775
Validation loss = 0.0007505670073442161
Validation loss = 0.0006562971975654364
Validation loss = 0.0006135195144452155
Validation loss = 0.000889898044988513
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008545233868062496
Validation loss = 0.0005809064023196697
Validation loss = 0.0009173713624477386
Validation loss = 0.000749379803892225
Validation loss = 0.0010120252845808864
Validation loss = 0.000762031995691359
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008469909662380815
Validation loss = 0.0009478626307100058
Validation loss = 0.000596591446083039
Validation loss = 0.0010246813762933016
Validation loss = 0.0008204427431337535
Validation loss = 0.001230308087542653
Validation loss = 0.0009446577751077712
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007861574995331466
Validation loss = 0.001082996022887528
Validation loss = 0.0009487483184784651
Validation loss = 0.0008599037537351251
Validation loss = 0.0010881880298256874
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -132     |
| Iteration     | 43       |
| MaximumReturn | -91.2    |
| MinimumReturn | -165     |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007990611717104912
Validation loss = 0.0014206288615241647
Validation loss = 0.0010380251333117485
Validation loss = 0.0008537965477444232
Validation loss = 0.0009121705661527812
Validation loss = 0.0007576300413347781
Validation loss = 0.0008505331352353096
Validation loss = 0.0009362613200210035
Validation loss = 0.0007328594801947474
Validation loss = 0.0008809236460365355
Validation loss = 0.0006899979198351502
Validation loss = 0.0007794934208504856
Validation loss = 0.0007805986097082496
Validation loss = 0.0007762417080812156
Validation loss = 0.0009378825197927654
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0045951008796691895
Validation loss = 0.0013185931602492929
Validation loss = 0.0010488785337656736
Validation loss = 0.0007368251099251211
Validation loss = 0.000742386095225811
Validation loss = 0.0008639149600639939
Validation loss = 0.0007508794660679996
Validation loss = 0.0009817280806601048
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006957279518246651
Validation loss = 0.0011292911367490888
Validation loss = 0.0011219991138204932
Validation loss = 0.0010853884741663933
Validation loss = 0.0009857873665168881
Validation loss = 0.0008853433537296951
Validation loss = 0.0008600194123573601
Validation loss = 0.0008960995473898947
Validation loss = 0.0010882108472287655
Validation loss = 0.0007284614839591086
Validation loss = 0.00089809246128425
Validation loss = 0.0008803980308584869
Validation loss = 0.0009190129931084812
Validation loss = 0.000808979501016438
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0029332705307751894
Validation loss = 0.000936139200348407
Validation loss = 0.0012106088688597083
Validation loss = 0.0013193286722525954
Validation loss = 0.0008468959131278098
Validation loss = 0.000645297288428992
Validation loss = 0.0006792504573240876
Validation loss = 0.0009058196446858346
Validation loss = 0.0009361992706544697
Validation loss = 0.0008673178381286561
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00462711788713932
Validation loss = 0.001244999933987856
Validation loss = 0.0009009754285216331
Validation loss = 0.0007993148756213486
Validation loss = 0.0010935890022665262
Validation loss = 0.0007443723152391613
Validation loss = 0.000860928266774863
Validation loss = 0.0007070021820254624
Validation loss = 0.0010896780295297503
Validation loss = 0.0006277200300246477
Validation loss = 0.0007812583935447037
Validation loss = 0.0008375832694582641
Validation loss = 0.0007801254978403449
Validation loss = 0.0008406931301578879
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.56     |
| Iteration     | 44        |
| MaximumReturn | -0.000598 |
| MinimumReturn | -38.6     |
| TotalSamples  | 76636     |
-----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007845920044928789
Validation loss = 0.000740224844776094
Validation loss = 0.0006546485237777233
Validation loss = 0.0007682112045586109
Validation loss = 0.0009578794706612825
Validation loss = 0.0007006431696936488
Validation loss = 0.000982495374046266
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012038234854117036
Validation loss = 0.0007855176809243858
Validation loss = 0.001012273016385734
Validation loss = 0.0012042425805702806
Validation loss = 0.0010173195041716099
Validation loss = 0.0010445896768942475
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008304670336656272
Validation loss = 0.0007914290181361139
Validation loss = 0.0006794555229134858
Validation loss = 0.0009125022334046662
Validation loss = 0.0009156988235190511
Validation loss = 0.00174844975117594
Validation loss = 0.0006158786127343774
Validation loss = 0.0005311825079843402
Validation loss = 0.0010523843811824918
Validation loss = 0.000699856027495116
Validation loss = 0.0008507550228387117
Validation loss = 0.0006746319122612476
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011315152514725924
Validation loss = 0.000807831238489598
Validation loss = 0.0010111094452440739
Validation loss = 0.0013495477614924312
Validation loss = 0.0011722869239747524
Validation loss = 0.000574190285988152
Validation loss = 0.0007824873318895698
Validation loss = 0.0008294617873616517
Validation loss = 0.0008718834142200649
Validation loss = 0.0007190995966084301
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000642270315438509
Validation loss = 0.0006417541299015284
Validation loss = 0.0008219715673476458
Validation loss = 0.0007447583484463394
Validation loss = 0.0011473678750917315
Validation loss = 0.0007625912548974156
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -13.2     |
| Iteration     | 45        |
| MaximumReturn | -0.000624 |
| MinimumReturn | -112      |
| TotalSamples  | 78302     |
-----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008322559297084808
Validation loss = 0.000819526903796941
Validation loss = 0.000567900890018791
Validation loss = 0.0006073036347515881
Validation loss = 0.000954688002821058
Validation loss = 0.0006856092368252575
Validation loss = 0.000741110707167536
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008834013133309782
Validation loss = 0.0006605202797800303
Validation loss = 0.0009554220596328378
Validation loss = 0.0007057830225676298
Validation loss = 0.0006166762905195355
Validation loss = 0.0007748397765681148
Validation loss = 0.0011854579206556082
Validation loss = 0.0006906916969455779
Validation loss = 0.0007244486478157341
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007707980694249272
Validation loss = 0.0007374438573606312
Validation loss = 0.0006541799521073699
Validation loss = 0.0008443339029327035
Validation loss = 0.0006325352587737143
Validation loss = 0.0009277556673623621
Validation loss = 0.0011758070904761553
Validation loss = 0.0006484304321929812
Validation loss = 0.0008588727214373648
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007153122569434345
Validation loss = 0.0005609848885796964
Validation loss = 0.0018907722551375628
Validation loss = 0.0011717162560671568
Validation loss = 0.0006941650062799454
Validation loss = 0.0008143077720887959
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010483469814062119
Validation loss = 0.0007088060374371707
Validation loss = 0.0007082033553160727
Validation loss = 0.0008365172543562949
Validation loss = 0.0008073805365711451
Validation loss = 0.0006742089753970504
Validation loss = 0.0011264977511018515
Validation loss = 0.0007305436301976442
Validation loss = 0.0009168222895823419
Validation loss = 0.0007027151877991855
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.27    |
| Iteration     | 46       |
| MaximumReturn | -0.178   |
| MinimumReturn | -52.6    |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001069151796400547
Validation loss = 0.0008348386618308723
Validation loss = 0.001165346009656787
Validation loss = 0.0010025581577792764
Validation loss = 0.0008735276642255485
Validation loss = 0.000808094278909266
Validation loss = 0.001301160198636353
Validation loss = 0.0018205167725682259
Validation loss = 0.0015436251414939761
Validation loss = 0.0008796781185083091
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001349572092294693
Validation loss = 0.0009446715703234076
Validation loss = 0.0010098733473569155
Validation loss = 0.0009036721894517541
Validation loss = 0.0008288638782687485
Validation loss = 0.0009297356009483337
Validation loss = 0.0009647142142057419
Validation loss = 0.0007023984799161553
Validation loss = 0.0007865264778956771
Validation loss = 0.0008435087511315942
Validation loss = 0.0013351452071219683
Validation loss = 0.000607627909630537
Validation loss = 0.0012162872590124607
Validation loss = 0.0008452708716504276
Validation loss = 0.0011619401630014181
Validation loss = 0.001204668078571558
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001247571548447013
Validation loss = 0.0012752661714330316
Validation loss = 0.0007413063431158662
Validation loss = 0.0013693550135940313
Validation loss = 0.0007942955708131194
Validation loss = 0.0008219971205107868
Validation loss = 0.0007767534698359668
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013989853905513883
Validation loss = 0.000962333579082042
Validation loss = 0.0007328513311222196
Validation loss = 0.0006202679360285401
Validation loss = 0.0007342174067161977
Validation loss = 0.0018778673838824034
Validation loss = 0.0008139398996718228
Validation loss = 0.0008857211214490235
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015652090078219771
Validation loss = 0.0007451637648046017
Validation loss = 0.0009058283758349717
Validation loss = 0.0007506869733333588
Validation loss = 0.0009386780438944697
Validation loss = 0.0008792343433015049
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.233   |
| Iteration     | 47       |
| MaximumReturn | -0.0995  |
| MinimumReturn | -0.492   |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009087709477171302
Validation loss = 0.0015890474896878004
Validation loss = 0.0009327991865575314
Validation loss = 0.0011368419509381056
Validation loss = 0.001574414549395442
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006922759930603206
Validation loss = 0.0008367697009816766
Validation loss = 0.0007682419382035732
Validation loss = 0.0006233238382264972
Validation loss = 0.0011733013670891523
Validation loss = 0.0009539792081341147
Validation loss = 0.0007547413115389645
Validation loss = 0.001050329185090959
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010539807844907045
Validation loss = 0.0014464621199294925
Validation loss = 0.001328559359535575
Validation loss = 0.0017584968591108918
Validation loss = 0.000931747374124825
Validation loss = 0.0006472858367487788
Validation loss = 0.0018377590458840132
Validation loss = 0.000703806581441313
Validation loss = 0.0011846909765154123
Validation loss = 0.0010279405396431684
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007381085306406021
Validation loss = 0.000978644471615553
Validation loss = 0.0007265796302817762
Validation loss = 0.0008591647492721677
Validation loss = 0.0007275667157955468
Validation loss = 0.0013008943060413003
Validation loss = 0.002352639799937606
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006946889916434884
Validation loss = 0.0007194819045253098
Validation loss = 0.0005868197185918689
Validation loss = 0.0009160498157143593
Validation loss = 0.0011795550817623734
Validation loss = 0.0016155336052179337
Validation loss = 0.0025997983757406473
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0523   |
| Iteration     | 48        |
| MaximumReturn | -0.000938 |
| MinimumReturn | -0.196    |
| TotalSamples  | 83300     |
-----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008337165927514434
Validation loss = 0.0009609052794985473
Validation loss = 0.0008916733204387128
Validation loss = 0.0008083844440989196
Validation loss = 0.0010809836676344275
Validation loss = 0.0013294110540300608
Validation loss = 0.0009279620135203004
Validation loss = 0.000728135637473315
Validation loss = 0.0007190646720118821
Validation loss = 0.0014496224466711283
Validation loss = 0.0009405585587956011
Validation loss = 0.0013937556650489569
Validation loss = 0.0007310414803214371
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011360228527337313
Validation loss = 0.0015521416207775474
Validation loss = 0.0008644162444397807
Validation loss = 0.0026387046091258526
Validation loss = 0.000696172472089529
Validation loss = 0.0008428808068856597
Validation loss = 0.0012701189843937755
Validation loss = 0.0007145279669202864
Validation loss = 0.000720465846825391
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009449315839447081
Validation loss = 0.0007452593999914825
Validation loss = 0.0012306225253269076
Validation loss = 0.0009125135838985443
Validation loss = 0.0008082393324002624
Validation loss = 0.0012710732407867908
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015824642032384872
Validation loss = 0.0013834488345310092
Validation loss = 0.0017778021283447742
Validation loss = 0.0009955472778528929
Validation loss = 0.0006719278171658516
Validation loss = 0.0008160438155755401
Validation loss = 0.0016805792693048716
Validation loss = 0.0007351227686740458
Validation loss = 0.000635390228126198
Validation loss = 0.001217500539496541
Validation loss = 0.0005990210338495672
Validation loss = 0.0008385409601032734
Validation loss = 0.001202591578476131
Validation loss = 0.0017646747874096036
Validation loss = 0.000749932718463242
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008928882889449596
Validation loss = 0.0020174437668174505
Validation loss = 0.0007477542385458946
Validation loss = 0.0009016914991661906
Validation loss = 0.001250313245691359
Validation loss = 0.0007981447270140052
Validation loss = 0.0006517043220810592
Validation loss = 0.0009915789123624563
Validation loss = 0.001466417103074491
Validation loss = 0.0009952338878065348
Validation loss = 0.0006112965056672692
Validation loss = 0.0007858768803998828
Validation loss = 0.0008827188867144287
Validation loss = 0.0010248933685943484
Validation loss = 0.0008400990045629442
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0544   |
| Iteration     | 49        |
| MaximumReturn | -0.000992 |
| MinimumReturn | -0.153    |
| TotalSamples  | 84966     |
-----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000580651278141886
Validation loss = 0.0009188694530166686
Validation loss = 0.0008890945464372635
Validation loss = 0.0007859274046495557
Validation loss = 0.0010691772913560271
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006894540856592357
Validation loss = 0.0009011958609335124
Validation loss = 0.0018087151693180203
Validation loss = 0.0006367472233250737
Validation loss = 0.0008470183820463717
Validation loss = 0.001367128104902804
Validation loss = 0.000670476583763957
Validation loss = 0.0011280685430392623
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009960243478417397
Validation loss = 0.0009311283938586712
Validation loss = 0.0007124278345145285
Validation loss = 0.0007708719349466264
Validation loss = 0.0008831726736389101
Validation loss = 0.0009491689270362258
Validation loss = 0.001253102789632976
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011028788285329938
Validation loss = 0.0008591792429797351
Validation loss = 0.0011686062207445502
Validation loss = 0.0009152842103503644
Validation loss = 0.0010341474553570151
Validation loss = 0.0010594545165076852
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008517581736668944
Validation loss = 0.0009311491739936173
Validation loss = 0.0015956410206854343
Validation loss = 0.0008098194957710803
Validation loss = 0.001405799644999206
Validation loss = 0.0013111969456076622
Validation loss = 0.0009342568810097873
Validation loss = 0.0007323076715692878
Validation loss = 0.0006823735893703997
Validation loss = 0.0006928700022399426
Validation loss = 0.0013766530901193619
Validation loss = 0.0010690436465665698
Validation loss = 0.0008674947894178331
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0976  |
| Iteration     | 50       |
| MaximumReturn | -0.00089 |
| MinimumReturn | -0.238   |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000594942073803395
Validation loss = 0.0007427058299072087
Validation loss = 0.0006174306035973132
Validation loss = 0.001428444404155016
Validation loss = 0.0008571724174544215
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010288003832101822
Validation loss = 0.0009977802401408553
Validation loss = 0.0006917028222233057
Validation loss = 0.0010872971033677459
Validation loss = 0.0006448669009841979
Validation loss = 0.0018476293189451098
Validation loss = 0.000830575474537909
Validation loss = 0.0013215921353548765
Validation loss = 0.0008168300846591592
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008896276704035699
Validation loss = 0.0007930606952868402
Validation loss = 0.0025383196771144867
Validation loss = 0.000645788328256458
Validation loss = 0.0007002126658335328
Validation loss = 0.0012748341541737318
Validation loss = 0.0010174107737839222
Validation loss = 0.0013859535101801157
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006980485050007701
Validation loss = 0.0008061000844463706
Validation loss = 0.001181294210255146
Validation loss = 0.0007989227306097746
Validation loss = 0.0009512485703453422
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000784148636739701
Validation loss = 0.0007874804432503879
Validation loss = 0.0015794294886291027
Validation loss = 0.0007946591940708458
Validation loss = 0.0006267850403673947
Validation loss = 0.0011630764929577708
Validation loss = 0.0028863197658210993
Validation loss = 0.0009547650697641075
Validation loss = 0.0006947274669073522
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.144   |
| Iteration     | 51       |
| MaximumReturn | -0.00101 |
| MinimumReturn | -0.445   |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006177913746796548
Validation loss = 0.0009170849225483835
Validation loss = 0.0007358757429756224
Validation loss = 0.000774551706854254
Validation loss = 0.0007799388258717954
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018372581107541919
Validation loss = 0.0006024366011843085
Validation loss = 0.00099668197799474
Validation loss = 0.0010106952395290136
Validation loss = 0.0006108858506195247
Validation loss = 0.0008906666771508753
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008392985910177231
Validation loss = 0.0008764642989262938
Validation loss = 0.0009384058648720384
Validation loss = 0.0014514271169900894
Validation loss = 0.0008150596404448152
Validation loss = 0.0009073873516172171
Validation loss = 0.0007894171285443008
Validation loss = 0.0009625788661651313
Validation loss = 0.0013119337381795049
Validation loss = 0.0024399859830737114
Validation loss = 0.000710377236828208
Validation loss = 0.0008326156530529261
Validation loss = 0.0009174097795039415
Validation loss = 0.000803283357527107
Validation loss = 0.0008321422501467168
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005997679545544088
Validation loss = 0.0009638884803280234
Validation loss = 0.0008783364319242537
Validation loss = 0.0006369493785314262
Validation loss = 0.0007289953646250069
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010817608563229442
Validation loss = 0.0009469289798289537
Validation loss = 0.0005199430743232369
Validation loss = 0.0006181505159474909
Validation loss = 0.0006984391366131604
Validation loss = 0.0008434024057351053
Validation loss = 0.0006465246551670134
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.404   |
| Iteration     | 52       |
| MaximumReturn | -0.0683  |
| MinimumReturn | -0.703   |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008720889454707503
Validation loss = 0.0009890543296933174
Validation loss = 0.0007867104723118246
Validation loss = 0.0009448940400034189
Validation loss = 0.0013046119129285216
Validation loss = 0.0006703761173412204
Validation loss = 0.0011282307095825672
Validation loss = 0.0009042440797202289
Validation loss = 0.000706297461874783
Validation loss = 0.0005962249124422669
Validation loss = 0.0006244685500860214
Validation loss = 0.0008830809965729713
Validation loss = 0.0005440809763967991
Validation loss = 0.0009129223180934787
Validation loss = 0.001536645693704486
Validation loss = 0.0007917387410998344
Validation loss = 0.0007386051001958549
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011774059385061264
Validation loss = 0.0008081864798441529
Validation loss = 0.0007625239668413997
Validation loss = 0.0007177703082561493
Validation loss = 0.000566239352338016
Validation loss = 0.0009787677554413676
Validation loss = 0.0005874924827367067
Validation loss = 0.0006553131388500333
Validation loss = 0.0006861964357085526
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005756458849646151
Validation loss = 0.0009066587663255632
Validation loss = 0.0006568575045093894
Validation loss = 0.0010393899865448475
Validation loss = 0.0007619710522703826
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006110591348260641
Validation loss = 0.0007308113854378462
Validation loss = 0.0006125553045421839
Validation loss = 0.001015417743474245
Validation loss = 0.0007807564688846469
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005382532253861427
Validation loss = 0.0006354854558594525
Validation loss = 0.000763546850066632
Validation loss = 0.0010622413828969002
Validation loss = 0.0006583372596651316
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.104    |
| Iteration     | 53        |
| MaximumReturn | -0.000718 |
| MinimumReturn | -1.24     |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007622345583513379
Validation loss = 0.0006968527450226247
Validation loss = 0.0006699178484268486
Validation loss = 0.0006746270810253918
Validation loss = 0.0007244797889143229
Validation loss = 0.0011153905652463436
Validation loss = 0.0012351181358098984
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006660043727606535
Validation loss = 0.0005493151838891208
Validation loss = 0.0009339936077594757
Validation loss = 0.00072329374961555
Validation loss = 0.0010085477260872722
Validation loss = 0.0006274926709011197
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007355741108767688
Validation loss = 0.0006870162906125188
Validation loss = 0.0008210427477024496
Validation loss = 0.000862152548506856
Validation loss = 0.0007041890639811754
Validation loss = 0.0008050052565522492
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013688246253877878
Validation loss = 0.0008765856618992984
Validation loss = 0.0007914449670352042
Validation loss = 0.001152545097284019
Validation loss = 0.0010578717337921262
Validation loss = 0.0007714348612353206
Validation loss = 0.0007375817513093352
Validation loss = 0.0005902645643800497
Validation loss = 0.0007716423715464771
Validation loss = 0.0007626907899975777
Validation loss = 0.0009769875323399901
Validation loss = 0.0006258516805246472
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007650824845768511
Validation loss = 0.0006366701563820243
Validation loss = 0.0007502007647417486
Validation loss = 0.0007518677157349885
Validation loss = 0.0005529819172807038
Validation loss = 0.0006172858411446214
Validation loss = 0.0007172961486503482
Validation loss = 0.0006247466080822051
Validation loss = 0.0005638999864459038
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.24    |
| Iteration     | 54       |
| MaximumReturn | -0.00131 |
| MinimumReturn | -2.05    |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007083368254825473
Validation loss = 0.0008394013857468963
Validation loss = 0.0006233795429579914
Validation loss = 0.0007425895892083645
Validation loss = 0.0014429788570851088
Validation loss = 0.0006454001413658261
Validation loss = 0.0008470246102660894
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017480122623965144
Validation loss = 0.000696964212693274
Validation loss = 0.0004958283388987184
Validation loss = 0.0005153173115104437
Validation loss = 0.0007637472008354962
Validation loss = 0.0007045447127893567
Validation loss = 0.0011776204919442534
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009455105173401535
Validation loss = 0.0009472788660787046
Validation loss = 0.000683689839206636
Validation loss = 0.001157602178864181
Validation loss = 0.0024549802765250206
Validation loss = 0.0006793295033276081
Validation loss = 0.000580283987801522
Validation loss = 0.0008923616842366755
Validation loss = 0.0007677111425437033
Validation loss = 0.0006652525626122952
Validation loss = 0.0008630688535049558
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006051043746992946
Validation loss = 0.0008816797053441405
Validation loss = 0.0007271106005646288
Validation loss = 0.00112222321331501
Validation loss = 0.0006420973804779351
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000597611244302243
Validation loss = 0.0006549874087795615
Validation loss = 0.0007589335436932743
Validation loss = 0.0006655485485680401
Validation loss = 0.0007875466253608465
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0762   |
| Iteration     | 55        |
| MaximumReturn | -0.000631 |
| MinimumReturn | -1.24     |
| TotalSamples  | 94962     |
-----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010603154078125954
Validation loss = 0.001281515578739345
Validation loss = 0.0006703587714582682
Validation loss = 0.0006194007000885904
Validation loss = 0.0006000909488648176
Validation loss = 0.0011317378375679255
Validation loss = 0.0008232290856540203
Validation loss = 0.0007282893639057875
Validation loss = 0.0007400242611765862
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012129251845180988
Validation loss = 0.000738548522349447
Validation loss = 0.0007724755560047925
Validation loss = 0.0008713258430361748
Validation loss = 0.0007443542126566172
Validation loss = 0.0013053026050329208
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009635411552153528
Validation loss = 0.0008442364633083344
Validation loss = 0.000587919435929507
Validation loss = 0.0006870294455438852
Validation loss = 0.0008637533173896372
Validation loss = 0.0006785309524275362
Validation loss = 0.0007938201888464391
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007178484811447561
Validation loss = 0.0005598653806373477
Validation loss = 0.0006959265447221696
Validation loss = 0.0005504636792466044
Validation loss = 0.0010101645020768046
Validation loss = 0.0018890196224674582
Validation loss = 0.00101772032212466
Validation loss = 0.0007312002708204091
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005034907953813672
Validation loss = 0.0007404957432299852
Validation loss = 0.0008044755668379366
Validation loss = 0.0006482138996943831
Validation loss = 0.0004734814283438027
Validation loss = 0.000649424793664366
Validation loss = 0.0011046073632314801
Validation loss = 0.0008347195107489824
Validation loss = 0.001858441741205752
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0971   |
| Iteration     | 56        |
| MaximumReturn | -0.000653 |
| MinimumReturn | -1.39     |
| TotalSamples  | 96628     |
-----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010243895230814815
Validation loss = 0.0008869007579050958
Validation loss = 0.0008057517115958035
Validation loss = 0.0013045041123405099
Validation loss = 0.000629899266641587
Validation loss = 0.0006800456903874874
Validation loss = 0.0006304657435975969
Validation loss = 0.0005967649631202221
Validation loss = 0.0010734674287959933
Validation loss = 0.000844909343868494
Validation loss = 0.0005386300035752356
Validation loss = 0.0006985758082009852
Validation loss = 0.0005588115891441703
Validation loss = 0.0007816500146873295
Validation loss = 0.0009108964004553854
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007817978621460497
Validation loss = 0.0006261752569116652
Validation loss = 0.0007227095775306225
Validation loss = 0.001099999062716961
Validation loss = 0.0010218668030574918
Validation loss = 0.0008615028928034008
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006607628311030567
Validation loss = 0.0005659338203258812
Validation loss = 0.0007282753358595073
Validation loss = 0.0008260801550932229
Validation loss = 0.0006495852139778435
Validation loss = 0.0006148401298560202
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008888104930520058
Validation loss = 0.0007053217850625515
Validation loss = 0.0004884642548859119
Validation loss = 0.0006027229246683419
Validation loss = 0.0007421127520501614
Validation loss = 0.0006126182270236313
Validation loss = 0.0010056301252916455
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007750094518996775
Validation loss = 0.0005807880661450326
Validation loss = 0.0006136926240287721
Validation loss = 0.0007712733931839466
Validation loss = 0.001011500135064125
Validation loss = 0.0006926085916347802
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.417    |
| Iteration     | 57        |
| MaximumReturn | -0.000811 |
| MinimumReturn | -1.7      |
| TotalSamples  | 98294     |
-----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001042212126776576
Validation loss = 0.0005976362735964358
Validation loss = 0.0010006673401221633
Validation loss = 0.0006645814282819629
Validation loss = 0.0007475788006559014
Validation loss = 0.0005800339858978987
Validation loss = 0.0011782277142629027
Validation loss = 0.0005673000705428421
Validation loss = 0.0006105739739723504
Validation loss = 0.0008484006393700838
Validation loss = 0.0007603309350088239
Validation loss = 0.0009374836809001863
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009986339136958122
Validation loss = 0.000695917522534728
Validation loss = 0.0005561312427744269
Validation loss = 0.0004936527111567557
Validation loss = 0.0006806644378229976
Validation loss = 0.0011789690470322967
Validation loss = 0.0005259888712316751
Validation loss = 0.0008408667636103928
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001208036788739264
Validation loss = 0.0006393534713424742
Validation loss = 0.0005652022082358599
Validation loss = 0.0009209919371642172
Validation loss = 0.0007094462052918971
Validation loss = 0.0005519706173799932
Validation loss = 0.0006335050566121936
Validation loss = 0.0009245688561350107
Validation loss = 0.0006630279240198433
Validation loss = 0.0007201543194241822
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007627839804627001
Validation loss = 0.0006328221061266959
Validation loss = 0.0006833061925135553
Validation loss = 0.0009687995770946145
Validation loss = 0.0006459344876930118
Validation loss = 0.0009015259565785527
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009929330553859472
Validation loss = 0.0007197088561952114
Validation loss = 0.001096854917705059
Validation loss = 0.0007017647149041295
Validation loss = 0.0011043129488825798
Validation loss = 0.0008467326406389475
Validation loss = 0.0005944519070908427
Validation loss = 0.0006010294309817255
Validation loss = 0.0009519550367258489
Validation loss = 0.0006534911808557808
Validation loss = 0.0006498178699985147
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3       |
| Iteration     | 58       |
| MaximumReturn | -2.39    |
| MinimumReturn | -3.94    |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008072368800640106
Validation loss = 0.0010685729794204235
Validation loss = 0.0004761602031067014
Validation loss = 0.000484505231725052
Validation loss = 0.0007505918038077652
Validation loss = 0.0007681594579480588
Validation loss = 0.0005221439059823751
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005377603811211884
Validation loss = 0.001070491038262844
Validation loss = 0.0005334438174031675
Validation loss = 0.00044915653415955603
Validation loss = 0.0005360417999327183
Validation loss = 0.00063793093431741
Validation loss = 0.0005482761771418154
Validation loss = 0.0005690362304449081
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000691205495968461
Validation loss = 0.00043646860285662115
Validation loss = 0.0004501107323449105
Validation loss = 0.000540324894245714
Validation loss = 0.00048074760707095265
Validation loss = 0.0006378912949003279
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008365897228941321
Validation loss = 0.000588964088819921
Validation loss = 0.0005179421859793365
Validation loss = 0.0007735221297480166
Validation loss = 0.00040634290780872107
Validation loss = 0.0008030786993913352
Validation loss = 0.0008239480666816235
Validation loss = 0.0007015959126874804
Validation loss = 0.0005998598644509912
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001204942585900426
Validation loss = 0.0005985225434415042
Validation loss = 0.0005140157300047576
Validation loss = 0.0007263440056703985
Validation loss = 0.0009950484381988645
Validation loss = 0.000587544753216207
Validation loss = 0.0006029068026691675
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.01     |
| Iteration     | 59        |
| MaximumReturn | -0.000842 |
| MinimumReturn | -3.48     |
| TotalSamples  | 101626    |
-----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0003795987868215889
Validation loss = 0.0004931611474603415
Validation loss = 0.0014152339426800609
Validation loss = 0.0004905143869109452
Validation loss = 0.0011490803444758058
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0004766994679812342
Validation loss = 0.0004225629672873765
Validation loss = 0.0004841636400669813
Validation loss = 0.000847821997012943
Validation loss = 0.0006779088289476931
Validation loss = 0.0007591491448692977
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0004960849182680249
Validation loss = 0.0005914539215154946
Validation loss = 0.0005302128847688437
Validation loss = 0.00042636043508537114
Validation loss = 0.0007055105525068939
Validation loss = 0.0007762634777463973
Validation loss = 0.0006051831878721714
Validation loss = 0.0005179101717658341
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00046544003998860717
Validation loss = 0.0006887934287078679
Validation loss = 0.0005513825453817844
Validation loss = 0.0004997103824280202
Validation loss = 0.000558370491489768
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005010938039049506
Validation loss = 0.0009730695746839046
Validation loss = 0.0008135515381582081
Validation loss = 0.0006752291810698807
Validation loss = 0.0005856304196640849
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.462    |
| Iteration     | 60        |
| MaximumReturn | -0.000661 |
| MinimumReturn | -3.24     |
| TotalSamples  | 103292    |
-----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011897002113983035
Validation loss = 0.0005298411124385893
Validation loss = 0.00044245042954571545
Validation loss = 0.0007833075360395014
Validation loss = 0.0008556102402508259
Validation loss = 0.0005861434619873762
Validation loss = 0.0006201818468980491
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0004771638195961714
Validation loss = 0.0006503044278360903
Validation loss = 0.0007150981109589338
Validation loss = 0.00055071891983971
Validation loss = 0.0005273675778880715
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005173073732294142
Validation loss = 0.0010134830372408032
Validation loss = 0.0006604762747883797
Validation loss = 0.0007967135752551258
Validation loss = 0.0005679989117197692
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006404236191883683
Validation loss = 0.0005704323411919177
Validation loss = 0.0006181174539960921
Validation loss = 0.00046702660620212555
Validation loss = 0.0009368116734549403
Validation loss = 0.000997880706563592
Validation loss = 0.0005113924853503704
Validation loss = 0.000499115907587111
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009032537345774472
Validation loss = 0.0005724492366425693
Validation loss = 0.0005363318487070501
Validation loss = 0.0005971621139906347
Validation loss = 0.0005410393350757658
Validation loss = 0.00041191757190972567
Validation loss = 0.0004785501805599779
Validation loss = 0.0005720952176488936
Validation loss = 0.0005448373267427087
Validation loss = 0.0005010530585423112
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -88.3    |
| Iteration     | 61       |
| MaximumReturn | -2.88    |
| MinimumReturn | -145     |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003188009839504957
Validation loss = 0.0008087849710136652
Validation loss = 0.0007372132386080921
Validation loss = 0.0005766034009866416
Validation loss = 0.0005834781331941485
Validation loss = 0.0010879478650167584
Validation loss = 0.0005609182408079505
Validation loss = 0.0004724215541500598
Validation loss = 0.0006223003147169948
Validation loss = 0.0008773136069066823
Validation loss = 0.0006883320165798068
Validation loss = 0.0004090708971489221
Validation loss = 0.0004992258618585765
Validation loss = 0.0004292821104172617
Validation loss = 0.0005699772736988962
Validation loss = 0.0004164865822531283
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009554118732921779
Validation loss = 0.0006008860073052347
Validation loss = 0.0005595588008873165
Validation loss = 0.00048074053484015167
Validation loss = 0.0005990305216982961
Validation loss = 0.0005740084452554584
Validation loss = 0.0005965728778392076
Validation loss = 0.0004411150002852082
Validation loss = 0.0005702902562916279
Validation loss = 0.0005916876834817231
Validation loss = 0.0009155489969998598
Validation loss = 0.00044898962369188666
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010620779357850552
Validation loss = 0.0007364007760770619
Validation loss = 0.0016467588720843196
Validation loss = 0.0005300703924149275
Validation loss = 0.0005055087385699153
Validation loss = 0.00047919206554070115
Validation loss = 0.0005115031381137669
Validation loss = 0.0005619494477286935
Validation loss = 0.0006993056158535182
Validation loss = 0.0006592985591851175
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0028269437607377768
Validation loss = 0.0004925904213450849
Validation loss = 0.0005698712193407118
Validation loss = 0.0009676995105110109
Validation loss = 0.0004469891428016126
Validation loss = 0.0004525301337707788
Validation loss = 0.0005801307270303369
Validation loss = 0.0004581977264024317
Validation loss = 0.0004142877005506307
Validation loss = 0.0005950629711151123
Validation loss = 0.0004899287014268339
Validation loss = 0.0005097582470625639
Validation loss = 0.0007638121023774147
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005719296168535948
Validation loss = 0.0005311440909281373
Validation loss = 0.0005606433260254562
Validation loss = 0.0006863036542199552
Validation loss = 0.0003914206463377923
Validation loss = 0.00043764710426330566
Validation loss = 0.00038311671232804656
Validation loss = 0.0005719817709177732
Validation loss = 0.0004048982809763402
Validation loss = 0.0005600564763881266
Validation loss = 0.0003937006404157728
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14      |
| Iteration     | 62       |
| MaximumReturn | -0.00242 |
| MinimumReturn | -83.8    |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007266885950230062
Validation loss = 0.0005696347216144204
Validation loss = 0.000631131639238447
Validation loss = 0.0006026826449669898
Validation loss = 0.000577666622120887
Validation loss = 0.0007111593149602413
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009811082854866982
Validation loss = 0.0005743398214690387
Validation loss = 0.0005939260008744895
Validation loss = 0.0006135033909231424
Validation loss = 0.0006051784730516374
Validation loss = 0.0008393439347855747
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006508376100100577
Validation loss = 0.0005490368930622935
Validation loss = 0.0005022227414883673
Validation loss = 0.0005452159093692899
Validation loss = 0.0005306887323968112
Validation loss = 0.00046150677371770144
Validation loss = 0.0007265525637194514
Validation loss = 0.0005985928582958877
Validation loss = 0.0006118625751696527
Validation loss = 0.0006317980587482452
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0004802731273230165
Validation loss = 0.0005636210553348064
Validation loss = 0.000498568348120898
Validation loss = 0.0007750087534077466
Validation loss = 0.0007847317610867321
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005872283945791423
Validation loss = 0.0005758656188845634
Validation loss = 0.0005290908738970757
Validation loss = 0.0005784454406239092
Validation loss = 0.0005948462639935315
Validation loss = 0.0007189858588390052
Validation loss = 0.0007266189786605537
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.94     |
| Iteration     | 63        |
| MaximumReturn | -0.000805 |
| MinimumReturn | -47.8     |
| TotalSamples  | 108290    |
-----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008813999011181295
Validation loss = 0.0004925589310005307
Validation loss = 0.0005417074426077306
Validation loss = 0.000529367767740041
Validation loss = 0.0009782775305211544
Validation loss = 0.0005970245692878962
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006930283852852881
Validation loss = 0.0008498544921167195
Validation loss = 0.000655299169011414
Validation loss = 0.0006287851720117033
Validation loss = 0.0004553001781459898
Validation loss = 0.0007957333000376821
Validation loss = 0.0006064533372409642
Validation loss = 0.0010095383040606976
Validation loss = 0.0006795970257371664
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008203858160413802
Validation loss = 0.0007393507985398173
Validation loss = 0.0006287128198891878
Validation loss = 0.00046765798470005393
Validation loss = 0.0005386193515732884
Validation loss = 0.0007696126704104245
Validation loss = 0.0007113200845196843
Validation loss = 0.0006871857331134379
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008352033910341561
Validation loss = 0.00048711535055190325
Validation loss = 0.000714911031536758
Validation loss = 0.0005269582034088671
Validation loss = 0.0006092342082411051
Validation loss = 0.0007403161725960672
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006867002230137587
Validation loss = 0.0007159669185057282
Validation loss = 0.00054359738714993
Validation loss = 0.0004970899317413568
Validation loss = 0.000904682616237551
Validation loss = 0.0006525970529764891
Validation loss = 0.0004201090196147561
Validation loss = 0.000677113130223006
Validation loss = 0.0006695459596812725
Validation loss = 0.0007925817044451833
Validation loss = 0.0005469559691846371
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -3.77     |
| Iteration     | 64        |
| MaximumReturn | -0.000695 |
| MinimumReturn | -56.7     |
| TotalSamples  | 109956    |
-----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009649080457165837
Validation loss = 0.0005820554797537625
Validation loss = 0.00062256318051368
Validation loss = 0.0005569534259848297
Validation loss = 0.0006469472427852452
Validation loss = 0.000549658143427223
Validation loss = 0.0006616708706133068
Validation loss = 0.0005336180329322815
Validation loss = 0.0005233660340309143
Validation loss = 0.0006098728044889867
Validation loss = 0.00048210102249868214
Validation loss = 0.0009141187183558941
Validation loss = 0.0005677553126588464
Validation loss = 0.0006982443737797439
Validation loss = 0.00048753441660664976
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006949722301214933
Validation loss = 0.0009367591701447964
Validation loss = 0.0006934802513569593
Validation loss = 0.0007989008445292711
Validation loss = 0.0005567010375671089
Validation loss = 0.0005757930339314044
Validation loss = 0.0005029798485338688
Validation loss = 0.0006912560202181339
Validation loss = 0.0005544341984204948
Validation loss = 0.0008781489450484514
Validation loss = 0.0011437095236033201
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005218053120188415
Validation loss = 0.0008251428371295333
Validation loss = 0.000638458295725286
Validation loss = 0.0007640654221177101
Validation loss = 0.000547169242054224
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005320029449649155
Validation loss = 0.0005999981076456606
Validation loss = 0.0008718376629985869
Validation loss = 0.0004991450114175677
Validation loss = 0.0005368217243812978
Validation loss = 0.0007868694374337792
Validation loss = 0.0005571880610659719
Validation loss = 0.0008480268297716975
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000633854593615979
Validation loss = 0.0007457953179255128
Validation loss = 0.0005939897964708507
Validation loss = 0.000723242643289268
Validation loss = 0.0005300093325786293
Validation loss = 0.0010454265866428614
Validation loss = 0.0005329053965397179
Validation loss = 0.000740746152587235
Validation loss = 0.0005521029816009104
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -13.6    |
| Iteration     | 65       |
| MaximumReturn | -0.217   |
| MinimumReturn | -63.7    |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005014654016122222
Validation loss = 0.0006491882377304137
Validation loss = 0.0006355360383167863
Validation loss = 0.0005574995302595198
Validation loss = 0.000463223026599735
Validation loss = 0.0005212132236920297
Validation loss = 0.0005179989384487271
Validation loss = 0.0006778523093089461
Validation loss = 0.0004182648262940347
Validation loss = 0.0004630838811863214
Validation loss = 0.0004540180671028793
Validation loss = 0.0005457845982164145
Validation loss = 0.0005277892923913896
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005595492548309267
Validation loss = 0.001409601652994752
Validation loss = 0.000377902906620875
Validation loss = 0.0005580568104051054
Validation loss = 0.00039268904947675765
Validation loss = 0.0005272561102174222
Validation loss = 0.0010759640717878938
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00051555794198066
Validation loss = 0.000522093556355685
Validation loss = 0.0006072227843105793
Validation loss = 0.0004933385062031448
Validation loss = 0.0005323346122168005
Validation loss = 0.0003828588523901999
Validation loss = 0.0006221806979738176
Validation loss = 0.0007079344359226525
Validation loss = 0.0005334651214070618
Validation loss = 0.0003992353449575603
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005039504030719399
Validation loss = 0.0005310612614266574
Validation loss = 0.0005520244012586772
Validation loss = 0.0003760279796551913
Validation loss = 0.0005756529862992465
Validation loss = 0.0004083929816260934
Validation loss = 0.0007084945100359619
Validation loss = 0.0004585006390698254
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005621863529086113
Validation loss = 0.0004833883431274444
Validation loss = 0.0005573845119215548
Validation loss = 0.0005633047549054027
Validation loss = 0.000510897021740675
Validation loss = 0.00040242195245809853
Validation loss = 0.0005481495754793286
Validation loss = 0.00045779580250382423
Validation loss = 0.0003964044153690338
Validation loss = 0.0005730216507799923
Validation loss = 0.0005232247640378773
Validation loss = 0.00040048573282547295
Validation loss = 0.0005103148287162185
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.64    |
| Iteration     | 66       |
| MaximumReturn | -2.37    |
| MinimumReturn | -77.3    |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004502884403336793
Validation loss = 0.0003470342780929059
Validation loss = 0.0004265850584488362
Validation loss = 0.0007549362489953637
Validation loss = 0.0004767987411469221
Validation loss = 0.00043317556264810264
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00038982831756584346
Validation loss = 0.0005656955763697624
Validation loss = 0.0005151747609488666
Validation loss = 0.00048140730359591544
Validation loss = 0.0004634311771951616
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0004479564377106726
Validation loss = 0.0004926292458549142
Validation loss = 0.0003937707224395126
Validation loss = 0.00036435635411180556
Validation loss = 0.0004452908760868013
Validation loss = 0.0005283923237584531
Validation loss = 0.0005067855818197131
Validation loss = 0.0003304780402686447
Validation loss = 0.0005290050758048892
Validation loss = 0.000496680208016187
Validation loss = 0.0005071725463494658
Validation loss = 0.0004884201916866004
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000623635423835367
Validation loss = 0.00046780516277067363
Validation loss = 0.000639369070995599
Validation loss = 0.0003755795769393444
Validation loss = 0.0006056051352061331
Validation loss = 0.00040564246592111886
Validation loss = 0.00040073462878353894
Validation loss = 0.0006428073975257576
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0003765204455703497
Validation loss = 0.0005138102569617331
Validation loss = 0.0007813335978426039
Validation loss = 0.0003891575033776462
Validation loss = 0.0004389322712086141
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -36.2    |
| Iteration     | 67       |
| MaximumReturn | -0.00253 |
| MinimumReturn | -103     |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004622651031240821
Validation loss = 0.00041128011071123183
Validation loss = 0.00047400163020938635
Validation loss = 0.00047153563355095685
Validation loss = 0.0006271845195442438
Validation loss = 0.00047375241410918534
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00043364966404624283
Validation loss = 0.0007977670175023377
Validation loss = 0.0006279576919041574
Validation loss = 0.0004348249640315771
Validation loss = 0.00047171380720101297
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006176136084832251
Validation loss = 0.0003722573455888778
Validation loss = 0.0004435534356161952
Validation loss = 0.0004378999292384833
Validation loss = 0.00037564372178167105
Validation loss = 0.0004088082059752196
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00044368483941070735
Validation loss = 0.0007383443298749626
Validation loss = 0.000418884155806154
Validation loss = 0.0004565723647829145
Validation loss = 0.00047049892600625753
Validation loss = 0.0012489981018006802
Validation loss = 0.000540795037522912
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00049402448348701
Validation loss = 0.0006199761410243809
Validation loss = 0.000558741157874465
Validation loss = 0.0005862067337147892
Validation loss = 0.0005370807484723628
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0871   |
| Iteration     | 68        |
| MaximumReturn | -0.000674 |
| MinimumReturn | -1.89     |
| TotalSamples  | 116620    |
-----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005733416182920337
Validation loss = 0.0004589713062159717
Validation loss = 0.0005501770065166056
Validation loss = 0.00048731183051131666
Validation loss = 0.0005699129542335868
Validation loss = 0.0006347529706545174
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0004965324769727886
Validation loss = 0.0005187084898352623
Validation loss = 0.0005604808684438467
Validation loss = 0.0005027238512411714
Validation loss = 0.0004972494789399207
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0004718257987406105
Validation loss = 0.0005311764543876052
Validation loss = 0.00040928550879471004
Validation loss = 0.00037283211713656783
Validation loss = 0.0006269775331020355
Validation loss = 0.000507481163367629
Validation loss = 0.00044268896454013884
Validation loss = 0.00048767757834866643
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005993688828311861
Validation loss = 0.0005397285567596555
Validation loss = 0.0005248140660114586
Validation loss = 0.0005774669698439538
Validation loss = 0.0004310188814997673
Validation loss = 0.0006502738106064498
Validation loss = 0.0005125878960825503
Validation loss = 0.000461477815406397
Validation loss = 0.0005542648141272366
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00041816002340056
Validation loss = 0.0005043299752287567
Validation loss = 0.000508445780724287
Validation loss = 0.0006513417465612292
Validation loss = 0.0004753966350108385
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -3.42     |
| Iteration     | 69        |
| MaximumReturn | -0.000774 |
| MinimumReturn | -83.6     |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00046508549712598324
Validation loss = 0.0005448408774100244
Validation loss = 0.0005775251192972064
Validation loss = 0.0004149583401158452
Validation loss = 0.00046039873268455267
Validation loss = 0.0005020791431888938
Validation loss = 0.00052436109399423
Validation loss = 0.0004989985027350485
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00048345475806854665
Validation loss = 0.0005505182780325413
Validation loss = 0.0010551162995398045
Validation loss = 0.00044696161057800055
Validation loss = 0.0004038672486785799
Validation loss = 0.0005159052670933306
Validation loss = 0.00042448571184650064
Validation loss = 0.0005433452897705138
Validation loss = 0.0003864311729557812
Validation loss = 0.0005658006994053721
Validation loss = 0.0006611159187741578
Validation loss = 0.0005197959835641086
Validation loss = 0.00040817222907207906
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00041785161010921
Validation loss = 0.0004897305043414235
Validation loss = 0.0003501439932733774
Validation loss = 0.00045631389366462827
Validation loss = 0.000667824933771044
Validation loss = 0.00040765927406027913
Validation loss = 0.00045784152462147176
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007408348028548062
Validation loss = 0.0005028870073147118
Validation loss = 0.0006040583830326796
Validation loss = 0.0005212381947785616
Validation loss = 0.00047900641220621765
Validation loss = 0.00045272032730281353
Validation loss = 0.0005467412993311882
Validation loss = 0.00045123763266019523
Validation loss = 0.0006217925110831857
Validation loss = 0.00043025161721743643
Validation loss = 0.0005020643584430218
Validation loss = 0.00039674772415310144
Validation loss = 0.00043989132973365486
Validation loss = 0.0006444401224143803
Validation loss = 0.00044597344822250307
Validation loss = 0.0003649506252259016
Validation loss = 0.0005254768766462803
Validation loss = 0.0006096552242524922
Validation loss = 0.00043659310904331505
Validation loss = 0.0005370028084143996
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000500150432344526
Validation loss = 0.0014618182322010398
Validation loss = 0.00044279778376221657
Validation loss = 0.0005495897494256496
Validation loss = 0.0006262431270442903
Validation loss = 0.0005814936594106257
Validation loss = 0.000592763302847743
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -36      |
| Iteration     | 70       |
| MaximumReturn | -1.74    |
| MinimumReturn | -144     |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013374064583331347
Validation loss = 0.000380314770154655
Validation loss = 0.0005572641384787858
Validation loss = 0.00045123492600396276
Validation loss = 0.0004784781485795975
Validation loss = 0.0003717447107192129
Validation loss = 0.0004121006350032985
Validation loss = 0.00040347769390791655
Validation loss = 0.0004952189628966153
Validation loss = 0.00041473880992271006
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005507654859684408
Validation loss = 0.0004655461816582829
Validation loss = 0.0004418453318066895
Validation loss = 0.000410971901146695
Validation loss = 0.0005631098756566644
Validation loss = 0.0005250623216852546
Validation loss = 0.0003933180414605886
Validation loss = 0.0007520950748585165
Validation loss = 0.0004299008578527719
Validation loss = 0.0004688164044637233
Validation loss = 0.00042618621955625713
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00038353566196747124
Validation loss = 0.00038242092705331743
Validation loss = 0.0004767938517034054
Validation loss = 0.0005561906727962196
Validation loss = 0.00039609239320270717
Validation loss = 0.00045347079867497087
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0004333812103141099
Validation loss = 0.0005274644936434925
Validation loss = 0.0005957699613645673
Validation loss = 0.0004041088104713708
Validation loss = 0.00042522867443040013
Validation loss = 0.0005271973786875606
Validation loss = 0.00038874949677847326
Validation loss = 0.0005316552123986185
Validation loss = 0.0004696916148532182
Validation loss = 0.0003638278285507113
Validation loss = 0.0007172170444391668
Validation loss = 0.00033731485018506646
Validation loss = 0.00041427419637329876
Validation loss = 0.0005088869947940111
Validation loss = 0.0006279449444264174
Validation loss = 0.00043434512917883694
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007017872994765639
Validation loss = 0.00039351999294012785
Validation loss = 0.0005300891934894025
Validation loss = 0.00046957569429650903
Validation loss = 0.0003787526802625507
Validation loss = 0.0005211158422753215
Validation loss = 0.000581040745601058
Validation loss = 0.0004081193183083087
Validation loss = 0.00044887137482874095
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.7    |
| Iteration     | 71       |
| MaximumReturn | -1.31    |
| MinimumReturn | -99.8    |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00041799552855081856
Validation loss = 0.0004917121841572225
Validation loss = 0.0005857898504473269
Validation loss = 0.0003555458097252995
Validation loss = 0.0004411212576087564
Validation loss = 0.00046999979531392455
Validation loss = 0.00047185903531499207
Validation loss = 0.0004663986328523606
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005717735039070249
Validation loss = 0.0004238228139001876
Validation loss = 0.0003900012234225869
Validation loss = 0.00037016684655100107
Validation loss = 0.00039476066012866795
Validation loss = 0.0004952131421305239
Validation loss = 0.0005259585450403392
Validation loss = 0.0004920393694192171
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00045812310418114066
Validation loss = 0.00045054900692775846
Validation loss = 0.0003462696331553161
Validation loss = 0.00037862471072003245
Validation loss = 0.0006013067322783172
Validation loss = 0.00048035415238700807
Validation loss = 0.00047340558376163244
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0004417177406139672
Validation loss = 0.0003978176391683519
Validation loss = 0.0003815671370830387
Validation loss = 0.0003831473586615175
Validation loss = 0.0004805009812116623
Validation loss = 0.0003408930788282305
Validation loss = 0.0004641527484636754
Validation loss = 0.00038994645001366735
Validation loss = 0.00037209640140645206
Validation loss = 0.00034369839704595506
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008751084678806365
Validation loss = 0.00042421245598234236
Validation loss = 0.0004646328161470592
Validation loss = 0.0004920851206406951
Validation loss = 0.0006231751176528633
Validation loss = 0.0004361472965683788
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.23    |
| Iteration     | 72       |
| MaximumReturn | -1.4     |
| MinimumReturn | -87      |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00037692306796088815
Validation loss = 0.00045278447214514017
Validation loss = 0.0007610536995343864
Validation loss = 0.0005427640862762928
Validation loss = 0.00036793778417631984
Validation loss = 0.0004210163315292448
Validation loss = 0.00040871204691939056
Validation loss = 0.0006070525269024074
Validation loss = 0.00037796408287249506
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00040511618135496974
Validation loss = 0.00033951670047827065
Validation loss = 0.0004909686977043748
Validation loss = 0.00034670214517973363
Validation loss = 0.00046611594734713435
Validation loss = 0.000545741815585643
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00035391608253121376
Validation loss = 0.00039257199387066066
Validation loss = 0.0004733514797408134
Validation loss = 0.0003823479055427015
Validation loss = 0.0004557896754704416
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00040705586434341967
Validation loss = 0.00046125275548547506
Validation loss = 0.0003125749935861677
Validation loss = 0.00033108636853285134
Validation loss = 0.0004009685362689197
Validation loss = 0.0003399800043553114
Validation loss = 0.00032304596970789135
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000405710656195879
Validation loss = 0.00038405274972319603
Validation loss = 0.00043427880154922605
Validation loss = 0.0004729607899207622
Validation loss = 0.00044974213233217597
Validation loss = 0.0005513007054105401
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.72    |
| Iteration     | 73       |
| MaximumReturn | -0.0015  |
| MinimumReturn | -38.2    |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00039138755528256297
Validation loss = 0.0009450821089558303
Validation loss = 0.00037365814205259085
Validation loss = 0.00038770673563703895
Validation loss = 0.0004131115274503827
Validation loss = 0.0005753282457590103
Validation loss = 0.0006534211570397019
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0004171213076915592
Validation loss = 0.0005265265936031938
Validation loss = 0.0004269641067367047
Validation loss = 0.00033518881537020206
Validation loss = 0.00039841519901528955
Validation loss = 0.0004535079642664641
Validation loss = 0.00040979875484481454
Validation loss = 0.00036969652865082026
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0003686805139295757
Validation loss = 0.0003800065314862877
Validation loss = 0.0004710327775683254
Validation loss = 0.0005439282394945621
Validation loss = 0.00037984628579579294
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0004184400022495538
Validation loss = 0.00036969417124055326
Validation loss = 0.0003605513775255531
Validation loss = 0.00036712680594064295
Validation loss = 0.0005203224718570709
Validation loss = 0.00038586126174777746
Validation loss = 0.0005392166785895824
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006250376463867724
Validation loss = 0.0005452082259580493
Validation loss = 0.00040390764479525387
Validation loss = 0.0005918570677749813
Validation loss = 0.00045234287972562015
Validation loss = 0.0004296944825910032
Validation loss = 0.00043522700434550643
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -59.9    |
| Iteration     | 74       |
| MaximumReturn | -1.44    |
| MinimumReturn | -140     |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00035965570714324713
Validation loss = 0.00042974413372576237
Validation loss = 0.0005694977007806301
Validation loss = 0.0003523097839206457
Validation loss = 0.0004815567808691412
Validation loss = 0.0005313524743542075
Validation loss = 0.0004150283057242632
Validation loss = 0.0004110200679861009
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0003584432415664196
Validation loss = 0.000448505423264578
Validation loss = 0.00042149241198785603
Validation loss = 0.0004784171178471297
Validation loss = 0.0005168431671336293
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005368887796066701
Validation loss = 0.0004164732526987791
Validation loss = 0.00037420462467707694
Validation loss = 0.00031650648452341557
Validation loss = 0.0003779219405259937
Validation loss = 0.00045562227023765445
Validation loss = 0.0003593097208067775
Validation loss = 0.00044481203076429665
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00044529110891744494
Validation loss = 0.00032373194699175656
Validation loss = 0.0005439394735731184
Validation loss = 0.00033402914414182305
Validation loss = 0.0004451787972357124
Validation loss = 0.0003802184364758432
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0003847814805340022
Validation loss = 0.0003922118339687586
Validation loss = 0.00041785254143178463
Validation loss = 0.0005153802921995521
Validation loss = 0.0004806812503375113
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.121    |
| Iteration     | 75        |
| MaximumReturn | -0.000623 |
| MinimumReturn | -2.31     |
| TotalSamples  | 128282    |
-----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004595379577949643
Validation loss = 0.00040736928349360824
Validation loss = 0.0005306939128786325
Validation loss = 0.0004893336445093155
Validation loss = 0.0005402729148045182
Validation loss = 0.00035763659980148077
Validation loss = 0.0004322351887822151
Validation loss = 0.0004792452964466065
Validation loss = 0.0004486686666496098
Validation loss = 0.00036077824188396335
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0003766886657103896
Validation loss = 0.0003945420030504465
Validation loss = 0.0004290800716262311
Validation loss = 0.000459677423350513
Validation loss = 0.0006496146088466048
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00041920196963474154
Validation loss = 0.0004129161243326962
Validation loss = 0.0004449870903044939
Validation loss = 0.0004460221971385181
Validation loss = 0.00041126227006316185
Validation loss = 0.00045556860277429223
Validation loss = 0.00037785014137625694
Validation loss = 0.00045592006063088775
Validation loss = 0.0003589825064409524
Validation loss = 0.0004446070524863899
Validation loss = 0.00033472751965746284
Validation loss = 0.0004961280501447618
Validation loss = 0.0003496440185699612
Validation loss = 0.00042809927253983915
Validation loss = 0.0004089489229954779
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006824815645813942
Validation loss = 0.00036847498267889023
Validation loss = 0.0003498030127957463
Validation loss = 0.00046164492960087955
Validation loss = 0.00034435890847817063
Validation loss = 0.00039607242797501385
Validation loss = 0.00046094975550659
Validation loss = 0.00035542159457691014
Validation loss = 0.0003617967595346272
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00043699319940060377
Validation loss = 0.00037267053266987205
Validation loss = 0.0004835691652260721
Validation loss = 0.0005085730226710439
Validation loss = 0.0004948588320985436
Validation loss = 0.0004017940373159945
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.029    |
| Iteration     | 76        |
| MaximumReturn | -0.000637 |
| MinimumReturn | -0.694    |
| TotalSamples  | 129948    |
-----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004227004246786237
Validation loss = 0.00046405219472944736
Validation loss = 0.0004918281338177621
Validation loss = 0.0005080011324025691
Validation loss = 0.0003996501909568906
Validation loss = 0.00032366669620387256
Validation loss = 0.0004018629842903465
Validation loss = 0.0003775409422814846
Validation loss = 0.0004758449795190245
Validation loss = 0.000368064473150298
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005254765856079757
Validation loss = 0.00041129300370812416
Validation loss = 0.00031792864319868386
Validation loss = 0.00039704490336589515
Validation loss = 0.0004651460039895028
Validation loss = 0.00040752574568614364
Validation loss = 0.0003717159270308912
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00045538440463133156
Validation loss = 0.0004121165839023888
Validation loss = 0.00043319351971149445
Validation loss = 0.00041209926712326705
Validation loss = 0.0005387193523347378
Validation loss = 0.00036184428608976305
Validation loss = 0.0003715389466378838
Validation loss = 0.00047488551354035735
Validation loss = 0.0004137264331802726
Validation loss = 0.0003760820545721799
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00042247833334840834
Validation loss = 0.00034913973649963737
Validation loss = 0.0005074187647551298
Validation loss = 0.00039823458064347506
Validation loss = 0.0009154471918009222
Validation loss = 0.00036313539021648467
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004054799792356789
Validation loss = 0.0003755929647013545
Validation loss = 0.00038386613596230745
Validation loss = 0.0003984607756137848
Validation loss = 0.0006494205445051193
Validation loss = 0.000387248961487785
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.63    |
| Iteration     | 77       |
| MaximumReturn | -0.00056 |
| MinimumReturn | -65.1    |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004009150143247098
Validation loss = 0.0003372363280504942
Validation loss = 0.00035984444548375905
Validation loss = 0.0004254317900631577
Validation loss = 0.00037784239975735545
Validation loss = 0.00040186525438912213
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0004744464822579175
Validation loss = 0.0005850587622262537
Validation loss = 0.0004518458736129105
Validation loss = 0.0003456440463196486
Validation loss = 0.0007295478717423975
Validation loss = 0.00033047350007109344
Validation loss = 0.0004431688576005399
Validation loss = 0.00037086900556460023
Validation loss = 0.0005137989646755159
Validation loss = 0.00043253274634480476
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0004415038274601102
Validation loss = 0.00034663337282836437
Validation loss = 0.00042736047180369496
Validation loss = 0.0003734476922545582
Validation loss = 0.00045092450454831123
Validation loss = 0.0003451770462561399
Validation loss = 0.00047307717613875866
Validation loss = 0.0006044582114554942
Validation loss = 0.00047841802006587386
Validation loss = 0.0003571920969989151
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0003943765477743
Validation loss = 0.0003985016082879156
Validation loss = 0.00048446058644913137
Validation loss = 0.00042506580939516425
Validation loss = 0.00036019989056512713
Validation loss = 0.00033994120894931257
Validation loss = 0.00031452986877411604
Validation loss = 0.00037008259096182883
Validation loss = 0.00036837212974205613
Validation loss = 0.0003417223342694342
Validation loss = 0.0003485280030872673
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00035030252183787525
Validation loss = 0.0004412170092109591
Validation loss = 0.00039711067802272737
Validation loss = 0.0004216085944790393
Validation loss = 0.0003948044322896749
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.203    |
| Iteration     | 78        |
| MaximumReturn | -0.000834 |
| MinimumReturn | -1.07     |
| TotalSamples  | 133280    |
-----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0003559948818292469
Validation loss = 0.00039698247564956546
Validation loss = 0.00037622268428094685
Validation loss = 0.0005181649466976523
Validation loss = 0.00043270387686789036
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0004253640945535153
Validation loss = 0.00037496539880521595
Validation loss = 0.0008189811487682164
Validation loss = 0.0003607641556300223
Validation loss = 0.0003597512550186366
Validation loss = 0.000650887144729495
Validation loss = 0.00031697083613835275
Validation loss = 0.0003367602766957134
Validation loss = 0.00037446568603627384
Validation loss = 0.0003526049549691379
Validation loss = 0.0004316976119298488
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0003929290978703648
Validation loss = 0.0003752949123736471
Validation loss = 0.000371127447579056
Validation loss = 0.0003511567774694413
Validation loss = 0.0003198978374712169
Validation loss = 0.00036471927887760103
Validation loss = 0.0003593194705899805
Validation loss = 0.0003133020072709769
Validation loss = 0.0004337213176768273
Validation loss = 0.0003694787446875125
Validation loss = 0.0003350981278344989
Validation loss = 0.00033068881020881236
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005765047972090542
Validation loss = 0.0004637025995180011
Validation loss = 0.00042344327084720135
Validation loss = 0.00038295326521620154
Validation loss = 0.0003743927809409797
Validation loss = 0.0005650683306157589
Validation loss = 0.00039262251812033355
Validation loss = 0.00039388934965245426
Validation loss = 0.0003728880255948752
Validation loss = 0.0003194475721102208
Validation loss = 0.0006067397189326584
Validation loss = 0.00039077422115951777
Validation loss = 0.00043912880937568843
Validation loss = 0.000410198699682951
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00036111054942011833
Validation loss = 0.00040457374416291714
Validation loss = 0.00046096849837340415
Validation loss = 0.000399805954657495
Validation loss = 0.00041502551175653934
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -40      |
| Iteration     | 79       |
| MaximumReturn | -0.00156 |
| MinimumReturn | -166     |
| TotalSamples  | 134946   |
----------------------------
