Logging to experiments/gym_cheetahO01/gym_cheetahO01/Fri-28-Oct-2022-08-59-10-PM-CDT_gym_cheetahO01_trpo_iteration_20_seed2314
Print configuration .....
{'env_name': 'gym_cheetahO01',
 'random_seeds': [4321, 2314, 2341, 3421],
 'save_variables': False,
 'model_save_dir': '/tmp/gym_cheetahO01_models/',
 'restore_variables': False,
 'start_onpol_iter': 0,
 'onpol_iters': 33,
 'num_path_random': 6,
 'num_path_onpol': 6,
 'env_horizon': 1000,
 'max_train_data': 200000,
 'max_val_data': 100000,
 'discard_ratio': 0.0,
 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20},
              'model': 'nn',
              'ensemble': True,
              'ensemble_model_count': 5,
              'enable_particle_ensemble': True,
              'particles': 5,
              'intrinsic_reward_only': False,
              'external_reward_evaluation_interval': 5,
              'obs_var': 1.0,
              'intrinsic_reward_coeff': 1.0,
              'ita': 1.0,
              'mode': 'random',
              'val': True,
              'n_layers': 4,
              'hidden_size': 1000,
              'activation': 'relu',
              'batch_size': 1000,
              'learning_rate': 0.001,
              'epochs': 200,
              'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9,
                              'kl_clip': 0.0001, 'cov_ema_decay': 0.99}},
 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh',
            'reinitialize_every_itr': False},
 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20,
          'batch_size': 50000, 'gae': 0.95},
 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20,
                     'batch_size': 50000, 'gae': 0.95},
 'algo': 'trpo'}
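
The configuration above is printed as a plain Python dict. As a rough illustration only, a config of this shape might be loaded and echoed along the following lines; the file path, JSON format, and load_config name are assumptions, not the repository's actual loader.

import json
import pprint

def load_config(path="configs/gym_cheetahO01_trpo.json"):
    """Hypothetical loader: read the experiment config into a plain dict."""
    with open(path) as f:
        return json.load(f)

if __name__ == "__main__":
    cfg = load_config()
    print("Print configuration .....")
    pprint.pprint(cfg, width=100)
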
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
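
The "Path i | total_timesteps N." lines come from collecting num_path_random = 6 rollouts of up to env_horizon = 1000 steps each under a random policy. A minimal sketch of such a collector is below; function and field names are assumptions, and the same loop covers the later on-policy rollouts by swapping in the trained policy.

import numpy as np

def collect_paths(env, policy_fn, num_paths=6, horizon=1000):
    """Hypothetical rollout collector mirroring the log lines above."""
    paths, total_timesteps = [], 0
    for i in range(num_paths):
        print(f"Path {i} | total_timesteps {total_timesteps}.")
        obs, acts, rews = [], [], []
        ob = env.reset()
        for _ in range(horizon):
            ac = policy_fn(ob)
            next_ob, rew, done, _ = env.step(ac)  # old (pre-0.26) gym step signature
            obs.append(ob)
            acts.append(ac)
            rews.append(rew)
            ob = next_ob
            total_timesteps += 1
            if done:
                break
        paths.append(dict(observations=np.array(obs),
                          actions=np.array(acts),
                          rewards=np.array(rews)))
    return paths

# Random rollouts: paths = collect_paths(env, lambda ob: env.action_space.sample())
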
Creating normalization for training data.
Done creating normalization for training data.
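
The normalization step most likely computes per-dimension statistics used to whiten the dynamics model's inputs and targets. A sketch under that assumption; the key names and the one-step-delta target are assumptions, not the repository's code.

import numpy as np

def compute_normalization(paths, eps=1e-6):
    """Hypothetical: mean/std of observations, actions, and one-step state deltas."""
    obs = np.concatenate([p["observations"] for p in paths])
    acts = np.concatenate([p["actions"] for p in paths])
    deltas = np.concatenate([p["observations"][1:] - p["observations"][:-1]
                             for p in paths])
    return {name: (x.mean(axis=0), x.std(axis=0) + eps)
            for name, x in [("obs", obs), ("acts", acts), ("deltas", deltas)]}

def normalize(x, mean, std):
    return (x - mean) / std
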
Particle ensemble enabled? True
An ensemble of 5 dynamics models of type <class 'model.dynamics.NNDynamicsModel'> initialized.
Train dynamics model with intrinsic reward only? False
Pre-training enabled; using intrinsic reward only during pre-training.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
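
Per the config, the dynamics model is an ensemble of 5 neural networks, each an MLP with 4 hidden layers of 1000 ReLU units, with particle-based ensemble propagation enabled. Below is a PyTorch sketch of such an ensemble; the class name, the state-delta prediction target, and the framework choice are assumptions and do not reproduce the repository's model.dynamics.NNDynamicsModel interface.

import torch
import torch.nn as nn

class DynamicsMLP(nn.Module):
    """Hypothetical single ensemble member: (obs, act) -> predicted state delta."""
    def __init__(self, obs_dim, act_dim, hidden_size=1000, n_layers=4):
        super().__init__()
        layers, in_dim = [], obs_dim + act_dim
        for _ in range(n_layers):
            layers += [nn.Linear(in_dim, hidden_size), nn.ReLU()]
            in_dim = hidden_size
        layers.append(nn.Linear(in_dim, obs_dim))  # predict normalized next-state delta
        self.net = nn.Sequential(*layers)

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def build_ensemble(obs_dim, act_dim, ensemble_model_count=5):
    """Five independently initialized members, matching ensemble_model_count."""
    return [DynamicsMLP(obs_dim, act_dim) for _ in range(ensemble_model_count)]
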
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.536567747592926
Validation loss = 0.22676938772201538
Validation loss = 0.18590137362480164
Validation loss = 0.1691143810749054
Validation loss = 0.16434818506240845
Validation loss = 0.16197934746742249
Validation loss = 0.16496646404266357
Validation loss = 0.16577816009521484
Validation loss = 0.25141727924346924
Validation loss = 0.17635858058929443
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5082875490188599
Validation loss = 0.22806110978126526
Validation loss = 0.18097776174545288
Validation loss = 0.17131942510604858
Validation loss = 0.1682935357093811
Validation loss = 0.16472914814949036
Validation loss = 0.165939599275589
Validation loss = 0.17872074246406555
Validation loss = 0.1760813295841217
Validation loss = 0.17037394642829895
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.46177586913108826
Validation loss = 0.23248276114463806
Validation loss = 0.18095849454402924
Validation loss = 0.17051398754119873
Validation loss = 0.16634458303451538
Validation loss = 0.17323866486549377
Validation loss = 0.1669723391532898
Validation loss = 0.17156144976615906
Validation loss = 0.18667449057102203
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4839484691619873
Validation loss = 0.22935807704925537
Validation loss = 0.1790652573108673
Validation loss = 0.16933435201644897
Validation loss = 0.16838672757148743
Validation loss = 0.17055124044418335
Validation loss = 0.18606920540332794
Validation loss = 0.178018718957901
Validation loss = 0.1826019287109375
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6621328592300415
Validation loss = 0.23357194662094116
Validation loss = 0.183782160282135
Validation loss = 0.17248518764972687
Validation loss = 0.17071017622947693
Validation loss = 0.1683664321899414
Validation loss = 0.1659233272075653
Validation loss = 0.17647619545459747
Validation loss = 0.1774473488330841
Validation loss = 0.1774454414844513
Validation loss = 0.17291390895843506
Done fitting dynamics.
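
Each "Fitting model k" block above trains one ensemble member and prints a held-out loss once per epoch; the varying number of lines per model suggests some form of early stopping on the validation loss, though the exact criterion is not visible in the log. A hedged PyTorch sketch of such a loop follows: epochs=200 and lr=0.001 match the config, while the patience rule, Adam optimizer, and MSE objective are assumptions.

import copy
import torch
import torch.nn.functional as F

def fit_model(model, train_loader, val_inputs, val_targets,
              epochs=200, lr=1e-3, patience=5):
    """Hypothetical per-member training loop with validation-based early stopping."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_loss, best_state, stale = float("inf"), None, 0
    for epoch in range(epochs):
        for obs, act, delta in train_loader:
            loss = F.mse_loss(model(obs, act), delta)
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            val_obs, val_act = val_inputs
            val_loss = F.mse_loss(model(val_obs, val_act), val_targets).item()
        print(f"Validation loss = {val_loss}")
        if val_loss < best_loss:
            best_loss, best_state, stale = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return best_loss
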
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
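
The "Obtaining samples for iteration i..." lines correspond to the 20 TRPO iterations configured above, each presumably drawing a batch of 50000 imagined timesteps from the learned ensemble dynamics before one trust-region update. A schematic sketch only; sample_from_model and trpo_update are hypothetical stand-ins, not the repository's APIs.

def train_policy(policy, sample_from_model, trpo_update,
                 trpo_iterations=20, batch_size=50000):
    """Hypothetical outer loop; the two callables are illustrative stand-ins."""
    for itr in range(trpo_iterations):
        print(f"Obtaining samples for iteration {itr}...")
        # Imagined rollouts propagated through the learned ensemble dynamics.
        samples = sample_from_model(policy, batch_size)
        # One trust-region policy update with the configured hyperparameters.
        trpo_update(policy, samples, step_size=0.01, gamma=0.99, gae_lambda=0.95)
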
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -329     |
| Iteration     | 0        |
| MaximumReturn | -282     |
| MinimumReturn | -369     |
| TotalSamples  | 8000     |
----------------------------
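
The table above summarizes the 6 on-policy evaluation rollouts of this outer iteration. A minimal sketch of how these statistics could be computed from the collected paths; the logger format is an assumption, and only the reported quantities match the table.

import numpy as np

def log_iteration_stats(paths, iteration, total_samples):
    """Hypothetical: reduce per-path reward sums to the tabulated statistics."""
    returns = [p["rewards"].sum() for p in paths]
    stats = {"AverageReturn": np.mean(returns),
             "Iteration": iteration,
             "MaximumReturn": np.max(returns),
             "MinimumReturn": np.min(returns),
             "TotalSamples": total_samples}
    print("----------------------------")
    for key, value in stats.items():
        print(f"| {key:<13} | {value:<8.3g} |")
    print("----------------------------")
    return stats
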
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.20999741554260254
Validation loss = 0.1657540202140808
Validation loss = 0.16856683790683746
Validation loss = 0.16622066497802734
Validation loss = 0.1717495322227478
Validation loss = 0.1787588894367218
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2068483531475067
Validation loss = 0.16959285736083984
Validation loss = 0.167731374502182
Validation loss = 0.1681235134601593
Validation loss = 0.17611375451087952
Validation loss = 0.16670118272304535
Validation loss = 0.16763103008270264
Validation loss = 0.16901516914367676
Validation loss = 0.18430134654045105
Validation loss = 0.18792277574539185
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.21192310750484467
Validation loss = 0.16526857018470764
Validation loss = 0.16286404430866241
Validation loss = 0.16350722312927246
Validation loss = 0.16446635127067566
Validation loss = 0.17472343146800995
Validation loss = 0.16946612298488617
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2161274403333664
Validation loss = 0.1672719419002533
Validation loss = 0.1654796153306961
Validation loss = 0.16954493522644043
Validation loss = 0.17140845954418182
Validation loss = 0.16503271460533142
Validation loss = 0.16575448215007782
Validation loss = 0.16613757610321045
Validation loss = 0.19641399383544922
Validation loss = 0.1696213036775589
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.20922890305519104
Validation loss = 0.16793367266654968
Validation loss = 0.17884355783462524
Validation loss = 0.17371350526809692
Validation loss = 0.17404934763908386
Validation loss = 0.17025098204612732
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -39.2    |
| Iteration     | 1        |
| MaximumReturn | -14.4    |
| MinimumReturn | -71.2    |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16919201612472534
Validation loss = 0.16366885602474213
Validation loss = 0.18787173926830292
Validation loss = 0.16184769570827484
Validation loss = 0.16455216705799103
Validation loss = 0.16932493448257446
Validation loss = 0.16363392770290375
Validation loss = 0.18269126117229462
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.17266516387462616
Validation loss = 0.16745829582214355
Validation loss = 0.17524302005767822
Validation loss = 0.16619621217250824
Validation loss = 0.1686335653066635
Validation loss = 0.17108185589313507
Validation loss = 0.17395512759685516
Validation loss = 0.18047355115413666
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16953541338443756
Validation loss = 0.1622806340456009
Validation loss = 0.16155187785625458
Validation loss = 0.16498881578445435
Validation loss = 0.1744566559791565
Validation loss = 0.16734953224658966
Validation loss = 0.16560731828212738
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.17143239080905914
Validation loss = 0.16462992131710052
Validation loss = 0.16636024415493011
Validation loss = 0.16656458377838135
Validation loss = 0.16835443675518036
Validation loss = 0.16922350227832794
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1715431660413742
Validation loss = 0.16065335273742676
Validation loss = 0.16051152348518372
Validation loss = 0.1626027375459671
Validation loss = 0.16928990185260773
Validation loss = 0.16751837730407715
Validation loss = 0.16593748331069946
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 276      |
| Iteration     | 2        |
| MaximumReturn | 499      |
| MinimumReturn | -244     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1778727024793625
Validation loss = 0.16891004145145416
Validation loss = 0.1683954894542694
Validation loss = 0.21399490535259247
Validation loss = 0.1697159707546234
Validation loss = 0.17931252717971802
Validation loss = 0.16727110743522644
Validation loss = 0.17995619773864746
Validation loss = 0.17732584476470947
Validation loss = 0.17334873974323273
Validation loss = 0.17384183406829834
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1786818653345108
Validation loss = 0.17447446286678314
Validation loss = 0.17053984105587006
Validation loss = 0.1675288826227188
Validation loss = 0.17297562956809998
Validation loss = 0.17342683672904968
Validation loss = 0.17262598872184753
Validation loss = 0.1840958595275879
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.17738544940948486
Validation loss = 0.174628347158432
Validation loss = 0.16774451732635498
Validation loss = 0.17610660195350647
Validation loss = 0.17078319191932678
Validation loss = 0.17297139763832092
Validation loss = 0.1817716360092163
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.17751792073249817
Validation loss = 0.1701817512512207
Validation loss = 0.1871064305305481
Validation loss = 0.17003372311592102
Validation loss = 0.17402884364128113
Validation loss = 0.19661495089530945
Validation loss = 0.1854354441165924
Validation loss = 0.17104411125183105
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.17516712844371796
Validation loss = 0.17935782670974731
Validation loss = 0.17001691460609436
Validation loss = 0.16788652539253235
Validation loss = 0.17160199582576752
Validation loss = 0.18036654591560364
Validation loss = 0.17006734013557434
Validation loss = 0.17378900945186615
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 117      |
| Iteration     | 3        |
| MaximumReturn | 871      |
| MinimumReturn | -493     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.17807289958000183
Validation loss = 0.17497919499874115
Validation loss = 0.17826895415782928
Validation loss = 0.18779827654361725
Validation loss = 0.1737014353275299
Validation loss = 0.18096399307250977
Validation loss = 0.1802130937576294
Validation loss = 0.17476089298725128
Validation loss = 0.17148450016975403
Validation loss = 0.17389142513275146
Validation loss = 0.17808082699775696
Validation loss = 0.1768488585948944
Validation loss = 0.17920231819152832
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.17406146228313446
Validation loss = 0.1724378764629364
Validation loss = 0.17384590208530426
Validation loss = 0.1717231273651123
Validation loss = 0.17585770785808563
Validation loss = 0.17385154962539673
Validation loss = 0.17797373235225677
Validation loss = 0.18193092942237854
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.17647773027420044
Validation loss = 0.16896572709083557
Validation loss = 0.21315324306488037
Validation loss = 0.17023946344852448
Validation loss = 0.17763975262641907
Validation loss = 0.17571106553077698
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.17581579089164734
Validation loss = 0.1756560504436493
Validation loss = 0.17221763730049133
Validation loss = 0.17842990159988403
Validation loss = 0.17091786861419678
Validation loss = 0.17421607673168182
Validation loss = 0.17520146071910858
Validation loss = 0.17506548762321472
Validation loss = 0.17123785614967346
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.17948324978351593
Validation loss = 0.1702020913362503
Validation loss = 0.16996291279792786
Validation loss = 0.176425963640213
Validation loss = 0.17110344767570496
Validation loss = 0.17486877739429474
Validation loss = 0.17628340423107147
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 308      |
| Iteration     | 4        |
| MaximumReturn | 1.37e+03 |
| MinimumReturn | -498     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.18699628114700317
Validation loss = 0.18408946692943573
Validation loss = 0.1868257075548172
Validation loss = 0.1857471913099289
Validation loss = 0.1878928393125534
Validation loss = 0.19117559492588043
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18612228333950043
Validation loss = 0.1888883262872696
Validation loss = 0.1892932504415512
Validation loss = 0.18592502176761627
Validation loss = 0.18523530662059784
Validation loss = 0.18789173662662506
Validation loss = 0.18685279786586761
Validation loss = 0.18714500963687897
Validation loss = 0.18653859198093414
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.18310242891311646
Validation loss = 0.180905282497406
Validation loss = 0.18295453488826752
Validation loss = 0.18372423946857452
Validation loss = 0.1867462396621704
Validation loss = 0.18740969896316528
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.18359428644180298
Validation loss = 0.18453510105609894
Validation loss = 0.18521344661712646
Validation loss = 0.18339549005031586
Validation loss = 0.18293552100658417
Validation loss = 0.1859017014503479
Validation loss = 0.1867130845785141
Validation loss = 0.18612287938594818
Validation loss = 0.1910007745027542
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.18709798157215118
Validation loss = 0.18288783729076385
Validation loss = 0.18398340046405792
Validation loss = 0.18340373039245605
Validation loss = 0.18208324909210205
Validation loss = 0.1929711103439331
Validation loss = 0.1949140578508377
Validation loss = 0.184014692902565
Validation loss = 0.19084040820598602
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -412     |
| Iteration     | 5        |
| MaximumReturn | 490      |
| MinimumReturn | -688     |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.18918582797050476
Validation loss = 0.1878965198993683
Validation loss = 0.19085702300071716
Validation loss = 0.1897340714931488
Validation loss = 0.18938341736793518
Validation loss = 0.1961091011762619
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18912295997142792
Validation loss = 0.18556204438209534
Validation loss = 0.18859796226024628
Validation loss = 0.18955229222774506
Validation loss = 0.1903698444366455
Validation loss = 0.1965617686510086
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.18906117975711823
Validation loss = 0.1841372400522232
Validation loss = 0.1902015060186386
Validation loss = 0.18883256614208221
Validation loss = 0.19621071219444275
Validation loss = 0.18930725753307343
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.18958111107349396
Validation loss = 0.18796463310718536
Validation loss = 0.18893465399742126
Validation loss = 0.1864388883113861
Validation loss = 0.18718180060386658
Validation loss = 0.18875554203987122
Validation loss = 0.18945014476776123
Validation loss = 0.18970395624637604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.18706654012203217
Validation loss = 0.18641631305217743
Validation loss = 0.1880510300397873
Validation loss = 0.18435904383659363
Validation loss = 0.1887635886669159
Validation loss = 0.19338183104991913
Validation loss = 0.2052547037601471
Validation loss = 0.18778382241725922
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 319      |
| Iteration     | 6        |
| MaximumReturn | 1.42e+03 |
| MinimumReturn | -464     |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.18358114361763
Validation loss = 0.182425856590271
Validation loss = 0.1819334626197815
Validation loss = 0.1861337423324585
Validation loss = 0.18350714445114136
Validation loss = 0.18353156745433807
Validation loss = 0.18444401025772095
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18421536684036255
Validation loss = 0.18142977356910706
Validation loss = 0.1925225406885147
Validation loss = 0.18118219077587128
Validation loss = 0.1854512244462967
Validation loss = 0.18474265933036804
Validation loss = 0.186061829328537
Validation loss = 0.18764932453632355
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1799265444278717
Validation loss = 0.17797034978866577
Validation loss = 0.1816704273223877
Validation loss = 0.18005922436714172
Validation loss = 0.1799435168504715
Validation loss = 0.18202486634254456
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1828971803188324
Validation loss = 0.17992115020751953
Validation loss = 0.1828032284975052
Validation loss = 0.1812887191772461
Validation loss = 0.18380051851272583
Validation loss = 0.18352264165878296
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.19018828868865967
Validation loss = 0.17839795351028442
Validation loss = 0.18276752531528473
Validation loss = 0.18159809708595276
Validation loss = 0.1870109885931015
Validation loss = 0.1823837161064148
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -28.7    |
| Iteration     | 7        |
| MaximumReturn | 1.7e+03  |
| MinimumReturn | -510     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1831640899181366
Validation loss = 0.18786843121051788
Validation loss = 0.18338744342327118
Validation loss = 0.18608133494853973
Validation loss = 0.18125684559345245
Validation loss = 0.1829177737236023
Validation loss = 0.18478664755821228
Validation loss = 0.1835690140724182
Validation loss = 0.1850673258304596
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18687951564788818
Validation loss = 0.18314197659492493
Validation loss = 0.1841403692960739
Validation loss = 0.18647931516170502
Validation loss = 0.18703575432300568
Validation loss = 0.18522188067436218
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.18169036507606506
Validation loss = 0.18039417266845703
Validation loss = 0.18118725717067719
Validation loss = 0.1800590604543686
Validation loss = 0.18156535923480988
Validation loss = 0.1833980530500412
Validation loss = 0.18793848156929016
Validation loss = 0.18682903051376343
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.18315233290195465
Validation loss = 0.178706556558609
Validation loss = 0.1869533210992813
Validation loss = 0.18286985158920288
Validation loss = 0.18354104459285736
Validation loss = 0.18387505412101746
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.17862460017204285
Validation loss = 0.1803215593099594
Validation loss = 0.18349795043468475
Validation loss = 0.18609054386615753
Validation loss = 0.1873904913663864
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -152     |
| Iteration     | 8        |
| MaximumReturn | 1.11e+03 |
| MinimumReturn | -618     |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2590472996234894
Validation loss = 0.2681466341018677
Validation loss = 0.2940770983695984
Validation loss = 0.2742905020713806
Validation loss = 0.29425063729286194
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.228214293718338
Validation loss = 0.2410796582698822
Validation loss = 0.2527305483818054
Validation loss = 0.25677359104156494
Validation loss = 0.25664472579956055
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2328978329896927
Validation loss = 0.24363812804222107
Validation loss = 0.25319546461105347
Validation loss = 0.25113439559936523
Validation loss = 0.2808206081390381
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.22927403450012207
Validation loss = 0.22961027920246124
Validation loss = 0.2497376948595047
Validation loss = 0.2518477737903595
Validation loss = 0.2544096112251282
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.23824052512645721
Validation loss = 0.24701285362243652
Validation loss = 0.24998769164085388
Validation loss = 0.26305755972862244
Validation loss = 0.2642880082130432
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 215      |
| Iteration     | 9        |
| MaximumReturn | 939      |
| MinimumReturn | -429     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.25513413548469543
Validation loss = 0.27400460839271545
Validation loss = 0.26343896985054016
Validation loss = 0.29338330030441284
Validation loss = 0.29518863558769226
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2343432903289795
Validation loss = 0.25055992603302
Validation loss = 0.2436199188232422
Validation loss = 0.25315919518470764
Validation loss = 0.2478068321943283
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.23443178832530975
Validation loss = 0.2569873631000519
Validation loss = 0.2603782117366791
Validation loss = 0.2680564224720001
Validation loss = 0.27758052945137024
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2300831824541092
Validation loss = 0.2463451474905014
Validation loss = 0.24286140501499176
Validation loss = 0.23519276082515717
Validation loss = 0.24030809104442596
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.24472030997276306
Validation loss = 0.2512320876121521
Validation loss = 0.25082170963287354
Validation loss = 0.26751068234443665
Validation loss = 0.2538924217224121
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -118     |
| Iteration     | 10       |
| MaximumReturn | 671      |
| MinimumReturn | -701     |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.18203513324260712
Validation loss = 0.1770642250776291
Validation loss = 0.1795579195022583
Validation loss = 0.1795155555009842
Validation loss = 0.18076522648334503
Validation loss = 0.1796032190322876
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18093420565128326
Validation loss = 0.18140794336795807
Validation loss = 0.1819075345993042
Validation loss = 0.18222101032733917
Validation loss = 0.18733127415180206
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.18119679391384125
Validation loss = 0.17613375186920166
Validation loss = 0.1786794662475586
Validation loss = 0.17801254987716675
Validation loss = 0.1782008409500122
Validation loss = 0.18041300773620605
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.18320898711681366
Validation loss = 0.17734704911708832
Validation loss = 0.1766466349363327
Validation loss = 0.1807645559310913
Validation loss = 0.17867155373096466
Validation loss = 0.17938704788684845
Validation loss = 0.17753319442272186
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.17828696966171265
Validation loss = 0.17923851311206818
Validation loss = 0.17858050763607025
Validation loss = 0.17847192287445068
Validation loss = 0.18143068253993988
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 90.6     |
| Iteration     | 11       |
| MaximumReturn | 1.44e+03 |
| MinimumReturn | -949     |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.17611272633075714
Validation loss = 0.17273971438407898
Validation loss = 0.17565597593784332
Validation loss = 0.1749034821987152
Validation loss = 0.1752932220697403
Validation loss = 0.17688362300395966
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.17697499692440033
Validation loss = 0.1748281568288803
Validation loss = 0.17684783041477203
Validation loss = 0.17736124992370605
Validation loss = 0.17692865431308746
Validation loss = 0.17613142728805542
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.17357993125915527
Validation loss = 0.1705804467201233
Validation loss = 0.1769268810749054
Validation loss = 0.1751655489206314
Validation loss = 0.1747879683971405
Validation loss = 0.17242534458637238
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.17526744306087494
Validation loss = 0.17472821474075317
Validation loss = 0.17491459846496582
Validation loss = 0.17524497210979462
Validation loss = 0.1810804307460785
Validation loss = 0.17549970746040344
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1736902892589569
Validation loss = 0.1714506894350052
Validation loss = 0.1729208379983902
Validation loss = 0.17436081171035767
Validation loss = 0.1741241067647934
Validation loss = 0.17426630854606628
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 621      |
| Iteration     | 12       |
| MaximumReturn | 1.73e+03 |
| MinimumReturn | -608     |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.17070522904396057
Validation loss = 0.16928492486476898
Validation loss = 0.16851148009300232
Validation loss = 0.1682349145412445
Validation loss = 0.1688230037689209
Validation loss = 0.16845354437828064
Validation loss = 0.17094306647777557
Validation loss = 0.1690274029970169
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.17149567604064941
Validation loss = 0.16762462258338928
Validation loss = 0.17137372493743896
Validation loss = 0.16907338798046112
Validation loss = 0.1677282154560089
Validation loss = 0.1703631579875946
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16838492453098297
Validation loss = 0.16591452062129974
Validation loss = 0.167772576212883
Validation loss = 0.16699203848838806
Validation loss = 0.16861580312252045
Validation loss = 0.16599532961845398
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.16890908777713776
Validation loss = 0.16738085448741913
Validation loss = 0.16729429364204407
Validation loss = 0.1686697155237198
Validation loss = 0.16715455055236816
Validation loss = 0.16854174435138702
Validation loss = 0.16761814057826996
Validation loss = 0.1668591946363449
Validation loss = 0.16778111457824707
Validation loss = 0.16968142986297607
Validation loss = 0.16786912083625793
Validation loss = 0.1675557941198349
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.16711577773094177
Validation loss = 0.16498948633670807
Validation loss = 0.16734124720096588
Validation loss = 0.16829264163970947
Validation loss = 0.16611532866954803
Validation loss = 0.16958065330982208
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.34e+03 |
| Iteration     | 13       |
| MaximumReturn | 1.69e+03 |
| MinimumReturn | 462      |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16627545654773712
Validation loss = 0.16256862878799438
Validation loss = 0.16634012758731842
Validation loss = 0.16616766154766083
Validation loss = 0.16470880806446075
Validation loss = 0.16536970436573029
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.16823799908161163
Validation loss = 0.16479024291038513
Validation loss = 0.16619277000427246
Validation loss = 0.16622163355350494
Validation loss = 0.16631992161273956
Validation loss = 0.1675850749015808
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16828808188438416
Validation loss = 0.16336509585380554
Validation loss = 0.16272689402103424
Validation loss = 0.16289781033992767
Validation loss = 0.1627681702375412
Validation loss = 0.16334135830402374
Validation loss = 0.16308368742465973
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1670975685119629
Validation loss = 0.162075474858284
Validation loss = 0.1631884127855301
Validation loss = 0.165301114320755
Validation loss = 0.16324765980243683
Validation loss = 0.16370892524719238
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.16475839912891388
Validation loss = 0.16504067182540894
Validation loss = 0.16453517973423004
Validation loss = 0.16261738538742065
Validation loss = 0.16609664261341095
Validation loss = 0.1661662459373474
Validation loss = 0.1643720269203186
Validation loss = 0.16552883386611938
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 37.8     |
| Iteration     | 14       |
| MaximumReturn | 1.71e+03 |
| MinimumReturn | -717     |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16509383916854858
Validation loss = 0.1661696583032608
Validation loss = 0.16266122460365295
Validation loss = 0.16530224680900574
Validation loss = 0.16358177363872528
Validation loss = 0.1644579917192459
Validation loss = 0.1644458770751953
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.16853711009025574
Validation loss = 0.16518551111221313
Validation loss = 0.16544611752033234
Validation loss = 0.1666710376739502
Validation loss = 0.16661153733730316
Validation loss = 0.1656484603881836
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16442668437957764
Validation loss = 0.1611773520708084
Validation loss = 0.16226202249526978
Validation loss = 0.161467045545578
Validation loss = 0.16201335191726685
Validation loss = 0.16291965544223785
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.165144681930542
Validation loss = 0.16269904375076294
Validation loss = 0.1654033064842224
Validation loss = 0.1631167083978653
Validation loss = 0.16288745403289795
Validation loss = 0.16346290707588196
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1659429520368576
Validation loss = 0.16392210125923157
Validation loss = 0.16304567456245422
Validation loss = 0.16540756821632385
Validation loss = 0.16400928795337677
Validation loss = 0.16300177574157715
Validation loss = 0.16255724430084229
Validation loss = 0.16401907801628113
Validation loss = 0.16334116458892822
Validation loss = 0.16216029226779938
Validation loss = 0.1621520221233368
Validation loss = 0.162862628698349
Validation loss = 0.16405749320983887
Validation loss = 0.1622667908668518
Validation loss = 0.16472086310386658
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 291      |
| Iteration     | 15       |
| MaximumReturn | 1.58e+03 |
| MinimumReturn | -747     |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16594097018241882
Validation loss = 0.16020745038986206
Validation loss = 0.16158835589885712
Validation loss = 0.16055257618427277
Validation loss = 0.1618354320526123
Validation loss = 0.167038694024086
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1645818054676056
Validation loss = 0.1621512770652771
Validation loss = 0.16342303156852722
Validation loss = 0.16395533084869385
Validation loss = 0.16331852972507477
Validation loss = 0.1636073887348175
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1623506247997284
Validation loss = 0.15857672691345215
Validation loss = 0.16003276407718658
Validation loss = 0.15986567735671997
Validation loss = 0.16016660630702972
Validation loss = 0.1602780818939209
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.16223686933517456
Validation loss = 0.16039159893989563
Validation loss = 0.16015613079071045
Validation loss = 0.16172803938388824
Validation loss = 0.16200657188892365
Validation loss = 0.1627141535282135
Validation loss = 0.16180075705051422
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.16351720690727234
Validation loss = 0.15903544425964355
Validation loss = 0.15963740646839142
Validation loss = 0.16073141992092133
Validation loss = 0.1610417366027832
Validation loss = 0.1619255691766739
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 364      |
| Iteration     | 16       |
| MaximumReturn | 1.83e+03 |
| MinimumReturn | -693     |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1628762185573578
Validation loss = 0.1593693047761917
Validation loss = 0.1604500412940979
Validation loss = 0.15994414687156677
Validation loss = 0.1598740965127945
Validation loss = 0.16077785193920135
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.16202417016029358
Validation loss = 0.16147053241729736
Validation loss = 0.1602822095155716
Validation loss = 0.16075854003429413
Validation loss = 0.16056877374649048
Validation loss = 0.1599408984184265
Validation loss = 0.16185744106769562
Validation loss = 0.16160576045513153
Validation loss = 0.16164672374725342
Validation loss = 0.1603180468082428
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16330838203430176
Validation loss = 0.15858572721481323
Validation loss = 0.1577773541212082
Validation loss = 0.15989039838314056
Validation loss = 0.15797437727451324
Validation loss = 0.15754804015159607
Validation loss = 0.15961821377277374
Validation loss = 0.1591745913028717
Validation loss = 0.15890951454639435
Validation loss = 0.15869122743606567
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.16504012048244476
Validation loss = 0.15931372344493866
Validation loss = 0.1625642031431198
Validation loss = 0.15909108519554138
Validation loss = 0.15926143527030945
Validation loss = 0.15906131267547607
Validation loss = 0.15952761471271515
Validation loss = 0.16162480413913727
Validation loss = 0.16166213154792786
Validation loss = 0.15965765714645386
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1630750298500061
Validation loss = 0.1585576832294464
Validation loss = 0.15940187871456146
Validation loss = 0.15833693742752075
Validation loss = 0.15906491875648499
Validation loss = 0.15822535753250122
Validation loss = 0.16005295515060425
Validation loss = 0.15974681079387665
Validation loss = 0.15873731672763824
Validation loss = 0.16013400256633759
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 707      |
| Iteration     | 17       |
| MaximumReturn | 1.77e+03 |
| MinimumReturn | -716     |
| TotalSamples  | 76000    |
----------------------------
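Each summary table aggregates the returns of the 6 on-policy rollouts collected just above it. Below is a short illustrative sketch of how the three return statistics could be computed from per-path reward sequences; path_rewards and the running total_samples counter are hypothetical inputs, not the project's actual data structures.

    import numpy as np

    def summarize_rollouts(path_rewards, iteration, total_samples):
        # path_rewards: list of per-step reward arrays, one per rollout (6 here).
        # total_samples: running sample counter maintained by the caller.
        returns = [float(np.sum(r)) for r in path_rewards]
        return {
            "AverageReturn": float(np.mean(returns)),
            "Iteration": iteration,
            "MaximumReturn": max(returns),
            "MinimumReturn": min(returns),
            "TotalSamples": total_samples,
        }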
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16139952838420868
Validation loss = 0.15789227187633514
Validation loss = 0.1580815315246582
Validation loss = 0.15987996757030487
Validation loss = 0.15797317028045654
Validation loss = 0.15763674676418304
Validation loss = 0.15823383629322052
Validation loss = 0.15836432576179504
Validation loss = 0.15739800035953522
Validation loss = 0.15760686993598938
Validation loss = 0.1581408530473709
Validation loss = 0.15755856037139893
Validation loss = 0.1590178906917572
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.16103313863277435
Validation loss = 0.15954968333244324
Validation loss = 0.1578507423400879
Validation loss = 0.15864618122577667
Validation loss = 0.15789330005645752
Validation loss = 0.1579694300889969
Validation loss = 0.15848003327846527
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.15949533879756927
Validation loss = 0.15608836710453033
Validation loss = 0.15597526729106903
Validation loss = 0.15582416951656342
Validation loss = 0.1568177044391632
Validation loss = 0.15605469048023224
Validation loss = 0.15726347267627716
Validation loss = 0.1558009684085846
Validation loss = 0.15636104345321655
Validation loss = 0.15746134519577026
Validation loss = 0.15667667984962463
Validation loss = 0.15707965195178986
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1632934808731079
Validation loss = 0.15625983476638794
Validation loss = 0.15870963037014008
Validation loss = 0.15688073635101318
Validation loss = 0.15907235443592072
Validation loss = 0.1577094942331314
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.16201023757457733
Validation loss = 0.1601828932762146
Validation loss = 0.15582232177257538
Validation loss = 0.15674303472042084
Validation loss = 0.15856772661209106
Validation loss = 0.1566428691148758
Validation loss = 0.15721890330314636
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 275      |
| Iteration     | 18       |
| MaximumReturn | 1.88e+03 |
| MinimumReturn | -959     |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.15867860615253448
Validation loss = 0.15581843256950378
Validation loss = 0.1550576388835907
Validation loss = 0.15733377635478973
Validation loss = 0.15682938694953918
Validation loss = 0.15771546959877014
Validation loss = 0.15564359724521637
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15781986713409424
Validation loss = 0.15700218081474304
Validation loss = 0.15657635033130646
Validation loss = 0.1559968888759613
Validation loss = 0.15706810355186462
Validation loss = 0.1560886651277542
Validation loss = 0.15669870376586914
Validation loss = 0.15752191841602325
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.15692321956157684
Validation loss = 0.15342339873313904
Validation loss = 0.15654754638671875
Validation loss = 0.15459060668945312
Validation loss = 0.15550823509693146
Validation loss = 0.15522268414497375
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15931054949760437
Validation loss = 0.15496838092803955
Validation loss = 0.15607646107673645
Validation loss = 0.1557212471961975
Validation loss = 0.15528421103954315
Validation loss = 0.15608306229114532
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.15911665558815002
Validation loss = 0.15409693121910095
Validation loss = 0.15527573227882385
Validation loss = 0.15591660141944885
Validation loss = 0.15530326962471008
Validation loss = 0.15699324011802673
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 453      |
| Iteration     | 19       |
| MaximumReturn | 1.91e+03 |
| MinimumReturn | -750     |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.15786109864711761
Validation loss = 0.15516360104084015
Validation loss = 0.1539917290210724
Validation loss = 0.15467900037765503
Validation loss = 0.15346738696098328
Validation loss = 0.15557792782783508
Validation loss = 0.15505631268024445
Validation loss = 0.15496228635311127
Validation loss = 0.15344926714897156
Validation loss = 0.15450190007686615
Validation loss = 0.15414975583553314
Validation loss = 0.15388348698616028
Validation loss = 0.1543990820646286
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15675123035907745
Validation loss = 0.15498913824558258
Validation loss = 0.15431490540504456
Validation loss = 0.15619993209838867
Validation loss = 0.15552112460136414
Validation loss = 0.1552022248506546
Validation loss = 0.1557973176240921
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.15678749978542328
Validation loss = 0.15174047648906708
Validation loss = 0.15174290537834167
Validation loss = 0.15378621220588684
Validation loss = 0.15347035229206085
Validation loss = 0.15303468704223633
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15883591771125793
Validation loss = 0.1525050401687622
Validation loss = 0.15348833799362183
Validation loss = 0.15479806065559387
Validation loss = 0.15457876026630402
Validation loss = 0.15492025017738342
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.15896007418632507
Validation loss = 0.1530667096376419
Validation loss = 0.1541304737329483
Validation loss = 0.154060497879982
Validation loss = 0.1528557538986206
Validation loss = 0.1529119312763214
Validation loss = 0.15344174206256866
Validation loss = 0.15751683712005615
Validation loss = 0.15387901663780212
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 984      |
| Iteration     | 20       |
| MaximumReturn | 1.88e+03 |
| MinimumReturn | -915     |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.15570269525051117
Validation loss = 0.1528232991695404
Validation loss = 0.1529148668050766
Validation loss = 0.15372979640960693
Validation loss = 0.15324078500270844
Validation loss = 0.15494322776794434
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15823544561862946
Validation loss = 0.15288327634334564
Validation loss = 0.1539553552865982
Validation loss = 0.15350963175296783
Validation loss = 0.15476387739181519
Validation loss = 0.15559528768062592
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1543879359960556
Validation loss = 0.15155957639217377
Validation loss = 0.15280628204345703
Validation loss = 0.15199226140975952
Validation loss = 0.15390847623348236
Validation loss = 0.15274152159690857
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15588198602199554
Validation loss = 0.1518443524837494
Validation loss = 0.15348108112812042
Validation loss = 0.1543222814798355
Validation loss = 0.1534443348646164
Validation loss = 0.1534612774848938
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.15602083504199982
Validation loss = 0.1531301587820053
Validation loss = 0.1525997370481491
Validation loss = 0.1533660739660263
Validation loss = 0.15431508421897888
Validation loss = 0.15409456193447113
Validation loss = 0.15301162004470825
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.04e+03 |
| Iteration     | 21       |
| MaximumReturn | 2.22e+03 |
| MinimumReturn | -574     |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1551981121301651
Validation loss = 0.15078561007976532
Validation loss = 0.15072789788246155
Validation loss = 0.15126076340675354
Validation loss = 0.15119752287864685
Validation loss = 0.15071912109851837
Validation loss = 0.15081873536109924
Validation loss = 0.15052534639835358
Validation loss = 0.15176749229431152
Validation loss = 0.15026462078094482
Validation loss = 0.15060974657535553
Validation loss = 0.1519591212272644
Validation loss = 0.15074588358402252
Validation loss = 0.15124531090259552
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15509556233882904
Validation loss = 0.15023155510425568
Validation loss = 0.15173767507076263
Validation loss = 0.15106572210788727
Validation loss = 0.15156260132789612
Validation loss = 0.15267223119735718
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.15366318821907043
Validation loss = 0.14844900369644165
Validation loss = 0.1490040272474289
Validation loss = 0.15006905794143677
Validation loss = 0.15033751726150513
Validation loss = 0.15038689970970154
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15525101125240326
Validation loss = 0.15148019790649414
Validation loss = 0.15205933153629303
Validation loss = 0.15231870114803314
Validation loss = 0.1515188217163086
Validation loss = 0.15264621376991272
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.15592968463897705
Validation loss = 0.14989933371543884
Validation loss = 0.15098688006401062
Validation loss = 0.15023095905780792
Validation loss = 0.15139809250831604
Validation loss = 0.15099787712097168
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.14e+03 |
| Iteration     | 22       |
| MaximumReturn | 2.08e+03 |
| MinimumReturn | -907     |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.15474875271320343
Validation loss = 0.14798176288604736
Validation loss = 0.14937354624271393
Validation loss = 0.14964085817337036
Validation loss = 0.15087853372097015
Validation loss = 0.15004996955394745
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15230514109134674
Validation loss = 0.1498667448759079
Validation loss = 0.14979438483715057
Validation loss = 0.15046434104442596
Validation loss = 0.1510387659072876
Validation loss = 0.15103624761104584
Validation loss = 0.15087082982063293
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.15141713619232178
Validation loss = 0.14841830730438232
Validation loss = 0.14946436882019043
Validation loss = 0.14876067638397217
Validation loss = 0.14986340701580048
Validation loss = 0.14890974760055542
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15516087412834167
Validation loss = 0.1495882123708725
Validation loss = 0.15098275244235992
Validation loss = 0.15136103332042694
Validation loss = 0.1513618379831314
Validation loss = 0.15046130120754242
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.15250158309936523
Validation loss = 0.14894701540470123
Validation loss = 0.1498805433511734
Validation loss = 0.14977388083934784
Validation loss = 0.1503799557685852
Validation loss = 0.14965057373046875
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.84e+03 |
| Iteration     | 23       |
| MaximumReturn | 1.97e+03 |
| MinimumReturn | 1.62e+03 |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.15062913298606873
Validation loss = 0.14845257997512817
Validation loss = 0.14792293310165405
Validation loss = 0.14893029630184174
Validation loss = 0.14864914119243622
Validation loss = 0.14881430566310883
Validation loss = 0.14995761215686798
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15210510790348053
Validation loss = 0.14852432906627655
Validation loss = 0.1501651406288147
Validation loss = 0.14891111850738525
Validation loss = 0.15015894174575806
Validation loss = 0.1484399288892746
Validation loss = 0.14989791810512543
Validation loss = 0.14830712974071503
Validation loss = 0.1489071398973465
Validation loss = 0.15007361769676208
Validation loss = 0.15090376138687134
Validation loss = 0.14850734174251556
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1490502953529358
Validation loss = 0.14753077924251556
Validation loss = 0.1465143859386444
Validation loss = 0.1477075219154358
Validation loss = 0.14769558608531952
Validation loss = 0.14814777672290802
Validation loss = 0.14850454032421112
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15105116367340088
Validation loss = 0.1488182097673416
Validation loss = 0.1493169665336609
Validation loss = 0.1499778777360916
Validation loss = 0.1487831175327301
Validation loss = 0.1501268893480301
Validation loss = 0.14951853454113007
Validation loss = 0.1495424509048462
Validation loss = 0.14823535084724426
Validation loss = 0.14954595267772675
Validation loss = 0.14977551996707916
Validation loss = 0.14872309565544128
Validation loss = 0.14881397783756256
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.15003518760204315
Validation loss = 0.14632612466812134
Validation loss = 0.14848411083221436
Validation loss = 0.1492551863193512
Validation loss = 0.148795947432518
Validation loss = 0.14768707752227783
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | 882       |
| Iteration     | 24        |
| MaximumReturn | 2.24e+03  |
| MinimumReturn | -1.01e+03 |
| TotalSamples  | 104000    |
-----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.14943145215511322
Validation loss = 0.1463308036327362
Validation loss = 0.14735062420368195
Validation loss = 0.14707298576831818
Validation loss = 0.1486564576625824
Validation loss = 0.14705871045589447
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14973972737789154
Validation loss = 0.1468227654695511
Validation loss = 0.14747372269630432
Validation loss = 0.14762744307518005
Validation loss = 0.14746955037117004
Validation loss = 0.14707490801811218
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1494579017162323
Validation loss = 0.14685915410518646
Validation loss = 0.146793395280838
Validation loss = 0.14739027619361877
Validation loss = 0.14669981598854065
Validation loss = 0.14606769382953644
Validation loss = 0.1466967612504959
Validation loss = 0.14770129323005676
Validation loss = 0.14667226374149323
Validation loss = 0.14642904698848724
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1509060561656952
Validation loss = 0.1470905840396881
Validation loss = 0.14749735593795776
Validation loss = 0.14728040993213654
Validation loss = 0.14753995835781097
Validation loss = 0.14818686246871948
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1506342589855194
Validation loss = 0.14752179384231567
Validation loss = 0.1481625884771347
Validation loss = 0.1469918042421341
Validation loss = 0.14877936244010925
Validation loss = 0.14694777131080627
Validation loss = 0.1476239413022995
Validation loss = 0.14717920124530792
Validation loss = 0.14705391228199005
Validation loss = 0.14731015264987946
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.55e+03 |
| Iteration     | 25       |
| MaximumReturn | 2.11e+03 |
| MinimumReturn | -809     |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.14965508878231049
Validation loss = 0.14734020829200745
Validation loss = 0.14581157267093658
Validation loss = 0.14767900109291077
Validation loss = 0.14746306836605072
Validation loss = 0.1473110020160675
Validation loss = 0.14653903245925903
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1489696353673935
Validation loss = 0.1470586508512497
Validation loss = 0.14734993875026703
Validation loss = 0.1472989320755005
Validation loss = 0.1467335969209671
Validation loss = 0.14734794199466705
Validation loss = 0.14661961793899536
Validation loss = 0.1464361846446991
Validation loss = 0.14800594747066498
Validation loss = 0.14726820588111877
Validation loss = 0.14722411334514618
Validation loss = 0.14682579040527344
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14811387658119202
Validation loss = 0.14559724926948547
Validation loss = 0.14622630178928375
Validation loss = 0.14556486904621124
Validation loss = 0.1469230353832245
Validation loss = 0.14576824009418488
Validation loss = 0.14653974771499634
Validation loss = 0.14656449854373932
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1484648585319519
Validation loss = 0.14589065313339233
Validation loss = 0.14801619946956635
Validation loss = 0.14833568036556244
Validation loss = 0.14741593599319458
Validation loss = 0.14682859182357788
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1497129648923874
Validation loss = 0.14567168056964874
Validation loss = 0.14648211002349854
Validation loss = 0.1464093029499054
Validation loss = 0.14794906973838806
Validation loss = 0.14645899832248688
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.98e+03 |
| Iteration     | 26       |
| MaximumReturn | 2.38e+03 |
| MinimumReturn | 787      |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1476876437664032
Validation loss = 0.14429032802581787
Validation loss = 0.14644503593444824
Validation loss = 0.14634045958518982
Validation loss = 0.14556418359279633
Validation loss = 0.14636583626270294
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14933814108371735
Validation loss = 0.145528182387352
Validation loss = 0.1465812474489212
Validation loss = 0.1476273536682129
Validation loss = 0.14623954892158508
Validation loss = 0.1461455076932907
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1478588581085205
Validation loss = 0.1440076380968094
Validation loss = 0.14570537209510803
Validation loss = 0.1452876776456833
Validation loss = 0.14549127221107483
Validation loss = 0.14515461027622223
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.14857885241508484
Validation loss = 0.14510753750801086
Validation loss = 0.1460167020559311
Validation loss = 0.14623025059700012
Validation loss = 0.14683406054973602
Validation loss = 0.1458972841501236
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1479988843202591
Validation loss = 0.14476163685321808
Validation loss = 0.1463892012834549
Validation loss = 0.14597256481647491
Validation loss = 0.14667336642742157
Validation loss = 0.14566214382648468
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.22e+03 |
| Iteration     | 27       |
| MaximumReturn | 2.52e+03 |
| MinimumReturn | 2.03e+03 |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.14638091623783112
Validation loss = 0.14452029764652252
Validation loss = 0.14541782438755035
Validation loss = 0.14554396271705627
Validation loss = 0.14493827521800995
Validation loss = 0.1447707563638687
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14719077944755554
Validation loss = 0.14428821206092834
Validation loss = 0.14492633938789368
Validation loss = 0.14616800844669342
Validation loss = 0.14428874850273132
Validation loss = 0.14503780007362366
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14728428423404694
Validation loss = 0.14297157526016235
Validation loss = 0.14363990724086761
Validation loss = 0.1441112458705902
Validation loss = 0.14459867775440216
Validation loss = 0.14404454827308655
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.14710621535778046
Validation loss = 0.14399860799312592
Validation loss = 0.14573846757411957
Validation loss = 0.14524222910404205
Validation loss = 0.14431768655776978
Validation loss = 0.14522291719913483
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14692184329032898
Validation loss = 0.14547045528888702
Validation loss = 0.1441577970981598
Validation loss = 0.14509770274162292
Validation loss = 0.14483226835727692
Validation loss = 0.14546160399913788
Validation loss = 0.14479100704193115
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.24e+03 |
| Iteration     | 28       |
| MaximumReturn | 2.36e+03 |
| MinimumReturn | 2.08e+03 |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1465885192155838
Validation loss = 0.14326046407222748
Validation loss = 0.14369596540927887
Validation loss = 0.14406661689281464
Validation loss = 0.14465470612049103
Validation loss = 0.14325278997421265
Validation loss = 0.14357306063175201
Validation loss = 0.14439146220684052
Validation loss = 0.14407619833946228
Validation loss = 0.14331789314746857
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14534813165664673
Validation loss = 0.14293980598449707
Validation loss = 0.14270693063735962
Validation loss = 0.14358942210674286
Validation loss = 0.1440669745206833
Validation loss = 0.14385762810707092
Validation loss = 0.1435258388519287
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14496569335460663
Validation loss = 0.14238952100276947
Validation loss = 0.14297237992286682
Validation loss = 0.14323021471500397
Validation loss = 0.1445094496011734
Validation loss = 0.14297239482402802
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1448848694562912
Validation loss = 0.1434047669172287
Validation loss = 0.14424385130405426
Validation loss = 0.14387507736682892
Validation loss = 0.1440528929233551
Validation loss = 0.14376696944236755
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14545056223869324
Validation loss = 0.1435067355632782
Validation loss = 0.1439000815153122
Validation loss = 0.14479012787342072
Validation loss = 0.14451079070568085
Validation loss = 0.14256994426250458
Validation loss = 0.14379382133483887
Validation loss = 0.1442205309867859
Validation loss = 0.14308108389377594
Validation loss = 0.14450697600841522
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.28e+03 |
| Iteration     | 29       |
| MaximumReturn | 2.16e+03 |
| MinimumReturn | 348      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.14699363708496094
Validation loss = 0.1429290473461151
Validation loss = 0.14374524354934692
Validation loss = 0.14278165996074677
Validation loss = 0.14302809536457062
Validation loss = 0.14371582865715027
Validation loss = 0.1439085602760315
Validation loss = 0.14291797578334808
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14657311141490936
Validation loss = 0.14347468316555023
Validation loss = 0.14378118515014648
Validation loss = 0.14282064139842987
Validation loss = 0.14293287694454193
Validation loss = 0.14302577078342438
Validation loss = 0.14364822208881378
Validation loss = 0.14297117292881012
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14609499275684357
Validation loss = 0.1413765251636505
Validation loss = 0.14201460778713226
Validation loss = 0.1437564343214035
Validation loss = 0.14254336059093475
Validation loss = 0.14226852357387543
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.145342156291008
Validation loss = 0.14276926219463348
Validation loss = 0.1436210721731186
Validation loss = 0.14257478713989258
Validation loss = 0.14298760890960693
Validation loss = 0.14260390400886536
Validation loss = 0.14337092638015747
Validation loss = 0.14484114944934845
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1459815800189972
Validation loss = 0.14117726683616638
Validation loss = 0.14322884380817413
Validation loss = 0.14366543292999268
Validation loss = 0.14288440346717834
Validation loss = 0.14319023489952087
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.1e+03  |
| Iteration     | 30       |
| MaximumReturn | 2.53e+03 |
| MinimumReturn | 1.29e+03 |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.14534932374954224
Validation loss = 0.14223821461200714
Validation loss = 0.14247697591781616
Validation loss = 0.14216715097427368
Validation loss = 0.1424013078212738
Validation loss = 0.14250147342681885
Validation loss = 0.14236484467983246
Validation loss = 0.14248666167259216
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1460554599761963
Validation loss = 0.1418617069721222
Validation loss = 0.14309397339820862
Validation loss = 0.14264200627803802
Validation loss = 0.14223892986774445
Validation loss = 0.1426205188035965
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1428578794002533
Validation loss = 0.14080969989299774
Validation loss = 0.14175456762313843
Validation loss = 0.14130091667175293
Validation loss = 0.14199601113796234
Validation loss = 0.14159345626831055
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.14410141110420227
Validation loss = 0.14142517745494843
Validation loss = 0.14205965399742126
Validation loss = 0.14206229150295258
Validation loss = 0.1424179971218109
Validation loss = 0.14329713582992554
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14376984536647797
Validation loss = 0.1402723491191864
Validation loss = 0.14250344038009644
Validation loss = 0.14189398288726807
Validation loss = 0.1418372392654419
Validation loss = 0.14206883311271667
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.6e+03  |
| Iteration     | 31       |
| MaximumReturn | 2.39e+03 |
| MinimumReturn | -233     |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.14398135244846344
Validation loss = 0.14021986722946167
Validation loss = 0.14117562770843506
Validation loss = 0.14069247245788574
Validation loss = 0.14130978286266327
Validation loss = 0.14125283062458038
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14380288124084473
Validation loss = 0.13922882080078125
Validation loss = 0.14200110733509064
Validation loss = 0.14140237867832184
Validation loss = 0.14066524803638458
Validation loss = 0.14104768633842468
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14211063086986542
Validation loss = 0.13934697210788727
Validation loss = 0.14083589613437653
Validation loss = 0.14035019278526306
Validation loss = 0.14006377756595612
Validation loss = 0.13973018527030945
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1430874466896057
Validation loss = 0.14011584222316742
Validation loss = 0.1408977508544922
Validation loss = 0.14063343405723572
Validation loss = 0.14155182242393494
Validation loss = 0.14124999940395355
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14500555396080017
Validation loss = 0.139802023768425
Validation loss = 0.1412046253681183
Validation loss = 0.1404062807559967
Validation loss = 0.1402224749326706
Validation loss = 0.1407683938741684
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.23e+03 |
| Iteration     | 32       |
| MaximumReturn | 2.4e+03  |
| MinimumReturn | 1.98e+03 |
| TotalSamples  | 136000   |
----------------------------
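The (Iteration, AverageReturn) pairs in the tables above can be pulled straight out of a saved copy of this log to plot the learning curve. A small self-contained sketch follows; the input and output file names are assumptions and should be adjusted to wherever the log was written.

    import re
    import matplotlib.pyplot as plt

    iters, avg_returns = [], []
    pending = None
    with open("train.log") as f:                     # hypothetical log file name
        for line in f:
            m = re.match(r"\|\s*AverageReturn\s*\|\s*([-+\d.e]+)\s*\|", line)
            if m:
                pending = float(m.group(1))          # AverageReturn row precedes Iteration row
                continue
            m = re.match(r"\|\s*Iteration\s*\|\s*(\d+)\s*\|", line)
            if m and pending is not None:
                iters.append(int(m.group(1)))
                avg_returns.append(pending)
                pending = None

    plt.plot(iters, avg_returns, marker="o")
    plt.xlabel("Iteration")
    plt.ylabel("AverageReturn")
    plt.savefig("learning_curve.png")                # hypothetical output name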
