Logging to experiments/invertedPendulum/IPA01/Tue-01-Nov-2022-07-59-07-PM-CDT_invertedPendulum_trpo_iteration_20_seed2231
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7780511975288391
Validation loss = 0.38600629568099976
Validation loss = 0.3737563192844391
Validation loss = 0.31876349449157715
Validation loss = 0.3069640100002289
Validation loss = 0.2970317304134369
Validation loss = 0.28150737285614014
Validation loss = 0.28343647718429565
Validation loss = 0.24996837973594666
Validation loss = 0.2587572932243347
Validation loss = 0.23624515533447266
Validation loss = 0.23320327699184418
Validation loss = 0.21947111189365387
Validation loss = 0.21081188321113586
Validation loss = 0.20771285891532898
Validation loss = 0.20835234224796295
Validation loss = 0.23493388295173645
Validation loss = 0.20147936046123505
Validation loss = 0.21007126569747925
Validation loss = 0.19525860249996185
Validation loss = 0.2016078531742096
Validation loss = 0.20727086067199707
Validation loss = 0.19207026064395905
Validation loss = 0.1928500235080719
Validation loss = 0.18874618411064148
Validation loss = 0.18073168396949768
Validation loss = 0.2142999917268753
Validation loss = 0.19114409387111664
Validation loss = 0.19002628326416016
Validation loss = 0.17492946982383728
Validation loss = 0.18053381145000458
Validation loss = 0.18375559151172638
Validation loss = 0.20850001275539398
Validation loss = 0.1878892332315445
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7698932886123657
Validation loss = 0.3922859728336334
Validation loss = 0.3890976309776306
Validation loss = 0.3253512382507324
Validation loss = 0.30339840054512024
Validation loss = 0.291920006275177
Validation loss = 0.284990519285202
Validation loss = 0.2595720589160919
Validation loss = 0.2444075047969818
Validation loss = 0.23341795802116394
Validation loss = 0.23792225122451782
Validation loss = 0.22912828624248505
Validation loss = 0.22563572227954865
Validation loss = 0.2268534004688263
Validation loss = 0.21702562272548676
Validation loss = 0.20681439340114594
Validation loss = 0.2100425809621811
Validation loss = 0.21770642697811127
Validation loss = 0.20095929503440857
Validation loss = 0.1867632418870926
Validation loss = 0.20685505867004395
Validation loss = 0.2095886468887329
Validation loss = 0.1870935708284378
Validation loss = 0.18190963566303253
Validation loss = 0.1895250827074051
Validation loss = 0.1938708871603012
Validation loss = 0.1900862753391266
Validation loss = 0.18485434353351593
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.756770670413971
Validation loss = 0.42226797342300415
Validation loss = 0.3524254858493805
Validation loss = 0.31760844588279724
Validation loss = 0.30192556977272034
Validation loss = 0.30149006843566895
Validation loss = 0.283148854970932
Validation loss = 0.26159238815307617
Validation loss = 0.2542758882045746
Validation loss = 0.24937961995601654
Validation loss = 0.2395739108324051
Validation loss = 0.22408051788806915
Validation loss = 0.2171149104833603
Validation loss = 0.21324582397937775
Validation loss = 0.2208762913942337
Validation loss = 0.2002352923154831
Validation loss = 0.1882311999797821
Validation loss = 0.20499317348003387
Validation loss = 0.19984574615955353
Validation loss = 0.18583904206752777
Validation loss = 0.183376282453537
Validation loss = 0.17449837923049927
Validation loss = 0.18927669525146484
Validation loss = 0.2000974863767624
Validation loss = 0.21582506597042084
Validation loss = 0.19210010766983032
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7651005387306213
Validation loss = 0.37064722180366516
Validation loss = 0.35548558831214905
Validation loss = 0.32260388135910034
Validation loss = 0.30800408124923706
Validation loss = 0.31372684240341187
Validation loss = 0.28970274329185486
Validation loss = 0.26658859848976135
Validation loss = 0.2454240471124649
Validation loss = 0.2453550398349762
Validation loss = 0.2427075356245041
Validation loss = 0.21951650083065033
Validation loss = 0.22940631210803986
Validation loss = 0.2426544427871704
Validation loss = 0.23099692165851593
Validation loss = 0.20933233201503754
Validation loss = 0.23209495842456818
Validation loss = 0.21239659190177917
Validation loss = 0.18517546355724335
Validation loss = 0.19716601073741913
Validation loss = 0.1964278519153595
Validation loss = 0.21954217553138733
Validation loss = 0.19270597398281097
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7659491896629333
Validation loss = 0.39547428488731384
Validation loss = 0.33822131156921387
Validation loss = 0.3123926520347595
Validation loss = 0.3036096394062042
Validation loss = 0.29332074522972107
Validation loss = 0.2719024121761322
Validation loss = 0.2865521311759949
Validation loss = 0.24768635630607605
Validation loss = 0.23732824623584747
Validation loss = 0.2441355437040329
Validation loss = 0.22766336798667908
Validation loss = 0.21922054886817932
Validation loss = 0.22171668708324432
Validation loss = 0.2198428362607956
Validation loss = 0.2055591493844986
Validation loss = 0.19073301553726196
Validation loss = 0.21708472073078156
Validation loss = 0.2060093879699707
Validation loss = 0.2114601582288742
Validation loss = 0.18753916025161743
Validation loss = 0.20818032324314117
Validation loss = 0.19777114689350128
Validation loss = 0.20268893241882324
Validation loss = 0.20440582931041718
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -31.3    |
| Iteration     | 0        |
| MaximumReturn | -0.0478  |
| MinimumReturn | -84.5    |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3625786602497101
Validation loss = 0.1975451409816742
Validation loss = 0.18045923113822937
Validation loss = 0.1564212590456009
Validation loss = 0.15923838317394257
Validation loss = 0.13500984013080597
Validation loss = 0.12180476635694504
Validation loss = 0.12917114794254303
Validation loss = 0.129352867603302
Validation loss = 0.12190880626440048
Validation loss = 0.1099400669336319
Validation loss = 0.1025310829281807
Validation loss = 0.10206711292266846
Validation loss = 0.11725074797868729
Validation loss = 0.09885317087173462
Validation loss = 0.10157332569360733
Validation loss = 0.10364681482315063
Validation loss = 0.11136581003665924
Validation loss = 0.11829273402690887
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.33295756578445435
Validation loss = 0.1952221542596817
Validation loss = 0.17238034307956696
Validation loss = 0.15012621879577637
Validation loss = 0.14273007214069366
Validation loss = 0.13982506096363068
Validation loss = 0.12105169147253036
Validation loss = 0.14101527631282806
Validation loss = 0.12965816259384155
Validation loss = 0.1155112013220787
Validation loss = 0.13086163997650146
Validation loss = 0.11883699148893356
Validation loss = 0.1186869814991951
Validation loss = 0.1213703602552414
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3254735469818115
Validation loss = 0.18463250994682312
Validation loss = 0.1667909324169159
Validation loss = 0.16146324574947357
Validation loss = 0.1498010903596878
Validation loss = 0.14035123586654663
Validation loss = 0.14421655237674713
Validation loss = 0.1385975480079651
Validation loss = 0.12078426778316498
Validation loss = 0.12038543820381165
Validation loss = 0.12697075307369232
Validation loss = 0.1037578210234642
Validation loss = 0.11163268983364105
Validation loss = 0.11156854033470154
Validation loss = 0.10962628573179245
Validation loss = 0.10199414193630219
Validation loss = 0.1164567843079567
Validation loss = 0.0967244878411293
Validation loss = 0.12173091620206833
Validation loss = 0.09266570955514908
Validation loss = 0.10103447735309601
Validation loss = 0.1101050078868866
Validation loss = 0.11981616169214249
Validation loss = 0.125978484749794
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3427777886390686
Validation loss = 0.1997118592262268
Validation loss = 0.17484734952449799
Validation loss = 0.16554754972457886
Validation loss = 0.14989468455314636
Validation loss = 0.14625442028045654
Validation loss = 0.12161961197853088
Validation loss = 0.1317400336265564
Validation loss = 0.1404857337474823
Validation loss = 0.13752467930316925
Validation loss = 0.1325002908706665
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.31127074360847473
Validation loss = 0.19334106147289276
Validation loss = 0.16287435591220856
Validation loss = 0.16003957390785217
Validation loss = 0.1587868332862854
Validation loss = 0.15745264291763306
Validation loss = 0.1441802978515625
Validation loss = 0.15386879444122314
Validation loss = 0.12038642913103104
Validation loss = 0.12046165764331818
Validation loss = 0.12374988198280334
Validation loss = 0.11893937736749649
Validation loss = 0.11681711673736572
Validation loss = 0.11246427893638611
Validation loss = 0.11627280712127686
Validation loss = 0.10263124108314514
Validation loss = 0.10177222639322281
Validation loss = 0.12072103470563889
Validation loss = 0.10577745735645294
Validation loss = 0.10512376576662064
Validation loss = 0.11247700452804565
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0945  |
| Iteration     | 1        |
| MaximumReturn | -0.045   |
| MinimumReturn | -0.156   |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2079015076160431
Validation loss = 0.09492796659469604
Validation loss = 0.07203548401594162
Validation loss = 0.07001883536577225
Validation loss = 0.06188047677278519
Validation loss = 0.059483602643013
Validation loss = 0.05588698759675026
Validation loss = 0.05635523796081543
Validation loss = 0.05673939734697342
Validation loss = 0.04982369393110275
Validation loss = 0.053547751158475876
Validation loss = 0.05003586411476135
Validation loss = 0.04750046133995056
Validation loss = 0.05085132271051407
Validation loss = 0.04759259894490242
Validation loss = 0.060384999960660934
Validation loss = 0.05612628161907196
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.21859237551689148
Validation loss = 0.09626957029104233
Validation loss = 0.08165976405143738
Validation loss = 0.07271125912666321
Validation loss = 0.07674974948167801
Validation loss = 0.06622819602489471
Validation loss = 0.061437152326107025
Validation loss = 0.06447151303291321
Validation loss = 0.05978517606854439
Validation loss = 0.05575289577245712
Validation loss = 0.052813753485679626
Validation loss = 0.05262333154678345
Validation loss = 0.048236798495054245
Validation loss = 0.055624958127737045
Validation loss = 0.05116037651896477
Validation loss = 0.061895258724689484
Validation loss = 0.06269320845603943
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.21572202444076538
Validation loss = 0.09755965322256088
Validation loss = 0.07805798947811127
Validation loss = 0.06980930268764496
Validation loss = 0.0643615573644638
Validation loss = 0.06469264626502991
Validation loss = 0.06498577445745468
Validation loss = 0.05829526484012604
Validation loss = 0.061179161071777344
Validation loss = 0.06029944121837616
Validation loss = 0.05467013269662857
Validation loss = 0.05235043540596962
Validation loss = 0.046733152121305466
Validation loss = 0.04492907598614693
Validation loss = 0.060507163405418396
Validation loss = 0.04627218097448349
Validation loss = 0.05161995813250542
Validation loss = 0.049059316515922546
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1879056990146637
Validation loss = 0.09709406644105911
Validation loss = 0.07961822301149368
Validation loss = 0.07601507753133774
Validation loss = 0.0685613751411438
Validation loss = 0.06619639694690704
Validation loss = 0.08253641426563263
Validation loss = 0.07916978746652603
Validation loss = 0.07401474565267563
Validation loss = 0.06969599425792694
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2118244767189026
Validation loss = 0.09762565791606903
Validation loss = 0.07560311257839203
Validation loss = 0.06434650719165802
Validation loss = 0.06531880050897598
Validation loss = 0.06904346495866776
Validation loss = 0.05096779018640518
Validation loss = 0.05469119921326637
Validation loss = 0.05823739618062973
Validation loss = 0.05506701394915581
Validation loss = 0.062307924032211304
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00164 |
| Iteration     | 2        |
| MaximumReturn | -0.00129 |
| MinimumReturn | -0.00198 |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0629986822605133
Validation loss = 0.04023280367255211
Validation loss = 0.05253642797470093
Validation loss = 0.05429065600037575
Validation loss = 0.05098876357078552
Validation loss = 0.0471605621278286
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07118778675794601
Validation loss = 0.07283860445022583
Validation loss = 0.07105489075183868
Validation loss = 0.05934585630893707
Validation loss = 0.049887385219335556
Validation loss = 0.051958054304122925
Validation loss = 0.046228986233472824
Validation loss = 0.04149559512734413
Validation loss = 0.03873136267066002
Validation loss = 0.03855903446674347
Validation loss = 0.03738095983862877
Validation loss = 0.050268929451704025
Validation loss = 0.04598368704319
Validation loss = 0.04419812932610512
Validation loss = 0.040944769978523254
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06722160428762436
Validation loss = 0.05563409999012947
Validation loss = 0.051389604806900024
Validation loss = 0.04512445628643036
Validation loss = 0.07207860052585602
Validation loss = 0.04616894945502281
Validation loss = 0.05282985791563988
Validation loss = 0.044286999851465225
Validation loss = 0.0452103316783905
Validation loss = 0.04271354153752327
Validation loss = 0.0420382022857666
Validation loss = 0.044320594519376755
Validation loss = 0.044840749353170395
Validation loss = 0.04376193881034851
Validation loss = 0.03634852170944214
Validation loss = 0.03767925873398781
Validation loss = 0.037832606583833694
Validation loss = 0.03681259974837303
Validation loss = 0.045396074652671814
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0704273134469986
Validation loss = 0.056390147656202316
Validation loss = 0.05466967448592186
Validation loss = 0.05465860664844513
Validation loss = 0.06984101235866547
Validation loss = 0.056047070771455765
Validation loss = 0.05181384086608887
Validation loss = 0.06351511925458908
Validation loss = 0.04798819124698639
Validation loss = 0.04956476390361786
Validation loss = 0.049202680587768555
Validation loss = 0.04393164813518524
Validation loss = 0.04374287649989128
Validation loss = 0.04494622349739075
Validation loss = 0.04754990339279175
Validation loss = 0.04736723005771637
Validation loss = 0.04740365222096443
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06871947646141052
Validation loss = 0.05979003384709358
Validation loss = 0.0537416934967041
Validation loss = 0.05132018029689789
Validation loss = 0.0496482290327549
Validation loss = 0.057877182960510254
Validation loss = 0.05103098973631859
Validation loss = 0.04424532875418663
Validation loss = 0.05424179509282112
Validation loss = 0.04114128649234772
Validation loss = 0.03957211226224899
Validation loss = 0.04257972538471222
Validation loss = 0.04310179129242897
Validation loss = 0.041263312101364136
Validation loss = 0.04460838437080383
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -32.9    |
| Iteration     | 3        |
| MaximumReturn | -0.533   |
| MinimumReturn | -60.5    |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.15119504928588867
Validation loss = 0.04487316310405731
Validation loss = 0.03487210348248482
Validation loss = 0.035804059356451035
Validation loss = 0.030506979674100876
Validation loss = 0.02162967249751091
Validation loss = 0.022790618240833282
Validation loss = 0.028888173401355743
Validation loss = 0.029075458645820618
Validation loss = 0.026799866929650307
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14946991205215454
Validation loss = 0.04354912415146828
Validation loss = 0.039407648146152496
Validation loss = 0.04139993339776993
Validation loss = 0.03265063837170601
Validation loss = 0.02410697750747204
Validation loss = 0.023407787084579468
Validation loss = 0.02148650959134102
Validation loss = 0.030511867254972458
Validation loss = 0.024593161419034004
Validation loss = 0.01711527816951275
Validation loss = 0.020004477351903915
Validation loss = 0.02093014121055603
Validation loss = 0.014761349186301231
Validation loss = 0.01755533367395401
Validation loss = 0.015861330553889275
Validation loss = 0.012995018623769283
Validation loss = 0.014805938117206097
Validation loss = 0.014881362207233906
Validation loss = 0.015034975484013557
Validation loss = 0.014724381268024445
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11811128258705139
Validation loss = 0.041393399238586426
Validation loss = 0.03191863372921944
Validation loss = 0.03252997249364853
Validation loss = 0.0289858877658844
Validation loss = 0.026181986555457115
Validation loss = 0.022768352180719376
Validation loss = 0.019687557592988014
Validation loss = 0.02476929873228073
Validation loss = 0.017719006165862083
Validation loss = 0.027029745280742645
Validation loss = 0.02117720991373062
Validation loss = 0.015957744792103767
Validation loss = 0.016879845410585403
Validation loss = 0.01632758043706417
Validation loss = 0.01609853468835354
Validation loss = 0.01493220217525959
Validation loss = 0.017738111317157745
Validation loss = 0.017547549679875374
Validation loss = 0.016282973811030388
Validation loss = 0.016355052590370178
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.18965542316436768
Validation loss = 0.05740082263946533
Validation loss = 0.044766075909137726
Validation loss = 0.03863127902150154
Validation loss = 0.03344389423727989
Validation loss = 0.030646147206425667
Validation loss = 0.02391975373029709
Validation loss = 0.026202987879514694
Validation loss = 0.023953232914209366
Validation loss = 0.027762042358517647
Validation loss = 0.020328160375356674
Validation loss = 0.02793949469923973
Validation loss = 0.019410323351621628
Validation loss = 0.03008471243083477
Validation loss = 0.02044186182320118
Validation loss = 0.021565193310379982
Validation loss = 0.018673863261938095
Validation loss = 0.017799535766243935
Validation loss = 0.022455336526036263
Validation loss = 0.017749447375535965
Validation loss = 0.027031494304537773
Validation loss = 0.01723393239080906
Validation loss = 0.022992577403783798
Validation loss = 0.020315125584602356
Validation loss = 0.018543679267168045
Validation loss = 0.0202026404440403
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1386488527059555
Validation loss = 0.04140911623835564
Validation loss = 0.03485913947224617
Validation loss = 0.031373217701911926
Validation loss = 0.03152726963162422
Validation loss = 0.03340648487210274
Validation loss = 0.03501279652118683
Validation loss = 0.02051137387752533
Validation loss = 0.026987381279468536
Validation loss = 0.02630751021206379
Validation loss = 0.03385978192090988
Validation loss = 0.025519898161292076
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -45.4    |
| Iteration     | 4        |
| MaximumReturn | -29.3    |
| MinimumReturn | -57.5    |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05316052585840225
Validation loss = 0.024705270305275917
Validation loss = 0.016854478046298027
Validation loss = 0.021977150812745094
Validation loss = 0.0204123854637146
Validation loss = 0.014522373676300049
Validation loss = 0.013912556692957878
Validation loss = 0.01962645910680294
Validation loss = 0.016029853373765945
Validation loss = 0.016824182122945786
Validation loss = 0.013361791148781776
Validation loss = 0.012161526829004288
Validation loss = 0.013461872935295105
Validation loss = 0.011724742129445076
Validation loss = 0.015975844115018845
Validation loss = 0.012777472846210003
Validation loss = 0.014460092410445213
Validation loss = 0.010891435667872429
Validation loss = 0.014737844467163086
Validation loss = 0.013843024149537086
Validation loss = 0.010218496434390545
Validation loss = 0.01849217154085636
Validation loss = 0.012906895950436592
Validation loss = 0.013208972290158272
Validation loss = 0.00959248561412096
Validation loss = 0.008960230275988579
Validation loss = 0.00909606646746397
Validation loss = 0.00836579967290163
Validation loss = 0.011991308070719242
Validation loss = 0.011142966337502003
Validation loss = 0.009214673191308975
Validation loss = 0.008545051328837872
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05648229643702507
Validation loss = 0.013709450140595436
Validation loss = 0.020336143672466278
Validation loss = 0.013465824536979198
Validation loss = 0.013775353319942951
Validation loss = 0.012947860173881054
Validation loss = 0.011591215617954731
Validation loss = 0.013975691981613636
Validation loss = 0.01055097859352827
Validation loss = 0.013597902841866016
Validation loss = 0.012720073573291302
Validation loss = 0.009856360033154488
Validation loss = 0.01190158911049366
Validation loss = 0.012110142037272453
Validation loss = 0.009418227709829807
Validation loss = 0.014141673222184181
Validation loss = 0.01099022664129734
Validation loss = 0.010971296578645706
Validation loss = 0.010633023455739021
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.052308715879917145
Validation loss = 0.02319917641580105
Validation loss = 0.01570037379860878
Validation loss = 0.012300023809075356
Validation loss = 0.010962193831801414
Validation loss = 0.009256074205040932
Validation loss = 0.00964678917080164
Validation loss = 0.010662770830094814
Validation loss = 0.01075677014887333
Validation loss = 0.011919456534087658
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.052394844591617584
Validation loss = 0.017699938267469406
Validation loss = 0.014128496870398521
Validation loss = 0.011665435507893562
Validation loss = 0.013351705856621265
Validation loss = 0.012651845812797546
Validation loss = 0.011266720481216908
Validation loss = 0.010494032874703407
Validation loss = 0.012050561606884003
Validation loss = 0.01176445372402668
Validation loss = 0.009942470118403435
Validation loss = 0.00975660141557455
Validation loss = 0.010860739275813103
Validation loss = 0.013243727385997772
Validation loss = 0.010331884026527405
Validation loss = 0.013731499202549458
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06604165583848953
Validation loss = 0.02291012555360794
Validation loss = 0.01979391649365425
Validation loss = 0.01425243727862835
Validation loss = 0.015502017922699451
Validation loss = 0.014750835485756397
Validation loss = 0.012553131207823753
Validation loss = 0.016762670129537582
Validation loss = 0.011870302259922028
Validation loss = 0.012657351791858673
Validation loss = 0.013255812227725983
Validation loss = 0.013660860247910023
Validation loss = 0.011658388189971447
Validation loss = 0.012699735350906849
Validation loss = 0.01130260992795229
Validation loss = 0.009871652349829674
Validation loss = 0.008830046281218529
Validation loss = 0.010664092376828194
Validation loss = 0.007684953510761261
Validation loss = 0.014189991168677807
Validation loss = 0.009946564212441444
Validation loss = 0.011899974197149277
Validation loss = 0.00853061955422163
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0662  |
| Iteration     | 5        |
| MaximumReturn | -0.0292  |
| MinimumReturn | -0.344   |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03993447870016098
Validation loss = 0.013426503166556358
Validation loss = 0.014997425489127636
Validation loss = 0.009595034644007683
Validation loss = 0.010552367195487022
Validation loss = 0.009024136699736118
Validation loss = 0.008875120431184769
Validation loss = 0.008479101583361626
Validation loss = 0.007442844100296497
Validation loss = 0.007555314805358648
Validation loss = 0.011323281563818455
Validation loss = 0.010176600888371468
Validation loss = 0.007407038006931543
Validation loss = 0.013615247793495655
Validation loss = 0.012525794096291065
Validation loss = 0.008833217434585094
Validation loss = 0.015673568472266197
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03808837756514549
Validation loss = 0.012013951316475868
Validation loss = 0.008440769277513027
Validation loss = 0.008325043134391308
Validation loss = 0.008130425587296486
Validation loss = 0.009489034302532673
Validation loss = 0.010340459644794464
Validation loss = 0.007741256151348352
Validation loss = 0.008650545962154865
Validation loss = 0.008155814372003078
Validation loss = 0.008363304659724236
Validation loss = 0.010424504056572914
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04526166990399361
Validation loss = 0.012296001426875591
Validation loss = 0.012043386697769165
Validation loss = 0.007827738299965858
Validation loss = 0.0073224217630922794
Validation loss = 0.009212765842676163
Validation loss = 0.02080782689154148
Validation loss = 0.007918769493699074
Validation loss = 0.01062091626226902
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.047840289771556854
Validation loss = 0.01067165657877922
Validation loss = 0.009060155600309372
Validation loss = 0.008832046762108803
Validation loss = 0.011574653908610344
Validation loss = 0.011458359658718109
Validation loss = 0.010946078225970268
Validation loss = 0.009800274856388569
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024452844634652138
Validation loss = 0.01326768659055233
Validation loss = 0.008488869294524193
Validation loss = 0.009241003543138504
Validation loss = 0.011624094098806381
Validation loss = 0.010718153789639473
Validation loss = 0.00820214580744505
Validation loss = 0.0077855708077549934
Validation loss = 0.009259245358407497
Validation loss = 0.006880541332066059
Validation loss = 0.007600455079227686
Validation loss = 0.00643769558519125
Validation loss = 0.008327298797667027
Validation loss = 0.007849802263081074
Validation loss = 0.01019429974257946
Validation loss = 0.006868706550449133
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0516  |
| Iteration     | 6        |
| MaximumReturn | -0.0292  |
| MinimumReturn | -0.141   |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01830924116075039
Validation loss = 0.0073200189508497715
Validation loss = 0.005448583979159594
Validation loss = 0.009046302177011967
Validation loss = 0.006460167933255434
Validation loss = 0.0066529568284749985
Validation loss = 0.006394417956471443
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01397909689694643
Validation loss = 0.006873355712741613
Validation loss = 0.005801532417535782
Validation loss = 0.007828107103705406
Validation loss = 0.006782765034586191
Validation loss = 0.007219785358756781
Validation loss = 0.010612576268613338
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015857329592108727
Validation loss = 0.008822658099234104
Validation loss = 0.006214378401637077
Validation loss = 0.007169784512370825
Validation loss = 0.007457787171006203
Validation loss = 0.00690206466242671
Validation loss = 0.0075660995207726955
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0243864506483078
Validation loss = 0.009343153797090054
Validation loss = 0.006960459519177675
Validation loss = 0.00685952277854085
Validation loss = 0.007775719743221998
Validation loss = 0.007247932255268097
Validation loss = 0.006402648985385895
Validation loss = 0.006186546292155981
Validation loss = 0.006938605103641748
Validation loss = 0.00890930276364088
Validation loss = 0.010685020126402378
Validation loss = 0.007071055006235838
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018615560606122017
Validation loss = 0.007185372058302164
Validation loss = 0.007475914433598518
Validation loss = 0.005960680078715086
Validation loss = 0.007285944651812315
Validation loss = 0.006379058584570885
Validation loss = 0.006633320357650518
Validation loss = 0.011050347238779068
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0329  |
| Iteration     | 7        |
| MaximumReturn | -0.0247  |
| MinimumReturn | -0.0449  |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01190691627562046
Validation loss = 0.00712159788236022
Validation loss = 0.009207604452967644
Validation loss = 0.00537552684545517
Validation loss = 0.009188473224639893
Validation loss = 0.006182000041007996
Validation loss = 0.006803068798035383
Validation loss = 0.006484252400696278
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013527494855225086
Validation loss = 0.008716044947504997
Validation loss = 0.013786672614514828
Validation loss = 0.015264769084751606
Validation loss = 0.007309581618756056
Validation loss = 0.006791268941015005
Validation loss = 0.005544612649828196
Validation loss = 0.0068465773947536945
Validation loss = 0.010842067189514637
Validation loss = 0.005953171756118536
Validation loss = 0.004784044343978167
Validation loss = 0.007957628928124905
Validation loss = 0.008369402959942818
Validation loss = 0.008022471331059933
Validation loss = 0.011950020678341389
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01577592082321644
Validation loss = 0.007664327044039965
Validation loss = 0.00611790269613266
Validation loss = 0.006991744972765446
Validation loss = 0.006765623576939106
Validation loss = 0.007471115328371525
Validation loss = 0.013490142300724983
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01955079473555088
Validation loss = 0.010180121287703514
Validation loss = 0.009220318868756294
Validation loss = 0.00644913362339139
Validation loss = 0.007116796914488077
Validation loss = 0.0066610341891646385
Validation loss = 0.005394549574702978
Validation loss = 0.006684844382107258
Validation loss = 0.017014456912875175
Validation loss = 0.009523128159344196
Validation loss = 0.010891244746744633
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01474505104124546
Validation loss = 0.008308691903948784
Validation loss = 0.008125587366521358
Validation loss = 0.006803818512707949
Validation loss = 0.005748044699430466
Validation loss = 0.0058701494708657265
Validation loss = 0.005350693594664335
Validation loss = 0.005899805575609207
Validation loss = 0.007105835247784853
Validation loss = 0.0053804474882781506
Validation loss = 0.005362392868846655
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0151  |
| Iteration     | 8        |
| MaximumReturn | -0.0113  |
| MinimumReturn | -0.0197  |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012029491364955902
Validation loss = 0.016427455469965935
Validation loss = 0.010269388556480408
Validation loss = 0.008072842843830585
Validation loss = 0.006832240615040064
Validation loss = 0.0058901747688651085
Validation loss = 0.008688694797456264
Validation loss = 0.0074579776264727116
Validation loss = 0.005458002910017967
Validation loss = 0.012294905260205269
Validation loss = 0.006844233721494675
Validation loss = 0.005509749986231327
Validation loss = 0.009302148595452309
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03040623851120472
Validation loss = 0.008252061903476715
Validation loss = 0.007588478736579418
Validation loss = 0.006558459252119064
Validation loss = 0.007081985007971525
Validation loss = 0.00510534830391407
Validation loss = 0.0059507619589567184
Validation loss = 0.005898468196392059
Validation loss = 0.005834759678691626
Validation loss = 0.007035426329821348
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013281285762786865
Validation loss = 0.011031901463866234
Validation loss = 0.006969366688281298
Validation loss = 0.009504400193691254
Validation loss = 0.013474204577505589
Validation loss = 0.007180626038461924
Validation loss = 0.0062108878046274185
Validation loss = 0.009795108810067177
Validation loss = 0.008615691214799881
Validation loss = 0.009738210588693619
Validation loss = 0.011887626722455025
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014399019069969654
Validation loss = 0.007292288821190596
Validation loss = 0.008286209776997566
Validation loss = 0.010832207277417183
Validation loss = 0.008967160247266293
Validation loss = 0.007177470717579126
Validation loss = 0.0047689094208180904
Validation loss = 0.005836294963955879
Validation loss = 0.00920109637081623
Validation loss = 0.008641000837087631
Validation loss = 0.006945435889065266
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00952359288930893
Validation loss = 0.007038730662316084
Validation loss = 0.006526768207550049
Validation loss = 0.006395682692527771
Validation loss = 0.006528213620185852
Validation loss = 0.013376174494624138
Validation loss = 0.006484326906502247
Validation loss = 0.007344105280935764
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.033   |
| Iteration     | 9        |
| MaximumReturn | -0.0242  |
| MinimumReturn | -0.0454  |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0075607565231621265
Validation loss = 0.004907290451228619
Validation loss = 0.011366400867700577
Validation loss = 0.0053568072617053986
Validation loss = 0.0064527736976742744
Validation loss = 0.006067460868507624
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009665005840361118
Validation loss = 0.0074484702199697495
Validation loss = 0.00563954608514905
Validation loss = 0.006139670964330435
Validation loss = 0.006875542923808098
Validation loss = 0.007859021425247192
Validation loss = 0.005881721619516611
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008013726212084293
Validation loss = 0.005558872129768133
Validation loss = 0.0055203246884047985
Validation loss = 0.01047607883810997
Validation loss = 0.010149265639483929
Validation loss = 0.008676882833242416
Validation loss = 0.005958696361631155
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010879681445658207
Validation loss = 0.006121261510998011
Validation loss = 0.008135272189974785
Validation loss = 0.010136506520211697
Validation loss = 0.009040684439241886
Validation loss = 0.0059174126945436
Validation loss = 0.014698096551001072
Validation loss = 0.004849330056458712
Validation loss = 0.004934247117489576
Validation loss = 0.005840042605996132
Validation loss = 0.006191185209900141
Validation loss = 0.006572787184268236
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010906185954809189
Validation loss = 0.006045771762728691
Validation loss = 0.013634900562465191
Validation loss = 0.007477388717234135
Validation loss = 0.010293848812580109
Validation loss = 0.006419245153665543
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.5    |
| Iteration     | 10       |
| MaximumReturn | -0.0737  |
| MinimumReturn | -35.7    |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011822985485196114
Validation loss = 0.01570805534720421
Validation loss = 0.008074616082012653
Validation loss = 0.0068554626777768135
Validation loss = 0.005718459375202656
Validation loss = 0.012040252797305584
Validation loss = 0.008601645939052105
Validation loss = 0.0044999513775110245
Validation loss = 0.006206968333572149
Validation loss = 0.005659104790538549
Validation loss = 0.004164048004895449
Validation loss = 0.005069386679679155
Validation loss = 0.005192217417061329
Validation loss = 0.005060217343270779
Validation loss = 0.00816158764064312
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010980905964970589
Validation loss = 0.007267667446285486
Validation loss = 0.011209632270038128
Validation loss = 0.009437576867640018
Validation loss = 0.0067615569569170475
Validation loss = 0.006020680069923401
Validation loss = 0.005418696440756321
Validation loss = 0.007268084678798914
Validation loss = 0.004222015384584665
Validation loss = 0.008322769775986671
Validation loss = 0.004646802321076393
Validation loss = 0.005940491333603859
Validation loss = 0.005469792056828737
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014678570441901684
Validation loss = 0.005761099047958851
Validation loss = 0.007546287961304188
Validation loss = 0.008196813985705376
Validation loss = 0.010327734984457493
Validation loss = 0.006029644515365362
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010829594917595387
Validation loss = 0.006206788122653961
Validation loss = 0.005125357769429684
Validation loss = 0.006548245903104544
Validation loss = 0.00682535907253623
Validation loss = 0.006995783653110266
Validation loss = 0.006170589942485094
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014233460649847984
Validation loss = 0.0048975395038723946
Validation loss = 0.00650609889999032
Validation loss = 0.007120081689208746
Validation loss = 0.005579366814345121
Validation loss = 0.00739726796746254
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0718  |
| Iteration     | 11       |
| MaximumReturn | -0.037   |
| MinimumReturn | -0.124   |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0064120180904865265
Validation loss = 0.007285867817699909
Validation loss = 0.005329768173396587
Validation loss = 0.007927298545837402
Validation loss = 0.005164771340787411
Validation loss = 0.004449930973351002
Validation loss = 0.010793345049023628
Validation loss = 0.005735006649047136
Validation loss = 0.0089722890406847
Validation loss = 0.004751271568238735
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02035827748477459
Validation loss = 0.00668904697522521
Validation loss = 0.008422994054853916
Validation loss = 0.004229710437357426
Validation loss = 0.004333001561462879
Validation loss = 0.004659860860556364
Validation loss = 0.006880498491227627
Validation loss = 0.009961964562535286
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0126038808375597
Validation loss = 0.008795787580311298
Validation loss = 0.0060806116089224815
Validation loss = 0.005176804028451443
Validation loss = 0.006469567772001028
Validation loss = 0.005276319570839405
Validation loss = 0.005076760426163673
Validation loss = 0.0059358058497309685
Validation loss = 0.005886566825211048
Validation loss = 0.006293705198913813
Validation loss = 0.004569434095174074
Validation loss = 0.005756546277552843
Validation loss = 0.011070871725678444
Validation loss = 0.007062231656163931
Validation loss = 0.006934531033039093
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012895499356091022
Validation loss = 0.008640424348413944
Validation loss = 0.004868372343480587
Validation loss = 0.006850662641227245
Validation loss = 0.007902994751930237
Validation loss = 0.006563582923263311
Validation loss = 0.00425931578502059
Validation loss = 0.005114196799695492
Validation loss = 0.005531530827283859
Validation loss = 0.006869778037071228
Validation loss = 0.00543248001486063
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008637203834950924
Validation loss = 0.00907721184194088
Validation loss = 0.009046178311109543
Validation loss = 0.00786803662776947
Validation loss = 0.0041292579844594
Validation loss = 0.008379151113331318
Validation loss = 0.004760367330163717
Validation loss = 0.005272495560348034
Validation loss = 0.005726681090891361
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -100     |
| Iteration     | 12       |
| MaximumReturn | -81.4    |
| MinimumReturn | -109     |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012953492812812328
Validation loss = 0.004738889634609222
Validation loss = 0.0033256136812269688
Validation loss = 0.005750749725848436
Validation loss = 0.00464387284591794
Validation loss = 0.006834466475993395
Validation loss = 0.0033767528366297483
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013590590097010136
Validation loss = 0.006105585023760796
Validation loss = 0.0037314894143491983
Validation loss = 0.004200024530291557
Validation loss = 0.003657266264781356
Validation loss = 0.004244634415954351
Validation loss = 0.006540198344737291
Validation loss = 0.004706906154751778
Validation loss = 0.0067847431637346745
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02109339088201523
Validation loss = 0.0060768770053982735
Validation loss = 0.004162842407822609
Validation loss = 0.005949981976300478
Validation loss = 0.004784639459103346
Validation loss = 0.004938183352351189
Validation loss = 0.009690143167972565
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02074185013771057
Validation loss = 0.006880227942019701
Validation loss = 0.0037397725973278284
Validation loss = 0.003945514559745789
Validation loss = 0.004479534458369017
Validation loss = 0.004163526929914951
Validation loss = 0.0035744979977607727
Validation loss = 0.0047300392761826515
Validation loss = 0.003915673121809959
Validation loss = 0.00602097949013114
Validation loss = 0.004186397418379784
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014873393811285496
Validation loss = 0.005062641575932503
Validation loss = 0.006971588358283043
Validation loss = 0.0036208683159202337
Validation loss = 0.004760788753628731
Validation loss = 0.0036087073385715485
Validation loss = 0.003951937425881624
Validation loss = 0.004745647311210632
Validation loss = 0.0037052640691399574
Validation loss = 0.009103839285671711
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -64.7    |
| Iteration     | 13       |
| MaximumReturn | -44.1    |
| MinimumReturn | -85.9    |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006254528183490038
Validation loss = 0.004700314719229937
Validation loss = 0.0034465419594198465
Validation loss = 0.004018811974674463
Validation loss = 0.006916425656527281
Validation loss = 0.003310221480205655
Validation loss = 0.004337092861533165
Validation loss = 0.0037025318015366793
Validation loss = 0.003555161179974675
Validation loss = 0.0032613463699817657
Validation loss = 0.006927784066647291
Validation loss = 0.00436040572822094
Validation loss = 0.0038481128867715597
Validation loss = 0.003542880527675152
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012548909522593021
Validation loss = 0.005560426507145166
Validation loss = 0.005937348585575819
Validation loss = 0.0034480940084904432
Validation loss = 0.0035388681571930647
Validation loss = 0.0047682481817901134
Validation loss = 0.004463102202862501
Validation loss = 0.0036979420110583305
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01526492740958929
Validation loss = 0.005040242802351713
Validation loss = 0.004068697802722454
Validation loss = 0.005297364201396704
Validation loss = 0.0038658659905195236
Validation loss = 0.004110123496502638
Validation loss = 0.005133334081619978
Validation loss = 0.004403164144605398
Validation loss = 0.005054328124970198
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011707422323524952
Validation loss = 0.0049097128212451935
Validation loss = 0.0035558214876800776
Validation loss = 0.009389818646013737
Validation loss = 0.0038241397123783827
Validation loss = 0.004642844200134277
Validation loss = 0.003322441363707185
Validation loss = 0.004645667504519224
Validation loss = 0.003117985324934125
Validation loss = 0.0067073204554617405
Validation loss = 0.0035836268216371536
Validation loss = 0.003837011521682143
Validation loss = 0.003283621044829488
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008469394408166409
Validation loss = 0.0037801226135343313
Validation loss = 0.0034702932462096214
Validation loss = 0.005287452135235071
Validation loss = 0.005492817144840956
Validation loss = 0.003427331568673253
Validation loss = 0.003482260974124074
Validation loss = 0.003667206736281514
Validation loss = 0.0031220782548189163
Validation loss = 0.0037154133897274733
Validation loss = 0.003808090230450034
Validation loss = 0.0039200312457978725
Validation loss = 0.00561286648735404
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.63    |
| Iteration     | 14       |
| MaximumReturn | -0.0447  |
| MinimumReturn | -13.3    |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008029495365917683
Validation loss = 0.00601551216095686
Validation loss = 0.004618512466549873
Validation loss = 0.005604124628007412
Validation loss = 0.00303158862516284
Validation loss = 0.00407498748973012
Validation loss = 0.0032132130581885576
Validation loss = 0.003618184244260192
Validation loss = 0.0033263815566897392
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004335850477218628
Validation loss = 0.004161791410297155
Validation loss = 0.004520502872765064
Validation loss = 0.004030974116176367
Validation loss = 0.004266583826392889
Validation loss = 0.005059498827904463
Validation loss = 0.0029347985982894897
Validation loss = 0.003851896384730935
Validation loss = 0.005802375264465809
Validation loss = 0.003328816732391715
Validation loss = 0.005498551297932863
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009231476113200188
Validation loss = 0.010215365327894688
Validation loss = 0.004846860654652119
Validation loss = 0.00666018296033144
Validation loss = 0.005020636599510908
Validation loss = 0.004374735988676548
Validation loss = 0.005167476367205381
Validation loss = 0.004007479175925255
Validation loss = 0.003435090184211731
Validation loss = 0.0033431926276534796
Validation loss = 0.0043241288512945175
Validation loss = 0.004037537146359682
Validation loss = 0.005275224335491657
Validation loss = 0.004264787305146456
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007143684662878513
Validation loss = 0.003914153203368187
Validation loss = 0.0034692236222326756
Validation loss = 0.0038436842150986195
Validation loss = 0.00498855859041214
Validation loss = 0.006146193481981754
Validation loss = 0.0037301566917449236
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0144041758030653
Validation loss = 0.005711508449167013
Validation loss = 0.0037305490113794804
Validation loss = 0.0037381541915237904
Validation loss = 0.004064845386892557
Validation loss = 0.004218247253447771
Validation loss = 0.0037815358955413103
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00229 |
| Iteration     | 15       |
| MaximumReturn | -0.00175 |
| MinimumReturn | -0.00286 |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0056392415426671505
Validation loss = 0.005509753245860338
Validation loss = 0.003953748382627964
Validation loss = 0.0044610039331018925
Validation loss = 0.006605975329875946
Validation loss = 0.002935076365247369
Validation loss = 0.006252720020711422
Validation loss = 0.005617150105535984
Validation loss = 0.003905393648892641
Validation loss = 0.006337822414934635
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008569324389100075
Validation loss = 0.0031821958255022764
Validation loss = 0.005081640090793371
Validation loss = 0.004648812115192413
Validation loss = 0.0034435291308909655
Validation loss = 0.009127886034548283
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005931212101131678
Validation loss = 0.004186608828604221
Validation loss = 0.004042529501020908
Validation loss = 0.004339300561696291
Validation loss = 0.00473759463056922
Validation loss = 0.004771762993186712
Validation loss = 0.007352322340011597
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0053711230866611
Validation loss = 0.003948681056499481
Validation loss = 0.004102539271116257
Validation loss = 0.005582371260970831
Validation loss = 0.0029245305340737104
Validation loss = 0.0042388723231852055
Validation loss = 0.004930160474032164
Validation loss = 0.004665655549615622
Validation loss = 0.003807817352935672
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006866644136607647
Validation loss = 0.0050821867771446705
Validation loss = 0.004000449553132057
Validation loss = 0.004730186425149441
Validation loss = 0.0066825211979448795
Validation loss = 0.003674070816487074
Validation loss = 0.0033677420578897
Validation loss = 0.004844839684665203
Validation loss = 0.0031872657127678394
Validation loss = 0.00804698746651411
Validation loss = 0.003648706478998065
Validation loss = 0.0032132200431078672
Validation loss = 0.005603193771094084
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00197  |
| Iteration     | 16        |
| MaximumReturn | -0.000729 |
| MinimumReturn | -0.0215   |
| TotalSamples  | 29988     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006944891531020403
Validation loss = 0.004272495396435261
Validation loss = 0.004312637262046337
Validation loss = 0.003681421745568514
Validation loss = 0.00397331640124321
Validation loss = 0.0037723896093666553
Validation loss = 0.004134788643568754
Validation loss = 0.0055985720828175545
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005642719101160765
Validation loss = 0.0041343276388943195
Validation loss = 0.005440588109195232
Validation loss = 0.0057012783363461494
Validation loss = 0.003656201995909214
Validation loss = 0.004393463488668203
Validation loss = 0.012524119578301907
Validation loss = 0.0050589595921337605
Validation loss = 0.004557922016829252
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004490254446864128
Validation loss = 0.0035318350419402122
Validation loss = 0.0029767260421067476
Validation loss = 0.0031975258607417345
Validation loss = 0.010689947754144669
Validation loss = 0.0042972746305167675
Validation loss = 0.004566533956676722
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007729426957666874
Validation loss = 0.005486724432557821
Validation loss = 0.004447567742317915
Validation loss = 0.0054027009755373
Validation loss = 0.003742499742656946
Validation loss = 0.0034717582166194916
Validation loss = 0.0038118362426757812
Validation loss = 0.004729146137833595
Validation loss = 0.003143852110952139
Validation loss = 0.0038547362200915813
Validation loss = 0.004585206974297762
Validation loss = 0.0035722574684768915
Validation loss = 0.005275169853121042
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009568113833665848
Validation loss = 0.0030361523386090994
Validation loss = 0.0035070579033344984
Validation loss = 0.005775092635303736
Validation loss = 0.004089061636477709
Validation loss = 0.0034762730356305838
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0232  |
| Iteration     | 17       |
| MaximumReturn | -0.00359 |
| MinimumReturn | -0.0624  |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009637748822569847
Validation loss = 0.005635435692965984
Validation loss = 0.0027952950913459063
Validation loss = 0.002981652272865176
Validation loss = 0.003242307109758258
Validation loss = 0.0036996991839259863
Validation loss = 0.0027737529017031193
Validation loss = 0.002855906030163169
Validation loss = 0.004013545345515013
Validation loss = 0.004188180435448885
Validation loss = 0.0033272465225309134
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003759833751246333
Validation loss = 0.002785664051771164
Validation loss = 0.004139848984777927
Validation loss = 0.003113767597824335
Validation loss = 0.003956727217882872
Validation loss = 0.0037633611354976892
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005609628278762102
Validation loss = 0.0025896013248711824
Validation loss = 0.002920391270890832
Validation loss = 0.003131744684651494
Validation loss = 0.0024994045961648226
Validation loss = 0.003988912794739008
Validation loss = 0.003172490978613496
Validation loss = 0.003889404470100999
Validation loss = 0.0048586418852210045
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003996805287897587
Validation loss = 0.002517211949452758
Validation loss = 0.004482823424041271
Validation loss = 0.0038780169561505318
Validation loss = 0.005182013846933842
Validation loss = 0.00458493223413825
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006023027468472719
Validation loss = 0.0040416838601231575
Validation loss = 0.003342034062370658
Validation loss = 0.0032463399693369865
Validation loss = 0.004725067876279354
Validation loss = 0.003899762174114585
Validation loss = 0.002839028835296631
Validation loss = 0.0034661174286156893
Validation loss = 0.002982053440064192
Validation loss = 0.003872393863275647
Validation loss = 0.004116421565413475
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -34.1    |
| Iteration     | 18       |
| MaximumReturn | -0.108   |
| MinimumReturn | -68.9    |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0062652695924043655
Validation loss = 0.0027395151555538177
Validation loss = 0.0026501077227294445
Validation loss = 0.002557381521910429
Validation loss = 0.0025722624268382788
Validation loss = 0.0032525784336030483
Validation loss = 0.003508907277137041
Validation loss = 0.0030252484139055014
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0047430796548724174
Validation loss = 0.0023667123168706894
Validation loss = 0.0026631210930645466
Validation loss = 0.003780280239880085
Validation loss = 0.0031708432361483574
Validation loss = 0.0023562253918498755
Validation loss = 0.0054108379408717155
Validation loss = 0.0024672073777765036
Validation loss = 0.0034469838719815016
Validation loss = 0.0025118435733020306
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015478486195206642
Validation loss = 0.0033399402163922787
Validation loss = 0.0035138269886374474
Validation loss = 0.002771558240056038
Validation loss = 0.0038859331980347633
Validation loss = 0.0031710707116872072
Validation loss = 0.004694728646427393
Validation loss = 0.003257579868659377
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005969919729977846
Validation loss = 0.0027307779528200626
Validation loss = 0.002229384146630764
Validation loss = 0.0027392120100557804
Validation loss = 0.0029509221203625202
Validation loss = 0.0038459496572613716
Validation loss = 0.0027391088660806417
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005606143735349178
Validation loss = 0.004494840279221535
Validation loss = 0.004927522502839565
Validation loss = 0.0034215631894767284
Validation loss = 0.002546407748013735
Validation loss = 0.002353579504415393
Validation loss = 0.0025159353390336037
Validation loss = 0.0025719653349369764
Validation loss = 0.003284633858129382
Validation loss = 0.002615583362057805
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.72    |
| Iteration     | 19       |
| MaximumReturn | -0.0331  |
| MinimumReturn | -30.1    |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002912355586886406
Validation loss = 0.007925380021333694
Validation loss = 0.0027654387522488832
Validation loss = 0.002685925457626581
Validation loss = 0.00303807039745152
Validation loss = 0.002890867181122303
Validation loss = 0.004605989903211594
Validation loss = 0.002365743275731802
Validation loss = 0.0028069501277059317
Validation loss = 0.005077695939689875
Validation loss = 0.005800810642540455
Validation loss = 0.002543749287724495
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004898194223642349
Validation loss = 0.0033105670008808374
Validation loss = 0.0041077821515500546
Validation loss = 0.004146701656281948
Validation loss = 0.0026951339095830917
Validation loss = 0.0027347090654075146
Validation loss = 0.002428928855806589
Validation loss = 0.0030074038077145815
Validation loss = 0.0023663993924856186
Validation loss = 0.0024143834598362446
Validation loss = 0.00295357801951468
Validation loss = 0.0024965840857475996
Validation loss = 0.0030921832658350468
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007754303980618715
Validation loss = 0.0031468106899410486
Validation loss = 0.0031123547814786434
Validation loss = 0.002862414112314582
Validation loss = 0.002534269355237484
Validation loss = 0.0023855757899582386
Validation loss = 0.003949436824768782
Validation loss = 0.0035495376214385033
Validation loss = 0.004026534501463175
Validation loss = 0.002427254803478718
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0031932371202856302
Validation loss = 0.0029085264541208744
Validation loss = 0.0026442264206707478
Validation loss = 0.0024986013304442167
Validation loss = 0.0039000248070806265
Validation loss = 0.003771538147702813
Validation loss = 0.004395540803670883
Validation loss = 0.002863379893824458
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0027631698176264763
Validation loss = 0.0023902608081698418
Validation loss = 0.0038646406028419733
Validation loss = 0.0026105830911546946
Validation loss = 0.003694219281896949
Validation loss = 0.00268829264678061
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.27    |
| Iteration     | 20       |
| MaximumReturn | -0.0335  |
| MinimumReturn | -67      |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002581328386440873
Validation loss = 0.0025751080829650164
Validation loss = 0.004587726667523384
Validation loss = 0.0033794944174587727
Validation loss = 0.0024424688890576363
Validation loss = 0.003156704595312476
Validation loss = 0.0034122534561902285
Validation loss = 0.0027807250153273344
Validation loss = 0.0030238174367696047
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003244734136387706
Validation loss = 0.002712755464017391
Validation loss = 0.0030109258368611336
Validation loss = 0.002235833089798689
Validation loss = 0.004252834245562553
Validation loss = 0.0023990152403712273
Validation loss = 0.002834140323102474
Validation loss = 0.002947681350633502
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003840077668428421
Validation loss = 0.0034987872932106256
Validation loss = 0.003385021351277828
Validation loss = 0.0033536190167069435
Validation loss = 0.006997481919825077
Validation loss = 0.003512370865792036
Validation loss = 0.002300517400726676
Validation loss = 0.002576562575995922
Validation loss = 0.0036927228793501854
Validation loss = 0.002534797415137291
Validation loss = 0.0067322878167033195
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0033303878735750914
Validation loss = 0.003330620238557458
Validation loss = 0.0023796483874320984
Validation loss = 0.002570268465206027
Validation loss = 0.002446884987875819
Validation loss = 0.0040648700669407845
Validation loss = 0.0034344810992479324
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003905281890183687
Validation loss = 0.0021109762601554394
Validation loss = 0.002512771636247635
Validation loss = 0.002261199988424778
Validation loss = 0.0027870710473507643
Validation loss = 0.0020217096898704767
Validation loss = 0.0034272349439561367
Validation loss = 0.00205271621234715
Validation loss = 0.0034470632672309875
Validation loss = 0.002238547895103693
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -21.6    |
| Iteration     | 21       |
| MaximumReturn | -0.134   |
| MinimumReturn | -72.4    |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0033124347683042288
Validation loss = 0.0023458502255380154
Validation loss = 0.0022420119494199753
Validation loss = 0.0023200202267616987
Validation loss = 0.0031090232077986
Validation loss = 0.002206298755481839
Validation loss = 0.003036394016817212
Validation loss = 0.0025563423987478018
Validation loss = 0.002904243068769574
Validation loss = 0.002450331347063184
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003022031858563423
Validation loss = 0.0023602936416864395
Validation loss = 0.003479570848867297
Validation loss = 0.002147706225514412
Validation loss = 0.003609923180192709
Validation loss = 0.0038813003338873386
Validation loss = 0.0026874677278101444
Validation loss = 0.003537769429385662
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004326595924794674
Validation loss = 0.003983897622674704
Validation loss = 0.0023745775688439608
Validation loss = 0.0028105711098760366
Validation loss = 0.003460589563474059
Validation loss = 0.0019491956336423755
Validation loss = 0.003596663009375334
Validation loss = 0.0021459453273564577
Validation loss = 0.007332492619752884
Validation loss = 0.0043984390795230865
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002990351291373372
Validation loss = 0.0023413822054862976
Validation loss = 0.00265154498629272
Validation loss = 0.002480501774698496
Validation loss = 0.0022397751454263926
Validation loss = 0.004892136435955763
Validation loss = 0.0032119345851242542
Validation loss = 0.002279478358104825
Validation loss = 0.002691621193662286
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0038773079868406057
Validation loss = 0.003557790769264102
Validation loss = 0.00381516944617033
Validation loss = 0.0024663021322339773
Validation loss = 0.0026616151444613934
Validation loss = 0.0019965663086622953
Validation loss = 0.002531052567064762
Validation loss = 0.003289818298071623
Validation loss = 0.00325818732380867
Validation loss = 0.0025780699215829372
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -16.5    |
| Iteration     | 22       |
| MaximumReturn | -0.0771  |
| MinimumReturn | -60.3    |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0031440879683941603
Validation loss = 0.0027447366155683994
Validation loss = 0.0029394717421382666
Validation loss = 0.0032041952945291996
Validation loss = 0.0026435796171426773
Validation loss = 0.00200355751439929
Validation loss = 0.002658676356077194
Validation loss = 0.002284483751282096
Validation loss = 0.0021737306378781796
Validation loss = 0.0035459778737276793
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0026945939753204584
Validation loss = 0.0026300139725208282
Validation loss = 0.0018457341939210892
Validation loss = 0.0031639437656849623
Validation loss = 0.0024115501437336206
Validation loss = 0.0032613270450383425
Validation loss = 0.0031955502927303314
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0031632804311811924
Validation loss = 0.003166708629578352
Validation loss = 0.004076552577316761
Validation loss = 0.0028019824530929327
Validation loss = 0.0031375600956380367
Validation loss = 0.0027092737145721912
Validation loss = 0.0033011813648045063
Validation loss = 0.0020249197259545326
Validation loss = 0.002195333130657673
Validation loss = 0.0018751624738797545
Validation loss = 0.013307109475135803
Validation loss = 0.003354822052642703
Validation loss = 0.0018184359651058912
Validation loss = 0.0021582143381237984
Validation loss = 0.0022276744712144136
Validation loss = 0.0027925779577344656
Validation loss = 0.0025337268598377705
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002628863090649247
Validation loss = 0.0020958215463906527
Validation loss = 0.0026276777498424053
Validation loss = 0.002288804855197668
Validation loss = 0.003031013300642371
Validation loss = 0.001827672473154962
Validation loss = 0.003643640549853444
Validation loss = 0.004401812329888344
Validation loss = 0.0024079042486846447
Validation loss = 0.0033511179499328136
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0042067463509738445
Validation loss = 0.0025694037321954966
Validation loss = 0.0021152023691684008
Validation loss = 0.002044498221948743
Validation loss = 0.0021201069466769695
Validation loss = 0.004703727550804615
Validation loss = 0.0019959339406341314
Validation loss = 0.004264848306775093
Validation loss = 0.0018655704334378242
Validation loss = 0.0033956305123865604
Validation loss = 0.0020118572283536196
Validation loss = 0.00214603403583169
Validation loss = 0.0023847525008022785
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00653 |
| Iteration     | 23       |
| MaximumReturn | -0.00432 |
| MinimumReturn | -0.0183  |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002377380384132266
Validation loss = 0.0018836141098290682
Validation loss = 0.003565707476809621
Validation loss = 0.006003128830343485
Validation loss = 0.0025021997280418873
Validation loss = 0.00342579185962677
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003165376838296652
Validation loss = 0.003018661867827177
Validation loss = 0.002785841468721628
Validation loss = 0.0027014208026230335
Validation loss = 0.0022830856032669544
Validation loss = 0.003378016874194145
Validation loss = 0.0020901046227663755
Validation loss = 0.0028300643898546696
Validation loss = 0.002244671108201146
Validation loss = 0.002707625273615122
Validation loss = 0.002629829104989767
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0030523831956088543
Validation loss = 0.002767916303128004
Validation loss = 0.002490270882844925
Validation loss = 0.0027691286522895098
Validation loss = 0.003055906156077981
Validation loss = 0.002084175357595086
Validation loss = 0.0028264725115150213
Validation loss = 0.00828862190246582
Validation loss = 0.0019469361286610365
Validation loss = 0.002130612963810563
Validation loss = 0.0021924152970314026
Validation loss = 0.0031783278100192547
Validation loss = 0.00274636154063046
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005543106235563755
Validation loss = 0.002980007091537118
Validation loss = 0.0026463870890438557
Validation loss = 0.0029477016068995
Validation loss = 0.002268511336296797
Validation loss = 0.0024594992864876986
Validation loss = 0.0026148040778934956
Validation loss = 0.0030923732556402683
Validation loss = 0.0025206576101481915
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023107375018298626
Validation loss = 0.0027969428338110447
Validation loss = 0.0023181752767413855
Validation loss = 0.002600234467536211
Validation loss = 0.0020916168577969074
Validation loss = 0.002904030727222562
Validation loss = 0.004010417032986879
Validation loss = 0.002292582066729665
Validation loss = 0.0025041447952389717
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -52.4    |
| Iteration     | 24       |
| MaximumReturn | -0.0943  |
| MinimumReturn | -104     |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0022518308833241463
Validation loss = 0.0017579200211912394
Validation loss = 0.002307847375050187
Validation loss = 0.0021413143258541822
Validation loss = 0.0018531025853008032
Validation loss = 0.0029009650461375713
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005564550403505564
Validation loss = 0.00198403955437243
Validation loss = 0.0018437451217323542
Validation loss = 0.0021960181184113026
Validation loss = 0.00252211163751781
Validation loss = 0.0020410961005836725
Validation loss = 0.0023575597442686558
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021710891742259264
Validation loss = 0.002491910243406892
Validation loss = 0.0021087212953716516
Validation loss = 0.0026462418027222157
Validation loss = 0.0024987231008708477
Validation loss = 0.0027829082682728767
Validation loss = 0.001975859282538295
Validation loss = 0.002650212263688445
Validation loss = 0.0018706744303926826
Validation loss = 0.0029129108879715204
Validation loss = 0.00198711222037673
Validation loss = 0.001733332290314138
Validation loss = 0.002196658868342638
Validation loss = 0.003044780110940337
Validation loss = 0.002200172282755375
Validation loss = 0.001978529617190361
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0035928909201174974
Validation loss = 0.0024892885703593493
Validation loss = 0.0024696902837604284
Validation loss = 0.0018106895731762052
Validation loss = 0.002202124334871769
Validation loss = 0.002210373757407069
Validation loss = 0.0023337057791650295
Validation loss = 0.001918453024700284
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004199942108243704
Validation loss = 0.003629223210737109
Validation loss = 0.00206626090221107
Validation loss = 0.0019022879423573613
Validation loss = 0.0018924511969089508
Validation loss = 0.0026957381051033735
Validation loss = 0.0018907005432993174
Validation loss = 0.002110303845256567
Validation loss = 0.0023767033126205206
Validation loss = 0.0027455680537968874
Validation loss = 0.0018196338787674904
Validation loss = 0.0031074951402843
Validation loss = 0.0016114404425024986
Validation loss = 0.002208191668614745
Validation loss = 0.0028907035011798143
Validation loss = 0.0019386112689971924
Validation loss = 0.001842867350205779
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.38    |
| Iteration     | 25       |
| MaximumReturn | -0.00125 |
| MinimumReturn | -39.9    |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0034867634531110525
Validation loss = 0.0019336872501298785
Validation loss = 0.002796486020088196
Validation loss = 0.0023040694650262594
Validation loss = 0.002024464076384902
Validation loss = 0.0023155969101935625
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005832013674080372
Validation loss = 0.0019892423879355192
Validation loss = 0.00303971697576344
Validation loss = 0.0018847316969186068
Validation loss = 0.002560931723564863
Validation loss = 0.0017797481268644333
Validation loss = 0.0020103915594518185
Validation loss = 0.0019218415254727006
Validation loss = 0.0019754129461944103
Validation loss = 0.0018476230325177312
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0020010618027299643
Validation loss = 0.002478182315826416
Validation loss = 0.002856410341337323
Validation loss = 0.002076344331726432
Validation loss = 0.0021279556676745415
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0021354961208999157
Validation loss = 0.0017411973094567657
Validation loss = 0.001734371529892087
Validation loss = 0.001821807469241321
Validation loss = 0.0031582529190927744
Validation loss = 0.002383006038144231
Validation loss = 0.0022326128091663122
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003961289767175913
Validation loss = 0.0023437978234142065
Validation loss = 0.0020092104095965624
Validation loss = 0.001814918126910925
Validation loss = 0.0025885440409183502
Validation loss = 0.0024387838784605265
Validation loss = 0.0018260172801092267
Validation loss = 0.0035222924780100584
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.19     |
| Iteration     | 26        |
| MaximumReturn | -0.000872 |
| MinimumReturn | -20.3     |
| TotalSamples  | 46648     |
-----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0022600500378757715
Validation loss = 0.0022762410808354616
Validation loss = 0.0020238838624209166
Validation loss = 0.003132315119728446
Validation loss = 0.0029655720572918653
Validation loss = 0.002471789252012968
Validation loss = 0.0027224032673984766
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00245493371039629
Validation loss = 0.0020913619082421064
Validation loss = 0.0028687117155641317
Validation loss = 0.0020998676773160696
Validation loss = 0.001862526754848659
Validation loss = 0.0015698496717959642
Validation loss = 0.002427618717774749
Validation loss = 0.0031657919753342867
Validation loss = 0.0017308687092736363
Validation loss = 0.0016315445536747575
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00439623324200511
Validation loss = 0.0017728511011227965
Validation loss = 0.002040875842794776
Validation loss = 0.0016790296649560332
Validation loss = 0.0020334862638264894
Validation loss = 0.002564670518040657
Validation loss = 0.002384832361713052
Validation loss = 0.0016203295672312379
Validation loss = 0.0017968183383345604
Validation loss = 0.0019417741568759084
Validation loss = 0.0022564346436411142
Validation loss = 0.0022055632434785366
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0029907359275966883
Validation loss = 0.0020818000193685293
Validation loss = 0.0018946139607578516
Validation loss = 0.002674748422577977
Validation loss = 0.002214513486251235
Validation loss = 0.0023824924137443304
Validation loss = 0.0017052182229235768
Validation loss = 0.0029052074532955885
Validation loss = 0.0019949134439229965
Validation loss = 0.0017188024939969182
Validation loss = 0.002117762342095375
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0022433155681937933
Validation loss = 0.002861459506675601
Validation loss = 0.002397694857791066
Validation loss = 0.0032090130262076855
Validation loss = 0.0015888545894995332
Validation loss = 0.0027123328763991594
Validation loss = 0.0017168946797028184
Validation loss = 0.0020869832951575518
Validation loss = 0.001826368272304535
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.3     |
| Iteration     | 27       |
| MaximumReturn | -0.0516  |
| MinimumReturn | -40.2    |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016158801736310124
Validation loss = 0.00193116907030344
Validation loss = 0.0018320552771911025
Validation loss = 0.0022693059872835875
Validation loss = 0.0018520309822633862
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002038056030869484
Validation loss = 0.0015020134160295129
Validation loss = 0.00208092387765646
Validation loss = 0.0017686669016256928
Validation loss = 0.001635360880754888
Validation loss = 0.0018728311406448483
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015730232698842883
Validation loss = 0.0020808137487620115
Validation loss = 0.0017504064599052072
Validation loss = 0.002197387395426631
Validation loss = 0.0018964163027703762
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0020706206560134888
Validation loss = 0.0022569000720977783
Validation loss = 0.0016567958518862724
Validation loss = 0.0022424401249736547
Validation loss = 0.002819392830133438
Validation loss = 0.0018741250969469547
Validation loss = 0.0025727718602865934
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020749391987919807
Validation loss = 0.0016932968283072114
Validation loss = 0.002618687227368355
Validation loss = 0.002195977605879307
Validation loss = 0.001683745882473886
Validation loss = 0.0031478393357247114
Validation loss = 0.001809517270885408
Validation loss = 0.0018043542513623834
Validation loss = 0.001869343570433557
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -13.5    |
| Iteration     | 28       |
| MaximumReturn | -0.00247 |
| MinimumReturn | -52.5    |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021500205621123314
Validation loss = 0.0050318604335188866
Validation loss = 0.0026206562761217356
Validation loss = 0.0017528221942484379
Validation loss = 0.002428055042400956
Validation loss = 0.002075768308714032
Validation loss = 0.0021906737238168716
Validation loss = 0.0023525357246398926
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014737611636519432
Validation loss = 0.0013908015098422766
Validation loss = 0.0015669310232624412
Validation loss = 0.002175606321543455
Validation loss = 0.0018579729367047548
Validation loss = 0.001774835167452693
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0024437326937913895
Validation loss = 0.0024326404090970755
Validation loss = 0.0016323304735124111
Validation loss = 0.0017607201589271426
Validation loss = 0.0026265326887369156
Validation loss = 0.0020874429028481245
Validation loss = 0.001830038265325129
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0021117678843438625
Validation loss = 0.0015779237728565931
Validation loss = 0.0015695024048909545
Validation loss = 0.0019119507633149624
Validation loss = 0.002138775773346424
Validation loss = 0.0018137656152248383
Validation loss = 0.0015788349555805326
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014911158941686153
Validation loss = 0.0013261677231639624
Validation loss = 0.0017143561271950603
Validation loss = 0.0035176074597984552
Validation loss = 0.0015183858340606093
Validation loss = 0.0018247081898152828
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -11.6     |
| Iteration     | 29        |
| MaximumReturn | -0.000981 |
| MinimumReturn | -65.7     |
| TotalSamples  | 51646     |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002628108486533165
Validation loss = 0.0027561860624700785
Validation loss = 0.004086493980139494
Validation loss = 0.0014208449283614755
Validation loss = 0.0013804025948047638
Validation loss = 0.0023365176748484373
Validation loss = 0.0013977766502648592
Validation loss = 0.0015555898426100612
Validation loss = 0.0019216096261516213
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002053037751466036
Validation loss = 0.001462416024878621
Validation loss = 0.0018150517717003822
Validation loss = 0.0014448999427258968
Validation loss = 0.002695185597985983
Validation loss = 0.002260895911604166
Validation loss = 0.0017744839424267411
Validation loss = 0.0016999277286231518
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002336470177397132
Validation loss = 0.00163253431674093
Validation loss = 0.0015569545794278383
Validation loss = 0.0016224473947659135
Validation loss = 0.0015135608846321702
Validation loss = 0.0017982751596719027
Validation loss = 0.0021518024150282145
Validation loss = 0.0018537035211920738
Validation loss = 0.002667646389454603
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017980141565203667
Validation loss = 0.001541402656584978
Validation loss = 0.0016426598886027932
Validation loss = 0.0022969855926930904
Validation loss = 0.0021183297503739595
Validation loss = 0.001498302910476923
Validation loss = 0.0032426307443529367
Validation loss = 0.0013719425769522786
Validation loss = 0.0018986621871590614
Validation loss = 0.0017523820279166102
Validation loss = 0.001768258516676724
Validation loss = 0.0017066848231479526
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0022241349797695875
Validation loss = 0.001462593674659729
Validation loss = 0.0016611553728580475
Validation loss = 0.002104185987263918
Validation loss = 0.001700210734270513
Validation loss = 0.001629913691431284
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.07    |
| Iteration     | 30       |
| MaximumReturn | -0.00069 |
| MinimumReturn | -52.5    |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0026022084057331085
Validation loss = 0.0017545708687976003
Validation loss = 0.0015908990753814578
Validation loss = 0.0017057915683835745
Validation loss = 0.0016471290728077292
Validation loss = 0.0020566803868860006
Validation loss = 0.0016458475729450583
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015671217115595937
Validation loss = 0.001745637971907854
Validation loss = 0.0022087404504418373
Validation loss = 0.002491266466677189
Validation loss = 0.0013278730912134051
Validation loss = 0.0014437163481488824
Validation loss = 0.0019492021529003978
Validation loss = 0.0015115996357053518
Validation loss = 0.003149483585730195
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021098197903484106
Validation loss = 0.001432124525308609
Validation loss = 0.0014879262307658792
Validation loss = 0.002819895977154374
Validation loss = 0.0013532473240047693
Validation loss = 0.0025082470383495092
Validation loss = 0.0018864350859075785
Validation loss = 0.0014609401114284992
Validation loss = 0.0018308449070900679
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001681054593063891
Validation loss = 0.0016254064394161105
Validation loss = 0.0022747849579900503
Validation loss = 0.0017454022308811545
Validation loss = 0.002100344281643629
Validation loss = 0.0027999873273074627
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0019780753646045923
Validation loss = 0.00146455317735672
Validation loss = 0.0015895923133939505
Validation loss = 0.0015829072799533606
Validation loss = 0.0022439821623265743
Validation loss = 0.0015766366850584745
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -81.9    |
| Iteration     | 31       |
| MaximumReturn | -46.4    |
| MinimumReturn | -106     |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00693471310660243
Validation loss = 0.001313149114139378
Validation loss = 0.0015682486118748784
Validation loss = 0.0015478512505069375
Validation loss = 0.0015935893170535564
Validation loss = 0.0024474982637912035
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0041002328507602215
Validation loss = 0.0016929603880271316
Validation loss = 0.0016113201854750514
Validation loss = 0.002392367459833622
Validation loss = 0.0016767238266766071
Validation loss = 0.0013010787079110742
Validation loss = 0.0014224020997062325
Validation loss = 0.0014530039625242352
Validation loss = 0.0018516473937779665
Validation loss = 0.001887326710857451
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018626457313075662
Validation loss = 0.0017573920777067542
Validation loss = 0.0016833373811095953
Validation loss = 0.0017025640700012445
Validation loss = 0.0017089148750528693
Validation loss = 0.00271727261133492
Validation loss = 0.0017708450322970748
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0034691551700234413
Validation loss = 0.001807167544029653
Validation loss = 0.0017753943102434278
Validation loss = 0.0015927047934383154
Validation loss = 0.001557431649416685
Validation loss = 0.0013652390334755182
Validation loss = 0.001315972418524325
Validation loss = 0.0019393102265894413
Validation loss = 0.0017521748086437583
Validation loss = 0.0020553080830723047
Validation loss = 0.0015175496228039265
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0030384764540940523
Validation loss = 0.0017511919140815735
Validation loss = 0.0012111753458157182
Validation loss = 0.001297950861044228
Validation loss = 0.0013363490579649806
Validation loss = 0.0023687744978815317
Validation loss = 0.002039628103375435
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -49.1    |
| Iteration     | 32       |
| MaximumReturn | -0.00354 |
| MinimumReturn | -110     |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016160205705091357
Validation loss = 0.0013738134875893593
Validation loss = 0.0021108377259224653
Validation loss = 0.0017427325947210193
Validation loss = 0.0013321327278390527
Validation loss = 0.0015462868614122272
Validation loss = 0.0019073522416874766
Validation loss = 0.0021574008278548717
Validation loss = 0.0015165174845606089
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002361319260671735
Validation loss = 0.0013433931162580848
Validation loss = 0.001394483377225697
Validation loss = 0.0021154689602553844
Validation loss = 0.002922683721408248
Validation loss = 0.0019291127100586891
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015359834069386125
Validation loss = 0.0015511995879933238
Validation loss = 0.0016925960080698133
Validation loss = 0.0013316560070961714
Validation loss = 0.0015318852383643389
Validation loss = 0.001840750570409
Validation loss = 0.0017390652792528272
Validation loss = 0.0018200691556558013
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014533463399857283
Validation loss = 0.0014666777569800615
Validation loss = 0.0019419315503910184
Validation loss = 0.0016919550253078341
Validation loss = 0.0013764071045443416
Validation loss = 0.001828602864407003
Validation loss = 0.0012739136582240462
Validation loss = 0.001541007892228663
Validation loss = 0.0026263133622705936
Validation loss = 0.0013608428416773677
Validation loss = 0.001804901985451579
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001513410941697657
Validation loss = 0.0013399856397882104
Validation loss = 0.0012630739947780967
Validation loss = 0.0015798339154571295
Validation loss = 0.0016984955873340368
Validation loss = 0.0015606144443154335
Validation loss = 0.0014717195881530643
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -70.7    |
| Iteration     | 33       |
| MaximumReturn | -6.93    |
| MinimumReturn | -107     |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002221366623416543
Validation loss = 0.0012530562235042453
Validation loss = 0.001223913044668734
Validation loss = 0.0017738108290359378
Validation loss = 0.0011853344039991498
Validation loss = 0.001124922069720924
Validation loss = 0.001170747447758913
Validation loss = 0.0013018909376114607
Validation loss = 0.0012375343358144164
Validation loss = 0.001241666148416698
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0027969861403107643
Validation loss = 0.001634886022657156
Validation loss = 0.0012377066304907203
Validation loss = 0.0014161966973915696
Validation loss = 0.00197090907022357
Validation loss = 0.0016265438171103597
Validation loss = 0.001682675676420331
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0033979834988713264
Validation loss = 0.001153401331976056
Validation loss = 0.0018394963117316365
Validation loss = 0.0012858572881668806
Validation loss = 0.001227649045176804
Validation loss = 0.001491851988248527
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005399217363446951
Validation loss = 0.0013132140738889575
Validation loss = 0.0015828785253688693
Validation loss = 0.001733524026349187
Validation loss = 0.0013451504055410624
Validation loss = 0.0012802336132153869
Validation loss = 0.0013075339375063777
Validation loss = 0.0013788824435323477
Validation loss = 0.0011156306136399508
Validation loss = 0.0018414469668641686
Validation loss = 0.0013818617444485426
Validation loss = 0.0015981349861249328
Validation loss = 0.0013046993408352137
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0026340270414948463
Validation loss = 0.0015495790867134929
Validation loss = 0.0017080960096791387
Validation loss = 0.001249975524842739
Validation loss = 0.002100761514157057
Validation loss = 0.0013861129991710186
Validation loss = 0.0013000101316720247
Validation loss = 0.001368763973005116
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -48.1    |
| Iteration     | 34       |
| MaximumReturn | -0.139   |
| MinimumReturn | -91.9    |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00161875004414469
Validation loss = 0.0021678737830370665
Validation loss = 0.001656776643358171
Validation loss = 0.0012469247449189425
Validation loss = 0.0014797517796978354
Validation loss = 0.0014293571002781391
Validation loss = 0.0015719584189355373
Validation loss = 0.001017119036987424
Validation loss = 0.0013285301392897964
Validation loss = 0.001307087019085884
Validation loss = 0.002090111840516329
Validation loss = 0.0014074727660045028
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013007191009819508
Validation loss = 0.001249058754183352
Validation loss = 0.001386543968692422
Validation loss = 0.0011770677519962192
Validation loss = 0.0019154867623001337
Validation loss = 0.0014365490060299635
Validation loss = 0.001449044793844223
Validation loss = 0.0015546779613941908
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015597965102642775
Validation loss = 0.0015924155013635755
Validation loss = 0.0022879927419126034
Validation loss = 0.0014858575304970145
Validation loss = 0.0015065307961776853
Validation loss = 0.0011494924547150731
Validation loss = 0.0014131765346974134
Validation loss = 0.0011990482453256845
Validation loss = 0.0014074442442506552
Validation loss = 0.0012537760194391012
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001800746307708323
Validation loss = 0.0012436803663149476
Validation loss = 0.0015616208547726274
Validation loss = 0.001598162460140884
Validation loss = 0.0014014395419508219
Validation loss = 0.0014488797169178724
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002296393271535635
Validation loss = 0.001517156371846795
Validation loss = 0.0013969412539154291
Validation loss = 0.0015771251637488604
Validation loss = 0.0013802476460114121
Validation loss = 0.0015142393531277776
Validation loss = 0.0012883719755336642
Validation loss = 0.0013446870725601912
Validation loss = 0.0014657098799943924
Validation loss = 0.0019956082105636597
Validation loss = 0.0014198424760252237
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.78     |
| Iteration     | 35        |
| MaximumReturn | -0.000688 |
| MinimumReturn | -18.8     |
| TotalSamples  | 61642     |
-----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0019699023105204105
Validation loss = 0.0017028322909027338
Validation loss = 0.0014291866682469845
Validation loss = 0.0011468082666397095
Validation loss = 0.0015179929323494434
Validation loss = 0.0012356924125924706
Validation loss = 0.0018232446163892746
Validation loss = 0.002403900260105729
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001497825374826789
Validation loss = 0.0014218935975804925
Validation loss = 0.0013641486875712872
Validation loss = 0.0014637967105954885
Validation loss = 0.0013336236588656902
Validation loss = 0.0013585005654022098
Validation loss = 0.0014108087634667754
Validation loss = 0.0037828460335731506
Validation loss = 0.0012858769623562694
Validation loss = 0.0011019686935469508
Validation loss = 0.0029621419962495565
Validation loss = 0.0012814967194572091
Validation loss = 0.0017363778315484524
Validation loss = 0.0011634321417659521
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001827100059017539
Validation loss = 0.0013487788382917643
Validation loss = 0.0016484226798638701
Validation loss = 0.0011905624996870756
Validation loss = 0.0016862310003489256
Validation loss = 0.0018439902924001217
Validation loss = 0.0011798065388575196
Validation loss = 0.001507097971625626
Validation loss = 0.0011744346702471375
Validation loss = 0.001531171496026218
Validation loss = 0.0011741056805476546
Validation loss = 0.0012277585919946432
Validation loss = 0.001177323400042951
Validation loss = 0.0013776534469798207
Validation loss = 0.001295660506002605
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013549269642680883
Validation loss = 0.0011439179070293903
Validation loss = 0.0013860190520063043
Validation loss = 0.0013155770720914006
Validation loss = 0.0014102068962529302
Validation loss = 0.0013488963013514876
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002734631998464465
Validation loss = 0.0010745153995230794
Validation loss = 0.0011572296498343349
Validation loss = 0.00134559185244143
Validation loss = 0.0021186124067753553
Validation loss = 0.0017833388410508633
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -23.2    |
| Iteration     | 36       |
| MaximumReturn | -0.00943 |
| MinimumReturn | -74.3    |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002010076306760311
Validation loss = 0.0013823548797518015
Validation loss = 0.001157575286924839
Validation loss = 0.0012587156379595399
Validation loss = 0.0013363059842959046
Validation loss = 0.0009850891074165702
Validation loss = 0.0014137658290565014
Validation loss = 0.0015492842067033052
Validation loss = 0.0020692863035947084
Validation loss = 0.001311051775701344
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00113385368604213
Validation loss = 0.001391350757330656
Validation loss = 0.0011322417994961143
Validation loss = 0.0012629665434360504
Validation loss = 0.001101141213439405
Validation loss = 0.0013318819692358375
Validation loss = 0.001209067995660007
Validation loss = 0.0024712514132261276
Validation loss = 0.0011178550776094198
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001353341736830771
Validation loss = 0.001506311702542007
Validation loss = 0.0012170013505965471
Validation loss = 0.0016936489846557379
Validation loss = 0.0015952322864905
Validation loss = 0.0012898248387500644
Validation loss = 0.0013132335152477026
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001599400187842548
Validation loss = 0.0010840328177437186
Validation loss = 0.0017125161830335855
Validation loss = 0.0012062167515978217
Validation loss = 0.0012902658199891448
Validation loss = 0.0011418532812967896
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012913604732602835
Validation loss = 0.0013041227357462049
Validation loss = 0.0017458194633945823
Validation loss = 0.001161377876996994
Validation loss = 0.001371895195916295
Validation loss = 0.0011634321417659521
Validation loss = 0.001178051345050335
Validation loss = 0.0016291354550048709
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -26.2     |
| Iteration     | 37        |
| MaximumReturn | -0.000857 |
| MinimumReturn | -111      |
| TotalSamples  | 64974     |
-----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015016170218586922
Validation loss = 0.0017079667886719108
Validation loss = 0.0012889443896710873
Validation loss = 0.001124909846112132
Validation loss = 0.0015721828676760197
Validation loss = 0.002810132224112749
Validation loss = 0.0015539918094873428
Validation loss = 0.0018611076520755887
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001313664484769106
Validation loss = 0.0020952625200152397
Validation loss = 0.0010720140999183059
Validation loss = 0.0013530142605304718
Validation loss = 0.001131691737100482
Validation loss = 0.0016305680619552732
Validation loss = 0.001260977005586028
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010716032702475786
Validation loss = 0.0011728478129953146
Validation loss = 0.001233331742696464
Validation loss = 0.0015234168386086822
Validation loss = 0.0015672605950385332
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012198600452393293
Validation loss = 0.0013081565266475081
Validation loss = 0.0013630513567477465
Validation loss = 0.0017930172616615891
Validation loss = 0.0019304003799334168
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018928521312773228
Validation loss = 0.0011461622780188918
Validation loss = 0.002522155176848173
Validation loss = 0.001986744347959757
Validation loss = 0.0010684860171750188
Validation loss = 0.0010345919290557504
Validation loss = 0.0010568380821496248
Validation loss = 0.0010487178806215525
Validation loss = 0.00135308806784451
Validation loss = 0.001620562165044248
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -12.3     |
| Iteration     | 38        |
| MaximumReturn | -0.000614 |
| MinimumReturn | -81.2     |
| TotalSamples  | 66640     |
-----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012736230855807662
Validation loss = 0.0018209669506177306
Validation loss = 0.001185184228233993
Validation loss = 0.001108389813452959
Validation loss = 0.001398567110300064
Validation loss = 0.0015737359644845128
Validation loss = 0.0012717044446617365
Validation loss = 0.0010974686592817307
Validation loss = 0.0015726201236248016
Validation loss = 0.00171689095441252
Validation loss = 0.0014485999708995223
Validation loss = 0.001152454991824925
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010698545956984162
Validation loss = 0.0012873357627540827
Validation loss = 0.0015027186600491405
Validation loss = 0.0014897988876327872
Validation loss = 0.0013129784492775798
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017436284106224775
Validation loss = 0.0024421962443739176
Validation loss = 0.0013496587052941322
Validation loss = 0.001963738352060318
Validation loss = 0.0013086650287732482
Validation loss = 0.0016853391425684094
Validation loss = 0.0012207338586449623
Validation loss = 0.001345297903753817
Validation loss = 0.0018253574380651116
Validation loss = 0.001388603588566184
Validation loss = 0.001251491135917604
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011737477034330368
Validation loss = 0.00154897291213274
Validation loss = 0.0014078285312280059
Validation loss = 0.0019644428975880146
Validation loss = 0.0010615857318043709
Validation loss = 0.0012088388903066516
Validation loss = 0.0012146461522206664
Validation loss = 0.0012126874644309282
Validation loss = 0.0010895876912400126
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001685418887063861
Validation loss = 0.001242573605850339
Validation loss = 0.0010726357577368617
Validation loss = 0.0013549822615459561
Validation loss = 0.001218917197547853
Validation loss = 0.0016418861923739314
Validation loss = 0.001417286810465157
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -23.4     |
| Iteration     | 39        |
| MaximumReturn | -0.000939 |
| MinimumReturn | -121      |
| TotalSamples  | 68306     |
-----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014826164115220308
Validation loss = 0.0011928065214306116
Validation loss = 0.0011592565570026636
Validation loss = 0.001541912672109902
Validation loss = 0.0017524862196296453
Validation loss = 0.0018145674839615822
Validation loss = 0.0012121513718739152
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013112332671880722
Validation loss = 0.001277661882340908
Validation loss = 0.0013530043652281165
Validation loss = 0.0015447406331077218
Validation loss = 0.0014591102954000235
Validation loss = 0.001587338512763381
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001408335636369884
Validation loss = 0.0013520198408514261
Validation loss = 0.0015452106017619371
Validation loss = 0.0011581324506551027
Validation loss = 0.001149989664554596
Validation loss = 0.0015029637143015862
Validation loss = 0.0011037593940272927
Validation loss = 0.0016112886369228363
Validation loss = 0.001044316217303276
Validation loss = 0.0011931797489523888
Validation loss = 0.0014367038384079933
Validation loss = 0.0011831733863800764
Validation loss = 0.0013665505684912205
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0030303392559289932
Validation loss = 0.0014716731384396553
Validation loss = 0.0012016990222036839
Validation loss = 0.0012550193350762129
Validation loss = 0.001219850149936974
Validation loss = 0.001321396790444851
Validation loss = 0.0012486230116337538
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011568008922040462
Validation loss = 0.0012777952942997217
Validation loss = 0.0011220703599974513
Validation loss = 0.0011879445519298315
Validation loss = 0.0014604071620851755
Validation loss = 0.0013625504216179252
Validation loss = 0.0012940799351781607
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -20.4     |
| Iteration     | 40        |
| MaximumReturn | -0.000683 |
| MinimumReturn | -127      |
| TotalSamples  | 69972     |
-----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013844697969034314
Validation loss = 0.0012910548830404878
Validation loss = 0.001269191619940102
Validation loss = 0.0014438909711316228
Validation loss = 0.001263323938474059
Validation loss = 0.0012747820001095533
Validation loss = 0.0011617100099101663
Validation loss = 0.0010884171351790428
Validation loss = 0.0013108780840411782
Validation loss = 0.0012962823966518044
Validation loss = 0.001520020654425025
Validation loss = 0.0014122207649052143
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001069410121999681
Validation loss = 0.0010150710586458445
Validation loss = 0.001400792971253395
Validation loss = 0.0017418465577065945
Validation loss = 0.001084717339836061
Validation loss = 0.0010368319926783442
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012403596192598343
Validation loss = 0.0011127000907436013
Validation loss = 0.0012741019017994404
Validation loss = 0.001092239050194621
Validation loss = 0.0018728162394836545
Validation loss = 0.0013296045362949371
Validation loss = 0.0011937901144847274
Validation loss = 0.001176371704787016
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010887120151892304
Validation loss = 0.0014134147204458714
Validation loss = 0.0011420523514971137
Validation loss = 0.0015414590016007423
Validation loss = 0.0012906930642202497
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010459944605827332
Validation loss = 0.0023500174283981323
Validation loss = 0.0010111414594575763
Validation loss = 0.0011371973669156432
Validation loss = 0.0011064648861065507
Validation loss = 0.0017912319162860513
Validation loss = 0.001265278784558177
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.9    |
| Iteration     | 41       |
| MaximumReturn | -0.00069 |
| MinimumReturn | -125     |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012547929072752595
Validation loss = 0.0015555359423160553
Validation loss = 0.0018477786798030138
Validation loss = 0.0011629083892330527
Validation loss = 0.0014660252491012216
Validation loss = 0.0015404014848172665
Validation loss = 0.0012171331327408552
Validation loss = 0.001208127592690289
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010874283034354448
Validation loss = 0.0011916229268535972
Validation loss = 0.0014739208854734898
Validation loss = 0.001149908290244639
Validation loss = 0.001326060271821916
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012799060204997659
Validation loss = 0.001126437564380467
Validation loss = 0.001033602631650865
Validation loss = 0.0011875330237671733
Validation loss = 0.0011669030645862222
Validation loss = 0.0010907768737524748
Validation loss = 0.0013004716020077467
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011933898786082864
Validation loss = 0.0015939815202727914
Validation loss = 0.0011526510352268815
Validation loss = 0.0011391803855076432
Validation loss = 0.0017271260730922222
Validation loss = 0.001184910535812378
Validation loss = 0.0022529165726155043
Validation loss = 0.0011218097060918808
Validation loss = 0.0014972073258832097
Validation loss = 0.0012331785401329398
Validation loss = 0.0010868021054193377
Validation loss = 0.0009754772763699293
Validation loss = 0.0009716307395137846
Validation loss = 0.0015784645220264792
Validation loss = 0.0011657449649646878
Validation loss = 0.001179652288556099
Validation loss = 0.0012816343223676085
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014214374823495746
Validation loss = 0.0012942490866407752
Validation loss = 0.0014086395967751741
Validation loss = 0.0010355350095778704
Validation loss = 0.0018434120574966073
Validation loss = 0.0012413676595315337
Validation loss = 0.0014119076076894999
Validation loss = 0.0014709393726661801
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -29.3    |
| Iteration     | 42       |
| MaximumReturn | -0.00128 |
| MinimumReturn | -116     |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001383741619065404
Validation loss = 0.0012278882786631584
Validation loss = 0.001311254221946001
Validation loss = 0.0013602003455162048
Validation loss = 0.0022263445425778627
Validation loss = 0.0009440650464966893
Validation loss = 0.0011641605524346232
Validation loss = 0.00118345080409199
Validation loss = 0.0013087759725749493
Validation loss = 0.001191959367133677
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010665941517800093
Validation loss = 0.00103329261764884
Validation loss = 0.0011564177693799138
Validation loss = 0.001711391843855381
Validation loss = 0.0012003201991319656
Validation loss = 0.0014999854611232877
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012273145839571953
Validation loss = 0.0011637858115136623
Validation loss = 0.0010655643418431282
Validation loss = 0.001027714111842215
Validation loss = 0.0011264036875218153
Validation loss = 0.0012091385433450341
Validation loss = 0.0013682892313227057
Validation loss = 0.0009505839552730322
Validation loss = 0.001278055366128683
Validation loss = 0.0012070629745721817
Validation loss = 0.0016794221010059118
Validation loss = 0.0013020741753280163
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001061973045580089
Validation loss = 0.002359758596867323
Validation loss = 0.0011651997920125723
Validation loss = 0.0013617672957479954
Validation loss = 0.0011694608256220818
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013965250691398978
Validation loss = 0.0013668597675859928
Validation loss = 0.0011878286022692919
Validation loss = 0.0013818279840052128
Validation loss = 0.001173487980850041
Validation loss = 0.0012358746025711298
Validation loss = 0.0015002340078353882
Validation loss = 0.00167627923656255
Validation loss = 0.001100043999031186
Validation loss = 0.0010857838205993176
Validation loss = 0.001172037678770721
Validation loss = 0.0014014176558703184
Validation loss = 0.0010544967371970415
Validation loss = 0.001136677572503686
Validation loss = 0.0008976294775493443
Validation loss = 0.0010983464308083057
Validation loss = 0.002001143293455243
Validation loss = 0.001225935760885477
Validation loss = 0.001169251510873437
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -13.1     |
| Iteration     | 43        |
| MaximumReturn | -0.000757 |
| MinimumReturn | -114      |
| TotalSamples  | 74970     |
-----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012057104613631964
Validation loss = 0.0011338491458445787
Validation loss = 0.0016108258860185742
Validation loss = 0.0010383815970271826
Validation loss = 0.0010014440631493926
Validation loss = 0.001840959070250392
Validation loss = 0.0011059140088036656
Validation loss = 0.00109092949423939
Validation loss = 0.0010066413087770343
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013422905467450619
Validation loss = 0.001209973357617855
Validation loss = 0.0010264499578624964
Validation loss = 0.0013602571561932564
Validation loss = 0.0011305913794785738
Validation loss = 0.0011975987581536174
Validation loss = 0.0013029567198827863
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001063046627677977
Validation loss = 0.0010054479353129864
Validation loss = 0.001245685270987451
Validation loss = 0.0010597177315503359
Validation loss = 0.0010420042090117931
Validation loss = 0.0011949753388762474
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016903607174754143
Validation loss = 0.001511581358499825
Validation loss = 0.0009571796981617808
Validation loss = 0.0015731201274320483
Validation loss = 0.0013605220010504127
Validation loss = 0.0011473236372694373
Validation loss = 0.0013337015407159925
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001336991204880178
Validation loss = 0.0012000825954601169
Validation loss = 0.0011911134934052825
Validation loss = 0.0011710101971402764
Validation loss = 0.0010859776521101594
Validation loss = 0.0010787997161969543
Validation loss = 0.0011593728559091687
Validation loss = 0.0012952268589287996
Validation loss = 0.0018148313974961638
Validation loss = 0.0012674060417339206
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -18.7     |
| Iteration     | 44        |
| MaximumReturn | -0.000748 |
| MinimumReturn | -133      |
| TotalSamples  | 76636     |
-----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013065957464277744
Validation loss = 0.0013450788101181388
Validation loss = 0.0013522532535716891
Validation loss = 0.0013171641621738672
Validation loss = 0.0012029920471832156
Validation loss = 0.0011226380011066794
Validation loss = 0.0012639061314985156
Validation loss = 0.001114420359954238
Validation loss = 0.001388403819873929
Validation loss = 0.0014494984643533826
Validation loss = 0.0010666103335097432
Validation loss = 0.0015097473515197635
Validation loss = 0.0009449755889363587
Validation loss = 0.0009085522033274174
Validation loss = 0.0013152717147022486
Validation loss = 0.0012382875429466367
Validation loss = 0.001618399634025991
Validation loss = 0.0010792106622830033
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001181487343274057
Validation loss = 0.0012938998406752944
Validation loss = 0.001242691883817315
Validation loss = 0.001052170293405652
Validation loss = 0.0013394062407314777
Validation loss = 0.0011946824379265308
Validation loss = 0.0013929608976468444
Validation loss = 0.0010221665725111961
Validation loss = 0.0013749586651101708
Validation loss = 0.0014396249316632748
Validation loss = 0.0012995036086067557
Validation loss = 0.001109468750655651
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011348065454512835
Validation loss = 0.0010833382839336991
Validation loss = 0.001237989403307438
Validation loss = 0.0014588907361030579
Validation loss = 0.0011279778555035591
Validation loss = 0.0014353276928886771
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011646825587376952
Validation loss = 0.001631607417948544
Validation loss = 0.0010515974136069417
Validation loss = 0.0014688413357362151
Validation loss = 0.0012917974963784218
Validation loss = 0.0011208804789930582
Validation loss = 0.001262978301383555
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011495763901621103
Validation loss = 0.001261934288777411
Validation loss = 0.001116486731916666
Validation loss = 0.0010258541442453861
Validation loss = 0.0009923051111400127
Validation loss = 0.0016141512896865606
Validation loss = 0.0011385714169591665
Validation loss = 0.0014544319128617644
Validation loss = 0.0011635849950835109
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -80      |
| Iteration     | 45       |
| MaximumReturn | -0.0139  |
| MinimumReturn | -152     |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011483642738312483
Validation loss = 0.001092403195798397
Validation loss = 0.001041654497385025
Validation loss = 0.0011627781204879284
Validation loss = 0.0012992069823667407
Validation loss = 0.0015488695353269577
Validation loss = 0.0009705888805910945
Validation loss = 0.0010967109119519591
Validation loss = 0.0013479735935106874
Validation loss = 0.0009486645576544106
Validation loss = 0.0017664190381765366
Validation loss = 0.0013693496584892273
Validation loss = 0.0011436970671638846
Validation loss = 0.001039869268424809
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001687999116256833
Validation loss = 0.0012299364898353815
Validation loss = 0.0013709550257772207
Validation loss = 0.0011498881503939629
Validation loss = 0.0011164239840582013
Validation loss = 0.001151671283878386
Validation loss = 0.0012006190372630954
Validation loss = 0.0009727069991640747
Validation loss = 0.0010869215475395322
Validation loss = 0.0010139098158106208
Validation loss = 0.0013131817104294896
Validation loss = 0.0013564244145527482
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011554294032976031
Validation loss = 0.0014592017978429794
Validation loss = 0.001228082343004644
Validation loss = 0.001941575319506228
Validation loss = 0.002028124872595072
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011296202428638935
Validation loss = 0.0009406316676177084
Validation loss = 0.00111993751488626
Validation loss = 0.0010157558135688305
Validation loss = 0.0013165882555767894
Validation loss = 0.0009661014191806316
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013358708238229156
Validation loss = 0.0011382133234292269
Validation loss = 0.0014830066356807947
Validation loss = 0.0011874374467879534
Validation loss = 0.0009229870047420263
Validation loss = 0.001984191359952092
Validation loss = 0.001031898078508675
Validation loss = 0.001175298122689128
Validation loss = 0.0011518403189256787
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -73.4    |
| Iteration     | 46       |
| MaximumReturn | -0.00162 |
| MinimumReturn | -146     |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014393872115761042
Validation loss = 0.001300831907428801
Validation loss = 0.0010669591138139367
Validation loss = 0.0009936289861798286
Validation loss = 0.0014086372684687376
Validation loss = 0.0011187521740794182
Validation loss = 0.0011062147095799446
Validation loss = 0.001087522367015481
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009095807326957583
Validation loss = 0.0009689292637631297
Validation loss = 0.0011566515313461423
Validation loss = 0.0010821584146469831
Validation loss = 0.0011501610279083252
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001378838554956019
Validation loss = 0.001084880786947906
Validation loss = 0.001358367153443396
Validation loss = 0.0010351710952818394
Validation loss = 0.0011878330260515213
Validation loss = 0.0010731620714068413
Validation loss = 0.0013356672134250402
Validation loss = 0.001408708281815052
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001063620438799262
Validation loss = 0.0010888517135754228
Validation loss = 0.0010032837744802237
Validation loss = 0.0012778386007994413
Validation loss = 0.0010359156876802444
Validation loss = 0.001309462240897119
Validation loss = 0.0010577343637123704
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011894613271579146
Validation loss = 0.0008912502089515328
Validation loss = 0.0012514138361439109
Validation loss = 0.0011069170432165265
Validation loss = 0.0009777609957382083
Validation loss = 0.0009291284950450063
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26.3    |
| Iteration     | 47       |
| MaximumReturn | -0.00118 |
| MinimumReturn | -134     |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010703245643526316
Validation loss = 0.001059304689988494
Validation loss = 0.0010082616936415434
Validation loss = 0.0010964472312480211
Validation loss = 0.0008397387573495507
Validation loss = 0.0011172719532623887
Validation loss = 0.0012689760187640786
Validation loss = 0.0011937140952795744
Validation loss = 0.0009454889222979546
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012364017311483622
Validation loss = 0.001131668919697404
Validation loss = 0.0013512986479327083
Validation loss = 0.0011070834007114172
Validation loss = 0.0009865891188383102
Validation loss = 0.001212242990732193
Validation loss = 0.0012465374311432242
Validation loss = 0.0013124628458172083
Validation loss = 0.0009563054190948606
Validation loss = 0.0014258997980505228
Validation loss = 0.0010880535701289773
Validation loss = 0.0010078724008053541
Validation loss = 0.0010359000880271196
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001267757616005838
Validation loss = 0.0011346029350534081
Validation loss = 0.0010771850356832147
Validation loss = 0.0009860147256404161
Validation loss = 0.0012418028200045228
Validation loss = 0.0011012686882168055
Validation loss = 0.0011767575051635504
Validation loss = 0.0009587170789018273
Validation loss = 0.0011479860404506326
Validation loss = 0.001452036784030497
Validation loss = 0.0010389836970716715
Validation loss = 0.0014263363555073738
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012767143780365586
Validation loss = 0.0016795728588476777
Validation loss = 0.0011259634047746658
Validation loss = 0.0014093421632423997
Validation loss = 0.0010884437942877412
Validation loss = 0.001303633558563888
Validation loss = 0.0013388559455052018
Validation loss = 0.0010237793903797865
Validation loss = 0.0011307754321023822
Validation loss = 0.0015066565247252584
Validation loss = 0.001101373927667737
Validation loss = 0.0010409053647890687
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010142866522073746
Validation loss = 0.0010706025641411543
Validation loss = 0.0011828221613541245
Validation loss = 0.0009252011659555137
Validation loss = 0.0010084660025313497
Validation loss = 0.0010682297870516777
Validation loss = 0.001084687071852386
Validation loss = 0.0011213995749130845
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -107     |
| Iteration     | 48       |
| MaximumReturn | -35.4    |
| MinimumReturn | -141     |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001587317674420774
Validation loss = 0.0009618867770768702
Validation loss = 0.0008416890050284564
Validation loss = 0.0011398682836443186
Validation loss = 0.0015324648702517152
Validation loss = 0.0013314516982063651
Validation loss = 0.0010269307531416416
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001220298116095364
Validation loss = 0.0009374310611747205
Validation loss = 0.0010032965801656246
Validation loss = 0.0010941452346742153
Validation loss = 0.00105441571213305
Validation loss = 0.0012435492826625705
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010008341632783413
Validation loss = 0.0013810020172968507
Validation loss = 0.0009721837122924626
Validation loss = 0.0011488643940538168
Validation loss = 0.0009080619784072042
Validation loss = 0.0009768909076228738
Validation loss = 0.0010186663130298257
Validation loss = 0.0013193645281717181
Validation loss = 0.0010509360581636429
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016938616754487157
Validation loss = 0.0009557675803080201
Validation loss = 0.0009755265200510621
Validation loss = 0.0008816497283987701
Validation loss = 0.0010728155029937625
Validation loss = 0.0010419916361570358
Validation loss = 0.0010457364842295647
Validation loss = 0.0008891064208000898
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013641145778819919
Validation loss = 0.0009587611421011388
Validation loss = 0.0009488451760262251
Validation loss = 0.0009920679731294513
Validation loss = 0.001042232383042574
Validation loss = 0.0008845261763781309
Validation loss = 0.0012177194003015757
Validation loss = 0.0011367610422894359
Validation loss = 0.0010278044501319528
Validation loss = 0.0010501332581043243
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -17.7     |
| Iteration     | 49        |
| MaximumReturn | -0.000631 |
| MinimumReturn | -100      |
| TotalSamples  | 84966     |
-----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010602993424981833
Validation loss = 0.0008808543789200485
Validation loss = 0.0011981704737991095
Validation loss = 0.0012054394464939833
Validation loss = 0.0011770070996135473
Validation loss = 0.001047163619659841
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014255945570766926
Validation loss = 0.00103279622271657
Validation loss = 0.0016258243704214692
Validation loss = 0.0010822337353602052
Validation loss = 0.0012598040048033
Validation loss = 0.0011716799344867468
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001077577704563737
Validation loss = 0.0010021532652899623
Validation loss = 0.0015480640577152371
Validation loss = 0.0008345243404619396
Validation loss = 0.0010014987783506513
Validation loss = 0.0010423250496387482
Validation loss = 0.001070477650500834
Validation loss = 0.0011781826615333557
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001104541472159326
Validation loss = 0.001138646504841745
Validation loss = 0.0010054638842120767
Validation loss = 0.0009824407752603292
Validation loss = 0.000900440732948482
Validation loss = 0.0008940459229052067
Validation loss = 0.000880433595739305
Validation loss = 0.001190102193504572
Validation loss = 0.0008717128657735884
Validation loss = 0.0009409304475411773
Validation loss = 0.001248190295882523
Validation loss = 0.0010688198963180184
Validation loss = 0.000945508829317987
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000981461489573121
Validation loss = 0.0013082569930702448
Validation loss = 0.0010430978145450354
Validation loss = 0.001084383693523705
Validation loss = 0.0011124358279630542
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -27.9     |
| Iteration     | 50        |
| MaximumReturn | -0.000733 |
| MinimumReturn | -112      |
| TotalSamples  | 86632     |
-----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000900993007235229
Validation loss = 0.0009864409221336246
Validation loss = 0.0017780349589884281
Validation loss = 0.0009500705637037754
Validation loss = 0.0008740291814319789
Validation loss = 0.0009423960582353175
Validation loss = 0.0009844311280176044
Validation loss = 0.0010860523907467723
Validation loss = 0.0010825545759871602
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000924685038626194
Validation loss = 0.0011126353638246655
Validation loss = 0.0014295793371275067
Validation loss = 0.0009621165227144957
Validation loss = 0.0009515346027910709
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009960092138499022
Validation loss = 0.0011118584079667926
Validation loss = 0.0011821077205240726
Validation loss = 0.0010690375929698348
Validation loss = 0.0011936207301914692
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012615361483767629
Validation loss = 0.001086721895262599
Validation loss = 0.001301631098613143
Validation loss = 0.0015863722655922174
Validation loss = 0.0009633234003558755
Validation loss = 0.001121880253776908
Validation loss = 0.0014453413896262646
Validation loss = 0.0009591198177076876
Validation loss = 0.0012932310346513987
Validation loss = 0.0012124772183597088
Validation loss = 0.0011020679958164692
Validation loss = 0.0017652750248089433
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000995659502223134
Validation loss = 0.0012931335950270295
Validation loss = 0.001147635979577899
Validation loss = 0.000999865005724132
Validation loss = 0.000950137444306165
Validation loss = 0.0009872557129710913
Validation loss = 0.0010358382714912295
Validation loss = 0.001162121188826859
Validation loss = 0.0016072278376668692
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -137     |
| Iteration     | 51       |
| MaximumReturn | -113     |
| MinimumReturn | -153     |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015096775023266673
Validation loss = 0.0009244914981536567
Validation loss = 0.000942425976973027
Validation loss = 0.0011488862801343203
Validation loss = 0.0010607942240312696
Validation loss = 0.0009141562040895224
Validation loss = 0.0010210222098976374
Validation loss = 0.0009377945098094642
Validation loss = 0.0011711929691955447
Validation loss = 0.0011951371561735868
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013816739665344357
Validation loss = 0.00108642119448632
Validation loss = 0.0010794360423460603
Validation loss = 0.00121628912165761
Validation loss = 0.000988470041193068
Validation loss = 0.0008978616097010672
Validation loss = 0.0012835039524361491
Validation loss = 0.0008587035699747503
Validation loss = 0.0009843873558565974
Validation loss = 0.0011719396570697427
Validation loss = 0.0009913387475535274
Validation loss = 0.0011955489171668887
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010496140457689762
Validation loss = 0.0008653867989778519
Validation loss = 0.0013313774252310395
Validation loss = 0.000994230154901743
Validation loss = 0.0008969879127107561
Validation loss = 0.0011761838104575872
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012439527781680226
Validation loss = 0.0009235640754923224
Validation loss = 0.0008513288921676576
Validation loss = 0.0009435496758669615
Validation loss = 0.001061358256265521
Validation loss = 0.0009510497911833227
Validation loss = 0.0010267260950058699
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018185245571658015
Validation loss = 0.0008888095035217702
Validation loss = 0.00099081767257303
Validation loss = 0.000878272345289588
Validation loss = 0.0009718326618894935
Validation loss = 0.0009230499854311347
Validation loss = 0.0009591472917236388
Validation loss = 0.0012207954423502088
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -124     |
| Iteration     | 52       |
| MaximumReturn | -96.5    |
| MinimumReturn | -145     |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00135226605925709
Validation loss = 0.0008717961027286947
Validation loss = 0.0010126439156010747
Validation loss = 0.0009858146077021956
Validation loss = 0.0009992378763854504
Validation loss = 0.0013177756918594241
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011714895954355597
Validation loss = 0.0009853437077254057
Validation loss = 0.0008479501702822745
Validation loss = 0.0012703544925898314
Validation loss = 0.0010662532877177
Validation loss = 0.0009036408737301826
Validation loss = 0.0009399389382451773
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011125901946797967
Validation loss = 0.0008118693949654698
Validation loss = 0.0011579447891563177
Validation loss = 0.0009454185492359102
Validation loss = 0.001635868800804019
Validation loss = 0.0009237558697350323
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00101157883182168
Validation loss = 0.0011320457560941577
Validation loss = 0.0014229287626221776
Validation loss = 0.0008677471778355539
Validation loss = 0.001198665937408805
Validation loss = 0.0011114398948848248
Validation loss = 0.0009785880101844668
Validation loss = 0.0011750402627512813
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010230434127151966
Validation loss = 0.0009205368696711957
Validation loss = 0.0010140154045075178
Validation loss = 0.0011390730505809188
Validation loss = 0.001020094146952033
Validation loss = 0.0010950297582894564
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -32.1    |
| Iteration     | 53       |
| MaximumReturn | -0.00103 |
| MinimumReturn | -108     |
| TotalSamples  | 91630    |
----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010808836668729782
Validation loss = 0.0009573635179549456
Validation loss = 0.0010248507605865598
Validation loss = 0.0010458753677085042
Validation loss = 0.0008757865289226174
Validation loss = 0.001096611376851797
Validation loss = 0.0011651732493191957
Validation loss = 0.0010801558382809162
Validation loss = 0.0008444339036941528
Validation loss = 0.0008915567304939032
Validation loss = 0.0007890437263995409
Validation loss = 0.0013001656625419855
Validation loss = 0.0010180632816627622
Validation loss = 0.0008791310829110444
Validation loss = 0.0010862728813663125
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011625515762716532
Validation loss = 0.0009491352830082178
Validation loss = 0.0008176719420589507
Validation loss = 0.0011341521749272943
Validation loss = 0.0008948417962528765
Validation loss = 0.0009453343809582293
Validation loss = 0.0008732066489756107
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008431577007286251
Validation loss = 0.0011315123410895467
Validation loss = 0.0009970975806936622
Validation loss = 0.0009484693291597068
Validation loss = 0.0010838736779987812
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011150265345349908
Validation loss = 0.0010629682801663876
Validation loss = 0.0009814072400331497
Validation loss = 0.000961051438935101
Validation loss = 0.0012378975516185164
Validation loss = 0.0009117142180912197
Validation loss = 0.0010001537157222629
Validation loss = 0.0008742852951399982
Validation loss = 0.0008564124582335353
Validation loss = 0.0008166910847648978
Validation loss = 0.001009879051707685
Validation loss = 0.0009375486988574266
Validation loss = 0.001064390642568469
Validation loss = 0.0009519961313344538
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009528956725262105
Validation loss = 0.0008749151602387428
Validation loss = 0.0015042497543618083
Validation loss = 0.0009587210952304304
Validation loss = 0.0008686674409545958
Validation loss = 0.0011579146375879645
Validation loss = 0.001006941543892026
Validation loss = 0.0010975676123052835
Validation loss = 0.0010523315286263824
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -4.02     |
| Iteration     | 54        |
| MaximumReturn | -0.000688 |
| MinimumReturn | -65.7     |
| TotalSamples  | 93296     |
-----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009625338716432452
Validation loss = 0.0011222948087379336
Validation loss = 0.0010157787473872304
Validation loss = 0.001039341907016933
Validation loss = 0.0009461361332796514
Validation loss = 0.0008818312780931592
Validation loss = 0.001009877072647214
Validation loss = 0.0011498932726681232
Validation loss = 0.0010574620682746172
Validation loss = 0.0011109050828963518
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011197917629033327
Validation loss = 0.0009549262467771769
Validation loss = 0.0009602465433999896
Validation loss = 0.001034482498653233
Validation loss = 0.0008992466027848423
Validation loss = 0.0008636136190034449
Validation loss = 0.0014476818032562733
Validation loss = 0.0008949097828008235
Validation loss = 0.0009484891779720783
Validation loss = 0.000904313987120986
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010627593146637082
Validation loss = 0.0011462251422926784
Validation loss = 0.001121061504818499
Validation loss = 0.0009687367710284889
Validation loss = 0.0009563809726387262
Validation loss = 0.0011660984018817544
Validation loss = 0.0008974060183390975
Validation loss = 0.0009771274635568261
Validation loss = 0.0013355648843571544
Validation loss = 0.0008568576886318624
Validation loss = 0.0013006816152483225
Validation loss = 0.0009449124336242676
Validation loss = 0.0009245413239113986
Validation loss = 0.0009532181429676712
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009351015323773026
Validation loss = 0.0009219404892064631
Validation loss = 0.001092372345738113
Validation loss = 0.0010823247721418738
Validation loss = 0.0010510854190215468
Validation loss = 0.0008103761938400567
Validation loss = 0.0010680460836738348
Validation loss = 0.001179856015369296
Validation loss = 0.0009134099818766117
Validation loss = 0.0008051070035435259
Validation loss = 0.001147352159023285
Validation loss = 0.000934457581024617
Validation loss = 0.0011713956482708454
Validation loss = 0.0008331037824973464
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001159511273726821
Validation loss = 0.001227806555107236
Validation loss = 0.0009840733837336302
Validation loss = 0.0010029536206275225
Validation loss = 0.0009262851090170443
Validation loss = 0.0012768504675477743
Validation loss = 0.001041620853357017
Validation loss = 0.0008456649375148118
Validation loss = 0.0009081758325919509
Validation loss = 0.0010869888355955482
Validation loss = 0.0009115171269513667
Validation loss = 0.0011322747450321913
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -79.8    |
| Iteration     | 55       |
| MaximumReturn | -0.148   |
| MinimumReturn | -136     |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0017781630158424377
Validation loss = 0.0008907249430194497
Validation loss = 0.00203969469293952
Validation loss = 0.001001947675831616
Validation loss = 0.0008397961501032114
Validation loss = 0.0012252031592652202
Validation loss = 0.001003363635390997
Validation loss = 0.0008882951806299388
Validation loss = 0.0008648520451970398
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0021211609710007906
Validation loss = 0.0011023781262338161
Validation loss = 0.0008350497228093445
Validation loss = 0.0009060467127710581
Validation loss = 0.0012220765929669142
Validation loss = 0.0009865056490525603
Validation loss = 0.001019218354485929
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004339489620178938
Validation loss = 0.0010206646984443069
Validation loss = 0.0009554467396810651
Validation loss = 0.000950725341681391
Validation loss = 0.001115020364522934
Validation loss = 0.0010813457192853093
Validation loss = 0.000876135949511081
Validation loss = 0.0009353237692266703
Validation loss = 0.0012090300442650914
Validation loss = 0.0018503968603909016
Validation loss = 0.0010177731746807694
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0018859651172533631
Validation loss = 0.0009576149750500917
Validation loss = 0.0009853111114352942
Validation loss = 0.0009884563041850924
Validation loss = 0.0008283564820885658
Validation loss = 0.0010323880705982447
Validation loss = 0.001133822719566524
Validation loss = 0.0008756723836995661
Validation loss = 0.0010221941629424691
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002149922540411353
Validation loss = 0.0010179552482441068
Validation loss = 0.0011058530071750283
Validation loss = 0.0008931940537877381
Validation loss = 0.0009029071079567075
Validation loss = 0.0009785136207938194
Validation loss = 0.00225654523819685
Validation loss = 0.001340352464467287
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -28.9     |
| Iteration     | 56        |
| MaximumReturn | -0.000693 |
| MinimumReturn | -130      |
| TotalSamples  | 96628     |
-----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001067815232090652
Validation loss = 0.0013664121506735682
Validation loss = 0.0009025533800013363
Validation loss = 0.0013054503360763192
Validation loss = 0.0010942722437903285
Validation loss = 0.000939996272791177
Validation loss = 0.0009640036732889712
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013964399695396423
Validation loss = 0.0011849829461425543
Validation loss = 0.0010375040583312511
Validation loss = 0.0010643514106050134
Validation loss = 0.0009542619809508324
Validation loss = 0.0013679270632565022
Validation loss = 0.0018837644020095468
Validation loss = 0.0011351589346304536
Validation loss = 0.0009344101999886334
Validation loss = 0.0008439913508482277
Validation loss = 0.0008993000374175608
Validation loss = 0.0009141629561781883
Validation loss = 0.000870425661560148
Validation loss = 0.0009475017432123423
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010592608014121652
Validation loss = 0.0011813532328233123
Validation loss = 0.0012106802314519882
Validation loss = 0.0011933090863749385
Validation loss = 0.0010098327184095979
Validation loss = 0.0010943537345156074
Validation loss = 0.0010360603919252753
Validation loss = 0.001014677924104035
Validation loss = 0.0013150623999536037
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008243453339673579
Validation loss = 0.0010822085896506906
Validation loss = 0.001337669906206429
Validation loss = 0.000874292163643986
Validation loss = 0.0012189327972009778
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001131289405748248
Validation loss = 0.0009831030620262027
Validation loss = 0.0010454870061948895
Validation loss = 0.0013797059655189514
Validation loss = 0.0014910236932337284
Validation loss = 0.0008827901910990477
Validation loss = 0.001964320195838809
Validation loss = 0.0012405610177665949
Validation loss = 0.0012708386639133096
Validation loss = 0.0011115972883999348
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -46.2    |
| Iteration     | 57       |
| MaximumReturn | -0.001   |
| MinimumReturn | -143     |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009060368756763637
Validation loss = 0.0010699928971007466
Validation loss = 0.001406542956829071
Validation loss = 0.00111352966632694
Validation loss = 0.001013348693959415
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001132151810452342
Validation loss = 0.0013733997475355864
Validation loss = 0.0010386403882876039
Validation loss = 0.0008627785136923194
Validation loss = 0.0011453337501734495
Validation loss = 0.0010964262764900923
Validation loss = 0.0010093399323523045
Validation loss = 0.0009889609646052122
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011589339701458812
Validation loss = 0.0010746511397883296
Validation loss = 0.0009941562311723828
Validation loss = 0.0011413861066102982
Validation loss = 0.0010219478281214833
Validation loss = 0.0009858770063146949
Validation loss = 0.0012485572369769216
Validation loss = 0.001139986445195973
Validation loss = 0.001117121777497232
Validation loss = 0.0010254469234496355
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009360104450024664
Validation loss = 0.0016030491096898913
Validation loss = 0.0008652879041619599
Validation loss = 0.0010657923994585872
Validation loss = 0.0009703334071673453
Validation loss = 0.002005500253289938
Validation loss = 0.0022317208349704742
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011368144769221544
Validation loss = 0.0016535541508346796
Validation loss = 0.0011115025263279676
Validation loss = 0.001141648506745696
Validation loss = 0.001070752157829702
Validation loss = 0.0015415697125717998
Validation loss = 0.0011157761327922344
Validation loss = 0.0010499622439965606
Validation loss = 0.0012816443340852857
Validation loss = 0.0011865014676004648
Validation loss = 0.0014762323116883636
Validation loss = 0.0012896991102024913
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -144     |
| Iteration     | 58       |
| MaximumReturn | -117     |
| MinimumReturn | -161     |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013396377908065915
Validation loss = 0.001346541102975607
Validation loss = 0.0010477423202246428
Validation loss = 0.0010993261821568012
Validation loss = 0.0009704872500151396
Validation loss = 0.0009449750650674105
Validation loss = 0.0009227224509231746
Validation loss = 0.0008739754557609558
Validation loss = 0.0014348274562507868
Validation loss = 0.0009082616888917983
Validation loss = 0.0024595586583018303
Validation loss = 0.0008497805683873594
Validation loss = 0.0011560397688299417
Validation loss = 0.001142527791671455
Validation loss = 0.0008796476759016514
Validation loss = 0.0007862245547585189
Validation loss = 0.0009635279420763254
Validation loss = 0.0023517280351370573
Validation loss = 0.0022727057803422213
Validation loss = 0.0013372108805924654
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011196215637028217
Validation loss = 0.0011062020203098655
Validation loss = 0.0008351036813110113
Validation loss = 0.0013726514298468828
Validation loss = 0.0009642389486543834
Validation loss = 0.0011052277404814959
Validation loss = 0.0014336700551211834
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000977679737843573
Validation loss = 0.0008840255322866142
Validation loss = 0.0011550098424777389
Validation loss = 0.0011671418324112892
Validation loss = 0.0016013431595638394
Validation loss = 0.0010479504708200693
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009406728786416352
Validation loss = 0.0010765972547233105
Validation loss = 0.0011912767076864839
Validation loss = 0.0016009989194571972
Validation loss = 0.0011255141580477357
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009507462382316589
Validation loss = 0.0010772163514047861
Validation loss = 0.0014764911029487848
Validation loss = 0.0009844501037150621
Validation loss = 0.0010091466829180717
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -150     |
| Iteration     | 59       |
| MaximumReturn | -121     |
| MinimumReturn | -168     |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009353369241580367
Validation loss = 0.000870177464094013
Validation loss = 0.0008541007409803569
Validation loss = 0.0013040180783718824
Validation loss = 0.0008326701354235411
Validation loss = 0.0008518893737345934
Validation loss = 0.000814905040897429
Validation loss = 0.0010652594501152635
Validation loss = 0.0009757413063198328
Validation loss = 0.0008468299638479948
Validation loss = 0.0008873474434949458
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009112911066040397
Validation loss = 0.0009590436238795519
Validation loss = 0.0009606048697605729
Validation loss = 0.0011200163280591369
Validation loss = 0.0010149392765015364
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009444007300771773
Validation loss = 0.0008670618408359587
Validation loss = 0.001187643501907587
Validation loss = 0.0009896630654111505
Validation loss = 0.0009310880559496582
Validation loss = 0.0024018881376832724
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009673641761764884
Validation loss = 0.000906847242731601
Validation loss = 0.0008953899377956986
Validation loss = 0.0009009891655296087
Validation loss = 0.0010236732196062803
Validation loss = 0.0007971654995344579
Validation loss = 0.0013559587532654405
Validation loss = 0.0008305993396788836
Validation loss = 0.0009188950061798096
Validation loss = 0.0012488863430917263
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012472773669287562
Validation loss = 0.001045773271471262
Validation loss = 0.0014583092415705323
Validation loss = 0.0011296052252873778
Validation loss = 0.0013190066674724221
Validation loss = 0.001170282019302249
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -122     |
| Iteration     | 60       |
| MaximumReturn | -90.3    |
| MinimumReturn | -144     |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010999598307535052
Validation loss = 0.0009484336478635669
Validation loss = 0.0009901500307023525
Validation loss = 0.0008242745534516871
Validation loss = 0.0008278457680717111
Validation loss = 0.0019416058203205466
Validation loss = 0.0009821448475122452
Validation loss = 0.0011836658231914043
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017079240642488003
Validation loss = 0.0009537505684420466
Validation loss = 0.0010788303334265947
Validation loss = 0.0008881256217136979
Validation loss = 0.0010726585751399398
Validation loss = 0.0009417557157576084
Validation loss = 0.0010148868896067142
Validation loss = 0.001089596888050437
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010716128163039684
Validation loss = 0.00102027237880975
Validation loss = 0.0010000914335250854
Validation loss = 0.0015782970003783703
Validation loss = 0.0008292353013530374
Validation loss = 0.000982006429694593
Validation loss = 0.0008998043485917151
Validation loss = 0.0010611267061904073
Validation loss = 0.0009731609607115388
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012666652910411358
Validation loss = 0.0009195469901897013
Validation loss = 0.0007695662206970155
Validation loss = 0.0008219869341701269
Validation loss = 0.0011275350116193295
Validation loss = 0.0010007189121097326
Validation loss = 0.0009783265413716435
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00146350113209337
Validation loss = 0.0010037020547315478
Validation loss = 0.0011371886357665062
Validation loss = 0.001395878498442471
Validation loss = 0.0009750018361955881
Validation loss = 0.0010269205085933208
Validation loss = 0.0007524441462010145
Validation loss = 0.0010569002479314804
Validation loss = 0.0008612651145085692
Validation loss = 0.0010058818152174354
Validation loss = 0.00096474913880229
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -63.9    |
| Iteration     | 61       |
| MaximumReturn | -0.0136  |
| MinimumReturn | -127     |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010531999869272113
Validation loss = 0.0009796697413548827
Validation loss = 0.001321313320659101
Validation loss = 0.0008960698032751679
Validation loss = 0.0009543764754198492
Validation loss = 0.000951351597905159
Validation loss = 0.001591977896168828
Validation loss = 0.0008021389367058873
Validation loss = 0.0009020284633152187
Validation loss = 0.0010072087170556188
Validation loss = 0.0008676244760863483
Validation loss = 0.0009270339505746961
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009378297836519778
Validation loss = 0.0021105906926095486
Validation loss = 0.0010316764237359166
Validation loss = 0.0009735958301462233
Validation loss = 0.0009082938195206225
Validation loss = 0.0008864216506481171
Validation loss = 0.0009097465663217008
Validation loss = 0.0009758495143614709
Validation loss = 0.0008517111418768764
Validation loss = 0.0007262742728926241
Validation loss = 0.00089937326265499
Validation loss = 0.0008437842479906976
Validation loss = 0.0009056237176991999
Validation loss = 0.0009140356560237706
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010421742917969823
Validation loss = 0.001238776370882988
Validation loss = 0.0013231451157480478
Validation loss = 0.0010224793804809451
Validation loss = 0.0010667096357792616
Validation loss = 0.0008246765937656164
Validation loss = 0.0009451869409531355
Validation loss = 0.0009273182949982584
Validation loss = 0.0008973354706540704
Validation loss = 0.0010917759500443935
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010333354584872723
Validation loss = 0.0008862414979375899
Validation loss = 0.0008623816538602114
Validation loss = 0.001085962401703
Validation loss = 0.0008965285960584879
Validation loss = 0.0017053803894668818
Validation loss = 0.000861881417222321
Validation loss = 0.000950982270296663
Validation loss = 0.0010814559645950794
Validation loss = 0.001532984315417707
Validation loss = 0.0009567592060193419
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008154450333677232
Validation loss = 0.001211361843161285
Validation loss = 0.0012251811567693949
Validation loss = 0.0008351547294296324
Validation loss = 0.0008414137409999967
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -110     |
| Iteration     | 62       |
| MaximumReturn | -70.2    |
| MinimumReturn | -129     |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000899344275239855
Validation loss = 0.0012416893150657415
Validation loss = 0.001007661921903491
Validation loss = 0.0011925006983801723
Validation loss = 0.0011253866832703352
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010024697985500097
Validation loss = 0.0009790800977498293
Validation loss = 0.0009838835103437304
Validation loss = 0.0008462523110210896
Validation loss = 0.0007983669638633728
Validation loss = 0.0013654048088937998
Validation loss = 0.0008506338926963508
Validation loss = 0.0008567574550397694
Validation loss = 0.0010734330862760544
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012466273037716746
Validation loss = 0.0008276249864138663
Validation loss = 0.0007739047869108617
Validation loss = 0.0010575493797659874
Validation loss = 0.0009426362812519073
Validation loss = 0.0008689988753758371
Validation loss = 0.0008983903680928051
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009737061918713152
Validation loss = 0.0008853453327901661
Validation loss = 0.0010905845556408167
Validation loss = 0.0008957931422628462
Validation loss = 0.0009760737302713096
Validation loss = 0.0009519927552901208
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008860652451403439
Validation loss = 0.0010390925453975797
Validation loss = 0.0009100406314246356
Validation loss = 0.0008871303871273994
Validation loss = 0.0010064677335321903
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -16.4     |
| Iteration     | 63        |
| MaximumReturn | -0.000721 |
| MinimumReturn | -125      |
| TotalSamples  | 108290    |
-----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008333564037457108
Validation loss = 0.0009418574045412242
Validation loss = 0.0008177342242561281
Validation loss = 0.0008567093755118549
Validation loss = 0.0008475917275063694
Validation loss = 0.0010634323116391897
Validation loss = 0.0009168450487777591
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007529165013693273
Validation loss = 0.0008570236386731267
Validation loss = 0.0009220261708833277
Validation loss = 0.0009014244424179196
Validation loss = 0.001114546786993742
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009969136444851756
Validation loss = 0.0012327670119702816
Validation loss = 0.002028476679697633
Validation loss = 0.0011350775603204966
Validation loss = 0.000825013208668679
Validation loss = 0.0012283750111237168
Validation loss = 0.0007826121873222291
Validation loss = 0.0021491260267794132
Validation loss = 0.0011742454953491688
Validation loss = 0.001553960028104484
Validation loss = 0.0007954083848744631
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016024013748392463
Validation loss = 0.0011034274939447641
Validation loss = 0.0007797139114700258
Validation loss = 0.0009828112088143826
Validation loss = 0.0010429001413285732
Validation loss = 0.0009309951565228403
Validation loss = 0.0012792834313586354
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012486593332141638
Validation loss = 0.0009293300681747496
Validation loss = 0.0010649377945810556
Validation loss = 0.0013665474252775311
Validation loss = 0.0008556860266253352
Validation loss = 0.0010419749887660146
Validation loss = 0.0009548432426527143
Validation loss = 0.0008626079652458429
Validation loss = 0.000999770243652165
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -144     |
| Iteration     | 64       |
| MaximumReturn | -97.8    |
| MinimumReturn | -166     |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008198050782084465
Validation loss = 0.0007946112891659141
Validation loss = 0.0008873931947164237
Validation loss = 0.0008680670289322734
Validation loss = 0.0008829472935758531
Validation loss = 0.0007733397069387138
Validation loss = 0.0011525640729814768
Validation loss = 0.001708265976049006
Validation loss = 0.00076303631067276
Validation loss = 0.000984257203526795
Validation loss = 0.0008235937566496432
Validation loss = 0.0011006547138094902
Validation loss = 0.0011150307254865766
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009461914305575192
Validation loss = 0.0009877316188067198
Validation loss = 0.001079247915185988
Validation loss = 0.0008945796289481223
Validation loss = 0.0008457096992060542
Validation loss = 0.0008250324171967804
Validation loss = 0.0008080198895186186
Validation loss = 0.000952790433075279
Validation loss = 0.0007134312181733549
Validation loss = 0.0007426031515933573
Validation loss = 0.0009364747093059123
Validation loss = 0.0007753754034638405
Validation loss = 0.0010744946775957942
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009090342209674418
Validation loss = 0.0009590390836820006
Validation loss = 0.0008859614026732743
Validation loss = 0.0009387305472046137
Validation loss = 0.0009443738381378353
Validation loss = 0.000989723252132535
Validation loss = 0.001172133139334619
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002824591239914298
Validation loss = 0.0007836093427613378
Validation loss = 0.0009955501882359385
Validation loss = 0.001269254949875176
Validation loss = 0.0010251690400764346
Validation loss = 0.0012095720740035176
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015929690562188625
Validation loss = 0.0008140477002598345
Validation loss = 0.001071713282726705
Validation loss = 0.0010302735026925802
Validation loss = 0.0009699680376797915
Validation loss = 0.0008130855276249349
Validation loss = 0.0009302727412432432
Validation loss = 0.0009565706131979823
Validation loss = 0.0012050641234964132
Validation loss = 0.0009864062303677201
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -143     |
| Iteration     | 65       |
| MaximumReturn | -109     |
| MinimumReturn | -164     |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021824552677571774
Validation loss = 0.0008165278122760355
Validation loss = 0.0009036324918270111
Validation loss = 0.0008735995506867766
Validation loss = 0.0007559569203294814
Validation loss = 0.0008827118435874581
Validation loss = 0.000814668892417103
Validation loss = 0.0008455083006992936
Validation loss = 0.0007808743976056576
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010427571833133698
Validation loss = 0.0007737096748314798
Validation loss = 0.000849165313411504
Validation loss = 0.0009851495269685984
Validation loss = 0.0010074280435219407
Validation loss = 0.0014037928776815534
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008403995889239013
Validation loss = 0.0012145135551691055
Validation loss = 0.0009496120037510991
Validation loss = 0.0008193780086003244
Validation loss = 0.0009363061981275678
Validation loss = 0.001001716940663755
Validation loss = 0.0007642018026672304
Validation loss = 0.0007953551248647273
Validation loss = 0.0007627755403518677
Validation loss = 0.000829449447337538
Validation loss = 0.0008514987421222031
Validation loss = 0.0008653260301798582
Validation loss = 0.0009624977828934789
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008901799446903169
Validation loss = 0.0009962876792997122
Validation loss = 0.0008635562844574451
Validation loss = 0.00070950883673504
Validation loss = 0.0012380961561575532
Validation loss = 0.0008952582138590515
Validation loss = 0.0011130970669910312
Validation loss = 0.000964743725489825
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008589727804064751
Validation loss = 0.0008408109424635768
Validation loss = 0.0008467711741104722
Validation loss = 0.0008598204585723579
Validation loss = 0.0008773996960371733
Validation loss = 0.0007975553744472563
Validation loss = 0.0008115879027172923
Validation loss = 0.0008531990461051464
Validation loss = 0.0007693013176321983
Validation loss = 0.001048556761816144
Validation loss = 0.0008034139173105359
Validation loss = 0.0010117649799212813
Validation loss = 0.0011976409005001187
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -30.3     |
| Iteration     | 66        |
| MaximumReturn | -0.000759 |
| MinimumReturn | -170      |
| TotalSamples  | 113288    |
-----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010674686636775732
Validation loss = 0.0007784349145367742
Validation loss = 0.0009318527881987393
Validation loss = 0.0008175601833499968
Validation loss = 0.0009248102433048189
Validation loss = 0.0009395038359798491
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007534482283517718
Validation loss = 0.000810096098575741
Validation loss = 0.0007640146650373936
Validation loss = 0.0008433268521912396
Validation loss = 0.0013244050787761807
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007824293570593
Validation loss = 0.0008217893191613257
Validation loss = 0.0007641004049219191
Validation loss = 0.0007803957560099661
Validation loss = 0.000914841890335083
Validation loss = 0.0008808375569060445
Validation loss = 0.0008045798167586327
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007308250642381608
Validation loss = 0.000823818554636091
Validation loss = 0.0008290013647638261
Validation loss = 0.0011658904841169715
Validation loss = 0.0010355382692068815
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001049437327310443
Validation loss = 0.0007704225135967135
Validation loss = 0.0008371433359570801
Validation loss = 0.0010621814290061593
Validation loss = 0.001592780346982181
Validation loss = 0.0007756787235848606
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -15.7    |
| Iteration     | 67       |
| MaximumReturn | -0.0007  |
| MinimumReturn | -140     |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012042514281347394
Validation loss = 0.0008365172543562949
Validation loss = 0.0008320368360728025
Validation loss = 0.0008673682459630072
Validation loss = 0.0008180720615200698
Validation loss = 0.0008770136046223342
Validation loss = 0.0013026255182921886
Validation loss = 0.0007387033547274768
Validation loss = 0.0009889674838632345
Validation loss = 0.0008153943344950676
Validation loss = 0.0010281705763190985
Validation loss = 0.0007727486663497984
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008092636126093566
Validation loss = 0.0008081137202680111
Validation loss = 0.0008336858591064811
Validation loss = 0.0009055049158632755
Validation loss = 0.0009582944330759346
Validation loss = 0.0010198079980909824
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006876183906570077
Validation loss = 0.0009502420434728265
Validation loss = 0.0008743673679418862
Validation loss = 0.0009399218251928687
Validation loss = 0.0008577781263738871
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009745967690832913
Validation loss = 0.000992038520053029
Validation loss = 0.0010046741226688027
Validation loss = 0.0008365889079868793
Validation loss = 0.0009000074351206422
Validation loss = 0.0010972021846100688
Validation loss = 0.0007370510138571262
Validation loss = 0.0007308134227059782
Validation loss = 0.0009617319446988404
Validation loss = 0.0008425463456660509
Validation loss = 0.000767317833378911
Validation loss = 0.0009153240243904293
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010309540666639805
Validation loss = 0.0015839282423257828
Validation loss = 0.0008409905130974948
Validation loss = 0.0009574694558978081
Validation loss = 0.0009799718391150236
Validation loss = 0.0010885351803153753
Validation loss = 0.0008825345430523157
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -15.4     |
| Iteration     | 68        |
| MaximumReturn | -0.000731 |
| MinimumReturn | -105      |
| TotalSamples  | 116620    |
-----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007515526376664639
Validation loss = 0.000854161218740046
Validation loss = 0.001212141360156238
Validation loss = 0.0008337507606483996
Validation loss = 0.0009339005337096751
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013020308688282967
Validation loss = 0.0013318214332684875
Validation loss = 0.0008813448948785663
Validation loss = 0.0009299608063884079
Validation loss = 0.0010637248633429408
Validation loss = 0.0008340129279531538
Validation loss = 0.0009508930379524827
Validation loss = 0.0009877556003630161
Validation loss = 0.0007063305238261819
Validation loss = 0.001586922793649137
Validation loss = 0.0008741132915019989
Validation loss = 0.0009862136794254184
Validation loss = 0.0007667943136766553
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007539069629274309
Validation loss = 0.0010066099930554628
Validation loss = 0.0017761074705049396
Validation loss = 0.0008478750823996961
Validation loss = 0.0008860473171807826
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000770221056882292
Validation loss = 0.0009216574253514409
Validation loss = 0.0007952959276735783
Validation loss = 0.0009927810169756413
Validation loss = 0.0007819290622137487
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009691717568784952
Validation loss = 0.0008346742833964527
Validation loss = 0.0009046785999089479
Validation loss = 0.0009061043383553624
Validation loss = 0.000905855733435601
Validation loss = 0.0009205820970237255
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -12.6     |
| Iteration     | 69        |
| MaximumReturn | -0.000739 |
| MinimumReturn | -128      |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011929335305467248
Validation loss = 0.0007646330050192773
Validation loss = 0.0008140924037434161
Validation loss = 0.0007582530379295349
Validation loss = 0.0007585555431433022
Validation loss = 0.0009247082052752376
Validation loss = 0.0009060131851583719
Validation loss = 0.0007842094055376947
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010506357066333294
Validation loss = 0.0014201444573700428
Validation loss = 0.0007737723062746227
Validation loss = 0.001446652109734714
Validation loss = 0.0006933239637874067
Validation loss = 0.00082072225632146
Validation loss = 0.0008380665676668286
Validation loss = 0.0012021107831969857
Validation loss = 0.0008213978726416826
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008021509856916964
Validation loss = 0.0008090237388387322
Validation loss = 0.0007548133726231754
Validation loss = 0.0012212716974318027
Validation loss = 0.0007896267343312502
Validation loss = 0.0008075321093201637
Validation loss = 0.0008162574958987534
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007396821747533977
Validation loss = 0.001018759561702609
Validation loss = 0.0007430921541526914
Validation loss = 0.0007859880570322275
Validation loss = 0.0008284916402772069
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008572171209380031
Validation loss = 0.000787472992669791
Validation loss = 0.0010306587209925056
Validation loss = 0.0008761771605350077
Validation loss = 0.0008261059410870075
Validation loss = 0.0008634051773697138
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -22.6     |
| Iteration     | 70        |
| MaximumReturn | -0.000679 |
| MinimumReturn | -143      |
| TotalSamples  | 119952    |
-----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00113546138163656
Validation loss = 0.0010807816870510578
Validation loss = 0.001071314443834126
Validation loss = 0.0008883636328391731
Validation loss = 0.0009965550852939487
Validation loss = 0.000840450229588896
Validation loss = 0.000814643397461623
Validation loss = 0.0007941632647998631
Validation loss = 0.0007836543372832239
Validation loss = 0.0007195647340267897
Validation loss = 0.0009336046641692519
Validation loss = 0.0008297768654301763
Validation loss = 0.0009663692326284945
Validation loss = 0.0006880600703880191
Validation loss = 0.0008417692733928561
Validation loss = 0.0008634676923975348
Validation loss = 0.0010095436591655016
Validation loss = 0.0007861002814024687
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000895396457053721
Validation loss = 0.0008416841155849397
Validation loss = 0.0007302043959498405
Validation loss = 0.0008894092170521617
Validation loss = 0.0008254671702161431
Validation loss = 0.0008102270658127964
Validation loss = 0.0010689928894862533
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008837133646011353
Validation loss = 0.0009852917864918709
Validation loss = 0.0012913551181554794
Validation loss = 0.0012317374348640442
Validation loss = 0.0008800272480584681
Validation loss = 0.001011512940749526
Validation loss = 0.0007288597989827394
Validation loss = 0.0009385321172885597
Validation loss = 0.0008900954271666706
Validation loss = 0.0012312981998547912
Validation loss = 0.0007467842078767717
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012807585299015045
Validation loss = 0.000906954170204699
Validation loss = 0.000791720172856003
Validation loss = 0.0007969365105964243
Validation loss = 0.0009574635769240558
Validation loss = 0.0009803909342736006
Validation loss = 0.0008598772692494094
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008331990684382617
Validation loss = 0.0009357754606753588
Validation loss = 0.0012976324651390314
Validation loss = 0.0009571718401275575
Validation loss = 0.0008713975548744202
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -167     |
| Iteration     | 71       |
| MaximumReturn | -146     |
| MinimumReturn | -179     |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001007408369332552
Validation loss = 0.0007337314891628921
Validation loss = 0.0008252258994616568
Validation loss = 0.0007349582738243043
Validation loss = 0.0007381642935797572
Validation loss = 0.0007188594317995012
Validation loss = 0.0007843360654078424
Validation loss = 0.0007276563555933535
Validation loss = 0.0007405357901006937
Validation loss = 0.0007625349680893123
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008271298138424754
Validation loss = 0.0007698945701122284
Validation loss = 0.0008125274907797575
Validation loss = 0.0007236342062242329
Validation loss = 0.0007769281510263681
Validation loss = 0.0008058787789195776
Validation loss = 0.0007755838450975716
Validation loss = 0.0008303996291942894
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008244319469667971
Validation loss = 0.0009019190329127014
Validation loss = 0.0007571550086140633
Validation loss = 0.0011417054338380694
Validation loss = 0.0008343463996425271
Validation loss = 0.001072112238034606
Validation loss = 0.0008887631702236831
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007462240173481405
Validation loss = 0.001499174046330154
Validation loss = 0.0008025318384170532
Validation loss = 0.0013861488550901413
Validation loss = 0.0007294176612049341
Validation loss = 0.0010053706355392933
Validation loss = 0.0008903627167455852
Validation loss = 0.0008086611633189023
Validation loss = 0.0006708012660965323
Validation loss = 0.0010425791842862964
Validation loss = 0.0007962171221151948
Validation loss = 0.0010864840587601066
Validation loss = 0.0007925350219011307
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007861198973841965
Validation loss = 0.001091116457246244
Validation loss = 0.0008468469022773206
Validation loss = 0.0008448870503343642
Validation loss = 0.000979032600298524
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -157     |
| Iteration     | 72       |
| MaximumReturn | -110     |
| MinimumReturn | -182     |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000713635643478483
Validation loss = 0.001068534329533577
Validation loss = 0.0008532092906534672
Validation loss = 0.0006757298833690584
Validation loss = 0.000915723794605583
Validation loss = 0.0007410876569338143
Validation loss = 0.0008717412129044533
Validation loss = 0.000864267407450825
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007985620177350938
Validation loss = 0.0007611211622133851
Validation loss = 0.000738361501134932
Validation loss = 0.0006970154936425388
Validation loss = 0.0007340596057474613
Validation loss = 0.0008677042787894607
Validation loss = 0.0007454308215528727
Validation loss = 0.000937152246478945
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007043576333671808
Validation loss = 0.0011404161341488361
Validation loss = 0.0008147241314873099
Validation loss = 0.0008478892268612981
Validation loss = 0.0008402709499932826
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007714323583059013
Validation loss = 0.0008484838181175292
Validation loss = 0.0010119315702468157
Validation loss = 0.0008771213470026851
Validation loss = 0.0008705937070772052
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010382060427218676
Validation loss = 0.000828031450510025
Validation loss = 0.0008194544934667647
Validation loss = 0.0007910421118140221
Validation loss = 0.0009777664672583342
Validation loss = 0.0008597302949056029
Validation loss = 0.0011048174928873777
Validation loss = 0.0007982642855495214
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -111     |
| Iteration     | 73       |
| MaximumReturn | -0.00303 |
| MinimumReturn | -165     |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007521322695538402
Validation loss = 0.0008544143056496978
Validation loss = 0.0007226758752949536
Validation loss = 0.0009372644708491862
Validation loss = 0.000699537864420563
Validation loss = 0.0007020227494649589
Validation loss = 0.0006495675770565867
Validation loss = 0.000864070316310972
Validation loss = 0.0006828338373452425
Validation loss = 0.0006710081361234188
Validation loss = 0.0018746454734355211
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007747914060018957
Validation loss = 0.0008059180108830333
Validation loss = 0.0009031401132233441
Validation loss = 0.0007818262092769146
Validation loss = 0.0007202087435871363
Validation loss = 0.0008130933856591582
Validation loss = 0.001016185968182981
Validation loss = 0.0007697256514802575
Validation loss = 0.0007935830508358777
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008285612566396594
Validation loss = 0.0007980269147083163
Validation loss = 0.0006741149118170142
Validation loss = 0.0007959394133649766
Validation loss = 0.0008732712012715638
Validation loss = 0.0008688995148986578
Validation loss = 0.0008009370067156851
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007918626070022583
Validation loss = 0.0008037547231651843
Validation loss = 0.0011310827685520053
Validation loss = 0.0007848187815397978
Validation loss = 0.0009318956290371716
Validation loss = 0.0008497585076838732
Validation loss = 0.0008049423922784626
Validation loss = 0.0007005520747043192
Validation loss = 0.0014052120968699455
Validation loss = 0.0008053171914070845
Validation loss = 0.0006989833782427013
Validation loss = 0.0006867915508337319
Validation loss = 0.0007526090485043824
Validation loss = 0.0007000156329013407
Validation loss = 0.0007354200934059918
Validation loss = 0.0007389288512058556
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014478318626061082
Validation loss = 0.0008692003320902586
Validation loss = 0.0008940473198890686
Validation loss = 0.0007550950394943357
Validation loss = 0.0007520159124396741
Validation loss = 0.0009195352322421968
Validation loss = 0.0007683983421884477
Validation loss = 0.0007675810484215617
Validation loss = 0.0007440827321261168
Validation loss = 0.0007499827770516276
Validation loss = 0.0010387991787865758
Validation loss = 0.0009327845764346421
Validation loss = 0.0007005129009485245
Validation loss = 0.0007588924490846694
Validation loss = 0.0006920292507857084
Validation loss = 0.0016371759120374918
Validation loss = 0.0007335864356718957
Validation loss = 0.001161108841188252
Validation loss = 0.0007289860513992608
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -145     |
| Iteration     | 74       |
| MaximumReturn | -117     |
| MinimumReturn | -168     |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007889075786806643
Validation loss = 0.0007652539061382413
Validation loss = 0.0012308649020269513
Validation loss = 0.0006742061232216656
Validation loss = 0.0008055925718508661
Validation loss = 0.0011300966143608093
Validation loss = 0.0008368749404326081
Validation loss = 0.0010373705299571157
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006519000744447112
Validation loss = 0.0009436816908419132
Validation loss = 0.0006576105952262878
Validation loss = 0.0009793519275262952
Validation loss = 0.0007796975551173091
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006917193531990051
Validation loss = 0.0007703836308792233
Validation loss = 0.0014240327291190624
Validation loss = 0.0007990758749656379
Validation loss = 0.0007692856597714126
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009661489748395979
Validation loss = 0.0007001887424848974
Validation loss = 0.0007840456091798842
Validation loss = 0.0008261207840405405
Validation loss = 0.0007200600230135024
Validation loss = 0.0007913868757896125
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008741228957660496
Validation loss = 0.0008831416489556432
Validation loss = 0.0007740378496237099
Validation loss = 0.0008602036978118122
Validation loss = 0.0008211135282181203
Validation loss = 0.0007946004043333232
Validation loss = 0.0008577597327530384
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -138     |
| Iteration     | 75       |
| MaximumReturn | -89.9    |
| MinimumReturn | -158     |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008426958229392767
Validation loss = 0.0007061238866299391
Validation loss = 0.0008655751007609069
Validation loss = 0.0006902788882143795
Validation loss = 0.0008545726304873824
Validation loss = 0.0006885461043566465
Validation loss = 0.0009032495436258614
Validation loss = 0.0007446784875355661
Validation loss = 0.0006398762343451381
Validation loss = 0.0006685355911031365
Validation loss = 0.0006893655518069863
Validation loss = 0.0009729472803883255
Validation loss = 0.0006551821134053171
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007192518096417189
Validation loss = 0.0007699078414589167
Validation loss = 0.0007557520875707269
Validation loss = 0.0007534653996117413
Validation loss = 0.0006932882824912667
Validation loss = 0.0008435520576313138
Validation loss = 0.000705284415744245
Validation loss = 0.0008707566885277629
Validation loss = 0.0006700746016576886
Validation loss = 0.0007395893335342407
Validation loss = 0.0007225530571304262
Validation loss = 0.0006822544964961708
Validation loss = 0.0009538506856188178
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007214418146759272
Validation loss = 0.0007034057634882629
Validation loss = 0.0008516340749338269
Validation loss = 0.0008662039181217551
Validation loss = 0.0007463076617568731
Validation loss = 0.0010262071155011654
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009323327685706317
Validation loss = 0.0007643039571121335
Validation loss = 0.0008470084285363555
Validation loss = 0.0007605566061101854
Validation loss = 0.0008470345055684447
Validation loss = 0.0006823746953159571
Validation loss = 0.0006535582942888141
Validation loss = 0.0007805113564245403
Validation loss = 0.0007764843758195639
Validation loss = 0.0011361349606886506
Validation loss = 0.0006571713602170348
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010221709962934256
Validation loss = 0.0008178415009751916
Validation loss = 0.0008199845324270427
Validation loss = 0.0007886228850111365
Validation loss = 0.0009015874820761383
Validation loss = 0.0012407430913299322
Validation loss = 0.00067971769021824
Validation loss = 0.0006226610857993364
Validation loss = 0.0005939508555456996
Validation loss = 0.0011441274546086788
Validation loss = 0.0008353021694347262
Validation loss = 0.00079921237193048
Validation loss = 0.0009592120768502355
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -118     |
| Iteration     | 76       |
| MaximumReturn | -0.758   |
| MinimumReturn | -166     |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006503015174530447
Validation loss = 0.0008216982241719961
Validation loss = 0.0006578120519407094
Validation loss = 0.0007936996989883482
Validation loss = 0.0007738943095318973
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008595625404268503
Validation loss = 0.0008615678525529802
Validation loss = 0.0008235107525251806
Validation loss = 0.0006706268759444356
Validation loss = 0.0008660152088850737
Validation loss = 0.0008964127046056092
Validation loss = 0.0009408923797309399
Validation loss = 0.0006997378077358007
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007348082144744694
Validation loss = 0.0006839941488578916
Validation loss = 0.0007932472508400679
Validation loss = 0.0011987024918198586
Validation loss = 0.0007199959945864975
Validation loss = 0.001442375360056758
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007269890047609806
Validation loss = 0.0007747912313789129
Validation loss = 0.0006397366523742676
Validation loss = 0.0008658320875838399
Validation loss = 0.0007678496767766774
Validation loss = 0.0007754186517558992
Validation loss = 0.0007367345970124006
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008900807588361204
Validation loss = 0.0006854183156974614
Validation loss = 0.0007875481387600303
Validation loss = 0.0007549597648903728
Validation loss = 0.0008004670962691307
Validation loss = 0.000878607330378145
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -115     |
| Iteration     | 77       |
| MaximumReturn | -0.00211 |
| MinimumReturn | -168     |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006970985559746623
Validation loss = 0.001167594688013196
Validation loss = 0.000639423553366214
Validation loss = 0.0007765386253595352
Validation loss = 0.0008590549696236849
Validation loss = 0.0006672824383713305
Validation loss = 0.0006992504931986332
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006751618930138648
Validation loss = 0.000706841645296663
Validation loss = 0.0007274238741956651
Validation loss = 0.0012244473909959197
Validation loss = 0.0007471715216524899
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006408729241229594
Validation loss = 0.0005979293491691351
Validation loss = 0.0007709735655225813
Validation loss = 0.0006731968023814261
Validation loss = 0.0008478947565890849
Validation loss = 0.0007145251729525626
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008664845954626799
Validation loss = 0.0011304243234917521
Validation loss = 0.000852831348311156
Validation loss = 0.0005802864907309413
Validation loss = 0.0007082384545356035
Validation loss = 0.0007764051551930606
Validation loss = 0.0007290653302334249
Validation loss = 0.0008019287488423288
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006164272781461477
Validation loss = 0.0007731498335488141
Validation loss = 0.000785678275860846
Validation loss = 0.0007036072202026844
Validation loss = 0.0011265772627666593
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -160     |
| Iteration     | 78       |
| MaximumReturn | -135     |
| MinimumReturn | -180     |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006001688889227808
Validation loss = 0.0006831057835370302
Validation loss = 0.0007223804714158177
Validation loss = 0.0007000029436312616
Validation loss = 0.0009304321138188243
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006732200272381306
Validation loss = 0.0006988486857153475
Validation loss = 0.0006659907521679997
Validation loss = 0.000780333939474076
Validation loss = 0.0006242236122488976
Validation loss = 0.000668227847199887
Validation loss = 0.0006726720021106303
Validation loss = 0.0006899487925693393
Validation loss = 0.0006606155657209456
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000833008554764092
Validation loss = 0.000848601630423218
Validation loss = 0.0006850244244560599
Validation loss = 0.0006447630003094673
Validation loss = 0.000715243979357183
Validation loss = 0.001132786856032908
Validation loss = 0.0008089970797300339
Validation loss = 0.0006232549203559756
Validation loss = 0.0007476508035324514
Validation loss = 0.0006918282015249133
Validation loss = 0.000818745989818126
Validation loss = 0.001480904407799244
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006633798475377262
Validation loss = 0.0006447577034123242
Validation loss = 0.0006399028934538364
Validation loss = 0.0010060147615149617
Validation loss = 0.0005828296998515725
Validation loss = 0.0006775708170607686
Validation loss = 0.0007623101118952036
Validation loss = 0.0007157515501603484
Validation loss = 0.000781612063292414
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007826301152817905
Validation loss = 0.000736541929654777
Validation loss = 0.0007737284759059548
Validation loss = 0.0006781465490348637
Validation loss = 0.0007046651444397867
Validation loss = 0.0007589327287860215
Validation loss = 0.0006536543951369822
Validation loss = 0.0008111749775707722
Validation loss = 0.0007561227539554238
Validation loss = 0.0007988120778463781
Validation loss = 0.0007599593955092132
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -160     |
| Iteration     | 79       |
| MaximumReturn | -116     |
| MinimumReturn | -181     |
| TotalSamples  | 134946   |
----------------------------
