Logging to experiments/half_cheetah/control-affine/halfcheetah_seed2341
Print configuration .....
{'env_name': 'half_cheetah', 'random_seeds': [4321, 2314, 2341, 3421], 'save_variables': False, 'model_save_dir': '/tmp/half_cheetah_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 40, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5782804489135742
Validation loss = 0.14535439014434814
Validation loss = 0.1016443595290184
Validation loss = 0.08924376964569092
Validation loss = 0.08439521491527557
Validation loss = 0.0821576863527298
Validation loss = 0.08114307373762131
Validation loss = 0.07884995639324188
Validation loss = 0.07467395067214966
Validation loss = 0.0795774757862091
Validation loss = 0.07326599210500717
Validation loss = 0.07161099463701248
Validation loss = 0.07137065380811691
Validation loss = 0.08026961237192154
Validation loss = 0.07335874438285828
Validation loss = 0.06946215033531189
Validation loss = 0.07367531955242157
Validation loss = 0.06794196367263794
Validation loss = 0.06855957210063934
Validation loss = 0.06986291706562042
Validation loss = 0.06804820895195007
Validation loss = 0.07688126713037491
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5901764631271362
Validation loss = 0.14623403549194336
Validation loss = 0.10173281282186508
Validation loss = 0.08987236768007278
Validation loss = 0.08364880084991455
Validation loss = 0.08098867535591125
Validation loss = 0.08229335397481918
Validation loss = 0.07994809001684189
Validation loss = 0.07645503431558609
Validation loss = 0.07368933409452438
Validation loss = 0.07484409213066101
Validation loss = 0.07643783092498779
Validation loss = 0.07084250450134277
Validation loss = 0.07311446219682693
Validation loss = 0.0709405317902565
Validation loss = 0.07064981758594513
Validation loss = 0.07038160413503647
Validation loss = 0.07341837882995605
Validation loss = 0.06928975135087967
Validation loss = 0.07313632220029831
Validation loss = 0.07219880819320679
Validation loss = 0.06861220300197601
Validation loss = 0.07141083478927612
Validation loss = 0.07161855697631836
Validation loss = 0.06914365291595459
Validation loss = 0.06764116883277893
Validation loss = 0.06950044631958008
Validation loss = 0.0676523894071579
Validation loss = 0.06861923635005951
Validation loss = 0.06922956556081772
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5982617139816284
Validation loss = 0.15084519982337952
Validation loss = 0.10258739441633224
Validation loss = 0.0908377468585968
Validation loss = 0.08610129356384277
Validation loss = 0.08316390216350555
Validation loss = 0.07918242365121841
Validation loss = 0.07605807483196259
Validation loss = 0.0812680795788765
Validation loss = 0.07430574297904968
Validation loss = 0.07496767491102219
Validation loss = 0.07387376576662064
Validation loss = 0.07467891275882721
Validation loss = 0.07223443686962128
Validation loss = 0.08464163541793823
Validation loss = 0.06930074095726013
Validation loss = 0.07054828107357025
Validation loss = 0.06881988793611526
Validation loss = 0.06921741366386414
Validation loss = 0.06896311044692993
Validation loss = 0.06886868923902512
Validation loss = 0.07212384045124054
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5820884704589844
Validation loss = 0.14651024341583252
Validation loss = 0.09994067251682281
Validation loss = 0.08778901398181915
Validation loss = 0.08450275659561157
Validation loss = 0.08092021942138672
Validation loss = 0.07976049929857254
Validation loss = 0.07556162774562836
Validation loss = 0.0849636048078537
Validation loss = 0.07704848051071167
Validation loss = 0.07316873967647552
Validation loss = 0.07366273552179337
Validation loss = 0.07201344519853592
Validation loss = 0.07096579670906067
Validation loss = 0.07506513595581055
Validation loss = 0.06985689699649811
Validation loss = 0.07546982169151306
Validation loss = 0.07091394066810608
Validation loss = 0.0688982903957367
Validation loss = 0.06842399388551712
Validation loss = 0.06932446360588074
Validation loss = 0.07478536665439606
Validation loss = 0.0691818818449974
Validation loss = 0.06607532501220703
Validation loss = 0.06820447742938995
Validation loss = 0.06663624942302704
Validation loss = 0.06791660189628601
Validation loss = 0.06966206431388855
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5680699348449707
Validation loss = 0.14448225498199463
Validation loss = 0.09997919946908951
Validation loss = 0.08788830041885376
Validation loss = 0.08284659683704376
Validation loss = 0.08310961723327637
Validation loss = 0.07713940739631653
Validation loss = 0.076102614402771
Validation loss = 0.07622178643941879
Validation loss = 0.07355359196662903
Validation loss = 0.072604238986969
Validation loss = 0.0750868171453476
Validation loss = 0.07335150241851807
Validation loss = 0.06996625661849976
Validation loss = 0.0696614682674408
Validation loss = 0.07136605679988861
Validation loss = 0.07977065443992615
Validation loss = 0.06929393112659454
Validation loss = 0.07381893694400787
Validation loss = 0.06752052158117294
Validation loss = 0.0694659948348999
Validation loss = 0.0679788589477539
Validation loss = 0.06691759824752808
Validation loss = 0.06975379586219788
Validation loss = 0.06614924222230911
Validation loss = 0.06623323261737823
Validation loss = 0.06802166253328323
Validation loss = 0.0676785409450531
Validation loss = 0.06686133146286011
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -393     |
| Iteration     | 0        |
| MaximumReturn | -310     |
| MinimumReturn | -521     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.14508411288261414
Validation loss = 0.10520897805690765
Validation loss = 0.10675792396068573
Validation loss = 0.10191123932600021
Validation loss = 0.10638850927352905
Validation loss = 0.10388002544641495
Validation loss = 0.10348881781101227
Validation loss = 0.11401696503162384
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1464771181344986
Validation loss = 0.1053481251001358
Validation loss = 0.1044788509607315
Validation loss = 0.11234693229198456
Validation loss = 0.10308970510959625
Validation loss = 0.10247424244880676
Validation loss = 0.10212588310241699
Validation loss = 0.10442247986793518
Validation loss = 0.10357436537742615
Validation loss = 0.10313422977924347
Validation loss = 0.10538475960493088
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14003266394138336
Validation loss = 0.10729404538869858
Validation loss = 0.10956758260726929
Validation loss = 0.10192447155714035
Validation loss = 0.09993208944797516
Validation loss = 0.1042877733707428
Validation loss = 0.10630558431148529
Validation loss = 0.1020120307803154
Validation loss = 0.10108815133571625
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1371612250804901
Validation loss = 0.1062217578291893
Validation loss = 0.10432411730289459
Validation loss = 0.10249001532793045
Validation loss = 0.10182669758796692
Validation loss = 0.10145041346549988
Validation loss = 0.10748956352472305
Validation loss = 0.104543536901474
Validation loss = 0.11090333759784698
Validation loss = 0.10349594056606293
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.13902141153812408
Validation loss = 0.10824102908372879
Validation loss = 0.10651274025440216
Validation loss = 0.10190986096858978
Validation loss = 0.10362902283668518
Validation loss = 0.10502926260232925
Validation loss = 0.10421887785196304
Validation loss = 0.1013348400592804
Validation loss = 0.10489960759878159
Validation loss = 0.10686495900154114
Validation loss = 0.10070937871932983
Validation loss = 0.10481244325637817
Validation loss = 0.10564123839139938
Validation loss = 0.10267625749111176
Validation loss = 0.1068381816148758
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -354     |
| Iteration     | 1        |
| MaximumReturn | -267     |
| MinimumReturn | -424     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12008305639028549
Validation loss = 0.10680120438337326
Validation loss = 0.10439763218164444
Validation loss = 0.10452575236558914
Validation loss = 0.10429222136735916
Validation loss = 0.10587400197982788
Validation loss = 0.10531077533960342
Validation loss = 0.10494381934404373
Validation loss = 0.10651606321334839
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11660357564687729
Validation loss = 0.10717203468084335
Validation loss = 0.10298016667366028
Validation loss = 0.10528070479631424
Validation loss = 0.10561563819646835
Validation loss = 0.10448050498962402
Validation loss = 0.10476866364479065
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11709186434745789
Validation loss = 0.10540466755628586
Validation loss = 0.1036435142159462
Validation loss = 0.10478942841291428
Validation loss = 0.1053643524646759
Validation loss = 0.10411206632852554
Validation loss = 0.10931719094514847
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11912959814071655
Validation loss = 0.10262906551361084
Validation loss = 0.10631223767995834
Validation loss = 0.1044696643948555
Validation loss = 0.1038326844573021
Validation loss = 0.10557463020086288
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.12351956218481064
Validation loss = 0.10252002626657486
Validation loss = 0.1057552695274353
Validation loss = 0.10439895838499069
Validation loss = 0.10675421357154846
Validation loss = 0.10584729164838791
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -330     |
| Iteration     | 2        |
| MaximumReturn | -276     |
| MinimumReturn | -371     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11414189636707306
Validation loss = 0.10038916766643524
Validation loss = 0.09932467341423035
Validation loss = 0.10112009942531586
Validation loss = 0.10018172860145569
Validation loss = 0.1007000058889389
Validation loss = 0.10060291737318039
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11788555979728699
Validation loss = 0.10153017938137054
Validation loss = 0.09832413494586945
Validation loss = 0.1013839989900589
Validation loss = 0.10244353115558624
Validation loss = 0.1021440178155899
Validation loss = 0.1017376109957695
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11433471739292145
Validation loss = 0.10107859969139099
Validation loss = 0.10024860501289368
Validation loss = 0.09994937479496002
Validation loss = 0.10114441812038422
Validation loss = 0.10157327353954315
Validation loss = 0.10186342895030975
Validation loss = 0.10068730264902115
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11337818950414658
Validation loss = 0.09832888096570969
Validation loss = 0.09873449802398682
Validation loss = 0.10079547762870789
Validation loss = 0.10260792076587677
Validation loss = 0.0994018167257309
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11291491240262985
Validation loss = 0.09917829185724258
Validation loss = 0.09901714324951172
Validation loss = 0.10320964455604553
Validation loss = 0.10102298110723495
Validation loss = 0.10290464758872986
Validation loss = 0.09785337746143341
Validation loss = 0.10213421285152435
Validation loss = 0.1017531305551529
Validation loss = 0.10096900910139084
Validation loss = 0.10176607966423035
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -46.2    |
| Iteration     | 3        |
| MaximumReturn | -15.7    |
| MinimumReturn | -86.4    |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10516508668661118
Validation loss = 0.089122474193573
Validation loss = 0.08852000534534454
Validation loss = 0.08884835988283157
Validation loss = 0.08872009813785553
Validation loss = 0.09097238630056381
Validation loss = 0.0940786823630333
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09831753373146057
Validation loss = 0.08831283450126648
Validation loss = 0.08847809582948685
Validation loss = 0.08864764124155045
Validation loss = 0.08748568594455719
Validation loss = 0.08714058995246887
Validation loss = 0.08625613898038864
Validation loss = 0.08933151513338089
Validation loss = 0.0859571248292923
Validation loss = 0.09512405842542648
Validation loss = 0.08759992569684982
Validation loss = 0.08957315236330032
Validation loss = 0.0895315557718277
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10166773945093155
Validation loss = 0.0910346582531929
Validation loss = 0.08856455981731415
Validation loss = 0.08969969302415848
Validation loss = 0.08870165795087814
Validation loss = 0.0889061763882637
Validation loss = 0.08653447777032852
Validation loss = 0.08772467076778412
Validation loss = 0.08826567977666855
Validation loss = 0.08685171604156494
Validation loss = 0.09023602306842804
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09573081880807877
Validation loss = 0.0894748792052269
Validation loss = 0.08913162350654602
Validation loss = 0.08824510872364044
Validation loss = 0.08767841756343842
Validation loss = 0.08897047489881516
Validation loss = 0.08811141550540924
Validation loss = 0.08965998142957687
Validation loss = 0.08781006932258606
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09919312596321106
Validation loss = 0.08792237192392349
Validation loss = 0.09071085602045059
Validation loss = 0.09072695672512054
Validation loss = 0.08802835643291473
Validation loss = 0.08632615953683853
Validation loss = 0.08857368677854538
Validation loss = 0.08843763917684555
Validation loss = 0.08770515024662018
Validation loss = 0.0876954048871994
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 257      |
| Iteration     | 4        |
| MaximumReturn | 353      |
| MinimumReturn | 165      |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.099785715341568
Validation loss = 0.07385419309139252
Validation loss = 0.07142194360494614
Validation loss = 0.07387823611497879
Validation loss = 0.07123631238937378
Validation loss = 0.0699610561132431
Validation loss = 0.07060938328504562
Validation loss = 0.06951108574867249
Validation loss = 0.07318291813135147
Validation loss = 0.0716283768415451
Validation loss = 0.07182952016592026
Validation loss = 0.06976526230573654
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10048907995223999
Validation loss = 0.07231809943914413
Validation loss = 0.07061520218849182
Validation loss = 0.07130845636129379
Validation loss = 0.07066486775875092
Validation loss = 0.0706629604101181
Validation loss = 0.07017378509044647
Validation loss = 0.06865230947732925
Validation loss = 0.06961752474308014
Validation loss = 0.07035668939352036
Validation loss = 0.06848552078008652
Validation loss = 0.06787706166505814
Validation loss = 0.0710759162902832
Validation loss = 0.06858296692371368
Validation loss = 0.0698205903172493
Validation loss = 0.06921285390853882
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10525640845298767
Validation loss = 0.07290127128362656
Validation loss = 0.07176658511161804
Validation loss = 0.06965881586074829
Validation loss = 0.06922829151153564
Validation loss = 0.06907311826944351
Validation loss = 0.06867486238479614
Validation loss = 0.06940380483865738
Validation loss = 0.06950441747903824
Validation loss = 0.06890847533941269
Validation loss = 0.07022414356470108
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09357865899801254
Validation loss = 0.07329528778791428
Validation loss = 0.06938755512237549
Validation loss = 0.07100667804479599
Validation loss = 0.06759300082921982
Validation loss = 0.06994518637657166
Validation loss = 0.0676954984664917
Validation loss = 0.07041662186384201
Validation loss = 0.07041328400373459
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10210668295621872
Validation loss = 0.07259511202573776
Validation loss = 0.06989031285047531
Validation loss = 0.06927815824747086
Validation loss = 0.06897010654211044
Validation loss = 0.06851369887590408
Validation loss = 0.06884033977985382
Validation loss = 0.06786639988422394
Validation loss = 0.06878582388162613
Validation loss = 0.06816080957651138
Validation loss = 0.06861913949251175
Validation loss = 0.06905070692300797
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 897      |
| Iteration     | 5        |
| MaximumReturn | 1.07e+03 |
| MinimumReturn | 759      |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07174216955900192
Validation loss = 0.06348976492881775
Validation loss = 0.06702078133821487
Validation loss = 0.06399179250001907
Validation loss = 0.06299642473459244
Validation loss = 0.06277840584516525
Validation loss = 0.061616189777851105
Validation loss = 0.06523222476243973
Validation loss = 0.06359048932790756
Validation loss = 0.06322885304689407
Validation loss = 0.06245383620262146
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0721609964966774
Validation loss = 0.06344357877969742
Validation loss = 0.061560340225696564
Validation loss = 0.06329978257417679
Validation loss = 0.06131839379668236
Validation loss = 0.06382300704717636
Validation loss = 0.06185942143201828
Validation loss = 0.062348056584596634
Validation loss = 0.062048815190792084
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06892553716897964
Validation loss = 0.06258682161569595
Validation loss = 0.06279417127370834
Validation loss = 0.06324190646409988
Validation loss = 0.06344185024499893
Validation loss = 0.061345674097537994
Validation loss = 0.0628892034292221
Validation loss = 0.06619080156087875
Validation loss = 0.06167082116007805
Validation loss = 0.0627077966928482
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06858129799365997
Validation loss = 0.06274792551994324
Validation loss = 0.062160760164260864
Validation loss = 0.06388137489557266
Validation loss = 0.061402492225170135
Validation loss = 0.06280304491519928
Validation loss = 0.06313392519950867
Validation loss = 0.06078854948282242
Validation loss = 0.06206123158335686
Validation loss = 0.061263035982847214
Validation loss = 0.06300804018974304
Validation loss = 0.061733342707157135
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07082455605268478
Validation loss = 0.06302980333566666
Validation loss = 0.0615345723927021
Validation loss = 0.0617445670068264
Validation loss = 0.061871934682130814
Validation loss = 0.06172672286629677
Validation loss = 0.06276942044496536
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.12e+03 |
| Iteration     | 6        |
| MaximumReturn | 1.34e+03 |
| MinimumReturn | 1.03e+03 |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06538075953722
Validation loss = 0.061034657061100006
Validation loss = 0.06026773899793625
Validation loss = 0.059984274208545685
Validation loss = 0.059195369482040405
Validation loss = 0.05930159240961075
Validation loss = 0.059652965515851974
Validation loss = 0.06011473760008812
Validation loss = 0.05977515131235123
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06376815587282181
Validation loss = 0.060121748596429825
Validation loss = 0.05854183807969093
Validation loss = 0.05959342420101166
Validation loss = 0.05827286094427109
Validation loss = 0.05828595906496048
Validation loss = 0.05908871442079544
Validation loss = 0.05967755615711212
Validation loss = 0.05902358889579773
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06668566167354584
Validation loss = 0.060720235109329224
Validation loss = 0.06119776517152786
Validation loss = 0.05995272099971771
Validation loss = 0.059836290776729584
Validation loss = 0.05904906988143921
Validation loss = 0.05857507884502411
Validation loss = 0.060532424598932266
Validation loss = 0.059328868985176086
Validation loss = 0.05949348583817482
Validation loss = 0.05874022841453552
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06384792178869247
Validation loss = 0.05892935395240784
Validation loss = 0.058353815227746964
Validation loss = 0.05876627564430237
Validation loss = 0.05833182483911514
Validation loss = 0.05858783423900604
Validation loss = 0.05991796404123306
Validation loss = 0.05975956842303276
Validation loss = 0.05953299254179001
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06544692814350128
Validation loss = 0.060877442359924316
Validation loss = 0.058387257158756256
Validation loss = 0.06017769128084183
Validation loss = 0.05972982197999954
Validation loss = 0.05888663977384567
Validation loss = 0.059559062123298645
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.36e+03 |
| Iteration     | 7        |
| MaximumReturn | 1.56e+03 |
| MinimumReturn | 1.03e+03 |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06289497017860413
Validation loss = 0.058523278683423996
Validation loss = 0.05763917416334152
Validation loss = 0.05730654299259186
Validation loss = 0.05772058293223381
Validation loss = 0.0577084943652153
Validation loss = 0.05667801573872566
Validation loss = 0.05739559233188629
Validation loss = 0.05626358464360237
Validation loss = 0.05661033093929291
Validation loss = 0.056512754410505295
Validation loss = 0.057457420974969864
Validation loss = 0.05766473338007927
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.060128066688776016
Validation loss = 0.057225439697504044
Validation loss = 0.05731707066297531
Validation loss = 0.05644941329956055
Validation loss = 0.05714062228798866
Validation loss = 0.05555335059762001
Validation loss = 0.056282028555870056
Validation loss = 0.057599324733018875
Validation loss = 0.05765660107135773
Validation loss = 0.056653447449207306
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06070709228515625
Validation loss = 0.057290010154247284
Validation loss = 0.05689940229058266
Validation loss = 0.05716362223029137
Validation loss = 0.05669397860765457
Validation loss = 0.056552350521087646
Validation loss = 0.05725889652967453
Validation loss = 0.056297168135643005
Validation loss = 0.05667232722043991
Validation loss = 0.05684872344136238
Validation loss = 0.055507589131593704
Validation loss = 0.05630930885672569
Validation loss = 0.05728355422616005
Validation loss = 0.0563504621386528
Validation loss = 0.05763407051563263
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.061591241508722305
Validation loss = 0.05717257037758827
Validation loss = 0.05584719404578209
Validation loss = 0.06016620621085167
Validation loss = 0.056968286633491516
Validation loss = 0.05630035698413849
Validation loss = 0.056854136288166046
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06262257695198059
Validation loss = 0.05741104856133461
Validation loss = 0.058290235698223114
Validation loss = 0.05633511766791344
Validation loss = 0.05673883482813835
Validation loss = 0.057748861610889435
Validation loss = 0.05768879875540733
Validation loss = 0.057121727615594864
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.51e+03 |
| Iteration     | 8        |
| MaximumReturn | 2.01e+03 |
| MinimumReturn | 80.4     |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06624491512775421
Validation loss = 0.054877281188964844
Validation loss = 0.05463656783103943
Validation loss = 0.05605948716402054
Validation loss = 0.054581958800554276
Validation loss = 0.05524257570505142
Validation loss = 0.05490203574299812
Validation loss = 0.05462001636624336
Validation loss = 0.05547568202018738
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05937046930193901
Validation loss = 0.056900907307863235
Validation loss = 0.053557150065898895
Validation loss = 0.054964881390333176
Validation loss = 0.05573921650648117
Validation loss = 0.05537095665931702
Validation loss = 0.05477152392268181
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06102960184216499
Validation loss = 0.05511803179979324
Validation loss = 0.05431491136550903
Validation loss = 0.05408090353012085
Validation loss = 0.0551290325820446
Validation loss = 0.05393928289413452
Validation loss = 0.05341862514615059
Validation loss = 0.054190248250961304
Validation loss = 0.0546700656414032
Validation loss = 0.05507449433207512
Validation loss = 0.054904669523239136
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.061693668365478516
Validation loss = 0.054618753492832184
Validation loss = 0.0549861416220665
Validation loss = 0.05335365608334541
Validation loss = 0.05382484942674637
Validation loss = 0.05399138480424881
Validation loss = 0.0550527349114418
Validation loss = 0.05450309067964554
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0604708306491375
Validation loss = 0.05609652400016785
Validation loss = 0.055780667811632156
Validation loss = 0.056287623941898346
Validation loss = 0.055798716843128204
Validation loss = 0.05594732239842415
Validation loss = 0.056083180010318756
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.57e+03 |
| Iteration     | 9        |
| MaximumReturn | 2.13e+03 |
| MinimumReturn | 45.6     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06276526302099228
Validation loss = 0.057971660047769547
Validation loss = 0.05729878321290016
Validation loss = 0.059941425919532776
Validation loss = 0.059954989701509476
Validation loss = 0.05889763683080673
Validation loss = 0.05948528274893761
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06212792173027992
Validation loss = 0.05799514427781105
Validation loss = 0.05867799371480942
Validation loss = 0.05721956491470337
Validation loss = 0.058467913419008255
Validation loss = 0.05698439106345177
Validation loss = 0.05654382333159447
Validation loss = 0.05814443901181221
Validation loss = 0.059755828231573105
Validation loss = 0.05848902836441994
Validation loss = 0.05661269649863243
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.060878168791532516
Validation loss = 0.059256717562675476
Validation loss = 0.05747447907924652
Validation loss = 0.05931730195879936
Validation loss = 0.05726590007543564
Validation loss = 0.05720505490899086
Validation loss = 0.057811688631772995
Validation loss = 0.05749279633164406
Validation loss = 0.057256732136011124
Validation loss = 0.057276543229818344
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06263280659914017
Validation loss = 0.05715050920844078
Validation loss = 0.058277811855077744
Validation loss = 0.05661322921514511
Validation loss = 0.05800970643758774
Validation loss = 0.06061027944087982
Validation loss = 0.05906734615564346
Validation loss = 0.0585087314248085
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06132467836141586
Validation loss = 0.05872822180390358
Validation loss = 0.059596236795186996
Validation loss = 0.05925188958644867
Validation loss = 0.058588698506355286
Validation loss = 0.06052258238196373
Validation loss = 0.057469964027404785
Validation loss = 0.05894531309604645
Validation loss = 0.05974165350198746
Validation loss = 0.0583210214972496
Validation loss = 0.057823944836854935
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.55e+03 |
| Iteration     | 10       |
| MaximumReturn | 1.94e+03 |
| MinimumReturn | 1.08e+03 |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.057904865592718124
Validation loss = 0.053557638078927994
Validation loss = 0.053253173828125
Validation loss = 0.0528058223426342
Validation loss = 0.05346404388546944
Validation loss = 0.05312354862689972
Validation loss = 0.05317994952201843
Validation loss = 0.05311848595738411
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.058867454528808594
Validation loss = 0.05363263562321663
Validation loss = 0.05326607823371887
Validation loss = 0.05375796556472778
Validation loss = 0.05158263444900513
Validation loss = 0.054623112082481384
Validation loss = 0.05294441059231758
Validation loss = 0.05395042896270752
Validation loss = 0.0530889630317688
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05746258422732353
Validation loss = 0.05296991392970085
Validation loss = 0.053094152361154556
Validation loss = 0.05403288081288338
Validation loss = 0.052496571093797684
Validation loss = 0.05332330986857414
Validation loss = 0.05214511230587959
Validation loss = 0.05435128137469292
Validation loss = 0.05283360555768013
Validation loss = 0.05406472459435463
Validation loss = 0.0531151108443737
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.057076346129179
Validation loss = 0.05248764157295227
Validation loss = 0.052587080746889114
Validation loss = 0.052835989743471146
Validation loss = 0.05248090624809265
Validation loss = 0.052908334881067276
Validation loss = 0.051745589822530746
Validation loss = 0.0517345629632473
Validation loss = 0.05183535814285278
Validation loss = 0.051926422864198685
Validation loss = 0.051638346165418625
Validation loss = 0.05326193571090698
Validation loss = 0.052581291645765305
Validation loss = 0.05245557799935341
Validation loss = 0.052213069051504135
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.057423774152994156
Validation loss = 0.05370742455124855
Validation loss = 0.05317351594567299
Validation loss = 0.052968159317970276
Validation loss = 0.05322284996509552
Validation loss = 0.05405513569712639
Validation loss = 0.05333086848258972
Validation loss = 0.054394613951444626
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 962      |
| Iteration     | 11       |
| MaximumReturn | 2.19e+03 |
| MinimumReturn | 221      |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05820421501994133
Validation loss = 0.05199754238128662
Validation loss = 0.05304022505879402
Validation loss = 0.05276678502559662
Validation loss = 0.0530422143638134
Validation loss = 0.05443311110138893
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05843069776892662
Validation loss = 0.05269265174865723
Validation loss = 0.05294477939605713
Validation loss = 0.05336892977356911
Validation loss = 0.05282534658908844
Validation loss = 0.05305630713701248
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05616741627454758
Validation loss = 0.0524391271173954
Validation loss = 0.05309944972395897
Validation loss = 0.053359463810920715
Validation loss = 0.054812707006931305
Validation loss = 0.05366693064570427
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05539534613490105
Validation loss = 0.05236972123384476
Validation loss = 0.05262182280421257
Validation loss = 0.05199352279305458
Validation loss = 0.05226331576704979
Validation loss = 0.052419133484363556
Validation loss = 0.05159550532698631
Validation loss = 0.05233776569366455
Validation loss = 0.05260148271918297
Validation loss = 0.05191417783498764
Validation loss = 0.05219460278749466
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05781234800815582
Validation loss = 0.053383275866508484
Validation loss = 0.05321969464421272
Validation loss = 0.05305849760770798
Validation loss = 0.053580574691295624
Validation loss = 0.0533742755651474
Validation loss = 0.05311345309019089
Validation loss = 0.05263831838965416
Validation loss = 0.05299879610538483
Validation loss = 0.054883845150470734
Validation loss = 0.05367788299918175
Validation loss = 0.05375370755791664
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 297      |
| Iteration     | 12       |
| MaximumReturn | 1.54e+03 |
| MinimumReturn | -177     |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.058575812727212906
Validation loss = 0.051834311336278915
Validation loss = 0.05283951386809349
Validation loss = 0.05293593183159828
Validation loss = 0.051912058144807816
Validation loss = 0.05243871733546257
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.055696599185466766
Validation loss = 0.0520663857460022
Validation loss = 0.05218393728137016
Validation loss = 0.052256904542446136
Validation loss = 0.05223839357495308
Validation loss = 0.05241355299949646
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.054885685443878174
Validation loss = 0.05230478569865227
Validation loss = 0.0515214242041111
Validation loss = 0.05249558016657829
Validation loss = 0.05195879936218262
Validation loss = 0.05078580975532532
Validation loss = 0.05192791670560837
Validation loss = 0.052076589316129684
Validation loss = 0.05185547098517418
Validation loss = 0.05355862155556679
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05483012646436691
Validation loss = 0.05173518508672714
Validation loss = 0.050713684409856796
Validation loss = 0.05011860653758049
Validation loss = 0.05062517523765564
Validation loss = 0.05037400871515274
Validation loss = 0.05080101639032364
Validation loss = 0.050606876611709595
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0575883686542511
Validation loss = 0.051786523312330246
Validation loss = 0.05300232768058777
Validation loss = 0.052692003548145294
Validation loss = 0.052435435354709625
Validation loss = 0.05228770524263382
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -78.9    |
| Iteration     | 13       |
| MaximumReturn | 300      |
| MinimumReturn | -262     |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05658028647303581
Validation loss = 0.05206255242228508
Validation loss = 0.0521678552031517
Validation loss = 0.05251312255859375
Validation loss = 0.05178012698888779
Validation loss = 0.05205826833844185
Validation loss = 0.05144532769918442
Validation loss = 0.051948193460702896
Validation loss = 0.05224103108048439
Validation loss = 0.05290328711271286
Validation loss = 0.05109770968556404
Validation loss = 0.05115480348467827
Validation loss = 0.05064748227596283
Validation loss = 0.052227482199668884
Validation loss = 0.05142165347933769
Validation loss = 0.05109492316842079
Validation loss = 0.05208320915699005
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05561470612883568
Validation loss = 0.05268420651555061
Validation loss = 0.05229354649782181
Validation loss = 0.05242527276277542
Validation loss = 0.05202806740999222
Validation loss = 0.052301716059446335
Validation loss = 0.0531553253531456
Validation loss = 0.05111082270741463
Validation loss = 0.05287479981780052
Validation loss = 0.052556462585926056
Validation loss = 0.05170045420527458
Validation loss = 0.05116622895002365
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05771944299340248
Validation loss = 0.05067091062664986
Validation loss = 0.051270607858896255
Validation loss = 0.051029007881879807
Validation loss = 0.05235550180077553
Validation loss = 0.05175231024622917
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0558876171708107
Validation loss = 0.04997868090867996
Validation loss = 0.050947126001119614
Validation loss = 0.05002391338348389
Validation loss = 0.051128730177879333
Validation loss = 0.051048461347818375
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.058167148381471634
Validation loss = 0.05218152329325676
Validation loss = 0.05285944789648056
Validation loss = 0.05170983076095581
Validation loss = 0.05191449820995331
Validation loss = 0.05235440656542778
Validation loss = 0.05083010345697403
Validation loss = 0.05235111713409424
Validation loss = 0.05167922005057335
Validation loss = 0.05285410210490227
Validation loss = 0.051421310752630234
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -197     |
| Iteration     | 14       |
| MaximumReturn | 97.7     |
| MinimumReturn | -400     |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05833263695240021
Validation loss = 0.05313771218061447
Validation loss = 0.052200816571712494
Validation loss = 0.052032023668289185
Validation loss = 0.05185980349779129
Validation loss = 0.05118238925933838
Validation loss = 0.05210461467504501
Validation loss = 0.05163361877202988
Validation loss = 0.050962597131729126
Validation loss = 0.05145672708749771
Validation loss = 0.05139778554439545
Validation loss = 0.05237705260515213
Validation loss = 0.051855649799108505
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05693862587213516
Validation loss = 0.052689120173454285
Validation loss = 0.05233074724674225
Validation loss = 0.05252039432525635
Validation loss = 0.05343278869986534
Validation loss = 0.052931033074855804
Validation loss = 0.05189839005470276
Validation loss = 0.0521036796271801
Validation loss = 0.05161735415458679
Validation loss = 0.05208861827850342
Validation loss = 0.05111677944660187
Validation loss = 0.05189194530248642
Validation loss = 0.05288813263177872
Validation loss = 0.051723726093769073
Validation loss = 0.05175277590751648
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0551585853099823
Validation loss = 0.05159439519047737
Validation loss = 0.05237103998661041
Validation loss = 0.05206746608018875
Validation loss = 0.05275052413344383
Validation loss = 0.05162991210818291
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05597732216119766
Validation loss = 0.05046188831329346
Validation loss = 0.05089646205306053
Validation loss = 0.05155397206544876
Validation loss = 0.051551252603530884
Validation loss = 0.05100022628903389
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05781891196966171
Validation loss = 0.0524400994181633
Validation loss = 0.05260298401117325
Validation loss = 0.05377829819917679
Validation loss = 0.05174349248409271
Validation loss = 0.053488247096538544
Validation loss = 0.05238217115402222
Validation loss = 0.05347517132759094
Validation loss = 0.05269637703895569
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 127      |
| Iteration     | 15       |
| MaximumReturn | 725      |
| MinimumReturn | -386     |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05658145248889923
Validation loss = 0.05222111940383911
Validation loss = 0.05088808760046959
Validation loss = 0.05026409402489662
Validation loss = 0.05109255015850067
Validation loss = 0.05101468041539192
Validation loss = 0.05067065358161926
Validation loss = 0.051842328161001205
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05587293952703476
Validation loss = 0.05057890713214874
Validation loss = 0.051530640572309494
Validation loss = 0.051339041441679
Validation loss = 0.050966233015060425
Validation loss = 0.0514218732714653
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.053684085607528687
Validation loss = 0.05185094475746155
Validation loss = 0.05135708674788475
Validation loss = 0.05123281106352806
Validation loss = 0.051040150225162506
Validation loss = 0.051282912492752075
Validation loss = 0.05168445035815239
Validation loss = 0.051584869623184204
Validation loss = 0.05118563398718834
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.053484492003917694
Validation loss = 0.05033419653773308
Validation loss = 0.05043000727891922
Validation loss = 0.04964645206928253
Validation loss = 0.05099715292453766
Validation loss = 0.05066513642668724
Validation loss = 0.05116386339068413
Validation loss = 0.05141521617770195
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05642066150903702
Validation loss = 0.050446152687072754
Validation loss = 0.051483023911714554
Validation loss = 0.05205240100622177
Validation loss = 0.05118041858077049
Validation loss = 0.05163654685020447
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 58.7     |
| Iteration     | 16       |
| MaximumReturn | 481      |
| MinimumReturn | -337     |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.052597492933273315
Validation loss = 0.05018814280629158
Validation loss = 0.050235722213983536
Validation loss = 0.050447601824998856
Validation loss = 0.0513167530298233
Validation loss = 0.05033798888325691
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05477385222911835
Validation loss = 0.05007313936948776
Validation loss = 0.05037815868854523
Validation loss = 0.051087576895952225
Validation loss = 0.051931463181972504
Validation loss = 0.05134572088718414
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0528591088950634
Validation loss = 0.05079205334186554
Validation loss = 0.051103752106428146
Validation loss = 0.050974324345588684
Validation loss = 0.05208583548665047
Validation loss = 0.0507003515958786
Validation loss = 0.052242740988731384
Validation loss = 0.05072583258152008
Validation loss = 0.05051059648394585
Validation loss = 0.05169028788805008
Validation loss = 0.051483962684869766
Validation loss = 0.05077596381306648
Validation loss = 0.05123065412044525
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05275873467326164
Validation loss = 0.04990442469716072
Validation loss = 0.05012250319123268
Validation loss = 0.05142758786678314
Validation loss = 0.05030377581715584
Validation loss = 0.050947532057762146
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0543898269534111
Validation loss = 0.051626432687044144
Validation loss = 0.051121726632118225
Validation loss = 0.05096547678112984
Validation loss = 0.05104697123169899
Validation loss = 0.05130075663328171
Validation loss = 0.05093362182378769
Validation loss = 0.052040670067071915
Validation loss = 0.05274482071399689
Validation loss = 0.051454946398735046
Validation loss = 0.05143706500530243
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 307      |
| Iteration     | 17       |
| MaximumReturn | 1.2e+03  |
| MinimumReturn | -102     |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05323430150747299
Validation loss = 0.049865011125802994
Validation loss = 0.05103857070207596
Validation loss = 0.048765767365694046
Validation loss = 0.04984867572784424
Validation loss = 0.050107911229133606
Validation loss = 0.049170054495334625
Validation loss = 0.049981739372015
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0534469336271286
Validation loss = 0.05032903328537941
Validation loss = 0.050058718770742416
Validation loss = 0.050403207540512085
Validation loss = 0.050010163336992264
Validation loss = 0.050060614943504333
Validation loss = 0.05006363242864609
Validation loss = 0.05023222044110298
Validation loss = 0.050035785883665085
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05488370731472969
Validation loss = 0.050228044390678406
Validation loss = 0.04945207014679909
Validation loss = 0.04946397617459297
Validation loss = 0.05062142387032509
Validation loss = 0.05030425637960434
Validation loss = 0.05088171362876892
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05605676770210266
Validation loss = 0.04979632794857025
Validation loss = 0.04915940761566162
Validation loss = 0.04894823953509331
Validation loss = 0.04996807873249054
Validation loss = 0.049357153475284576
Validation loss = 0.04919968917965889
Validation loss = 0.050025537610054016
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.053571224212646484
Validation loss = 0.049914732575416565
Validation loss = 0.050024788826704025
Validation loss = 0.04990594461560249
Validation loss = 0.050021082162857056
Validation loss = 0.0496789924800396
Validation loss = 0.05121534690260887
Validation loss = 0.05015992745757103
Validation loss = 0.050495684146881104
Validation loss = 0.05118429288268089
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 395      |
| Iteration     | 18       |
| MaximumReturn | 1.05e+03 |
| MinimumReturn | -185     |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0522298589348793
Validation loss = 0.049196790903806686
Validation loss = 0.04869027063250542
Validation loss = 0.04916748031973839
Validation loss = 0.05061955377459526
Validation loss = 0.04969438165426254
Validation loss = 0.048253633081912994
Validation loss = 0.05040869861841202
Validation loss = 0.049820657819509506
Validation loss = 0.048583902418613434
Validation loss = 0.0492585226893425
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05251346156001091
Validation loss = 0.0494978129863739
Validation loss = 0.049476467072963715
Validation loss = 0.04974069073796272
Validation loss = 0.04941733554005623
Validation loss = 0.049622632563114166
Validation loss = 0.050619542598724365
Validation loss = 0.051190175116062164
Validation loss = 0.049223095178604126
Validation loss = 0.050209105014801025
Validation loss = 0.0492834635078907
Validation loss = 0.05094563961029053
Validation loss = 0.05045931786298752
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.051850903779268265
Validation loss = 0.049822814762592316
Validation loss = 0.05059591680765152
Validation loss = 0.04867095127701759
Validation loss = 0.05031890794634819
Validation loss = 0.04884832352399826
Validation loss = 0.049544353038072586
Validation loss = 0.049004100263118744
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05151016265153885
Validation loss = 0.04873546212911606
Validation loss = 0.049577899277210236
Validation loss = 0.0493827685713768
Validation loss = 0.04823921248316765
Validation loss = 0.047847263514995575
Validation loss = 0.04875171184539795
Validation loss = 0.049128107726573944
Validation loss = 0.049000535160303116
Validation loss = 0.04854429513216019
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.052617091685533524
Validation loss = 0.048760004341602325
Validation loss = 0.049716852605342865
Validation loss = 0.0492025688290596
Validation loss = 0.05017371103167534
Validation loss = 0.04994571954011917
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 352      |
| Iteration     | 19       |
| MaximumReturn | 740      |
| MinimumReturn | -331     |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05423491820693016
Validation loss = 0.04893605411052704
Validation loss = 0.05053945258259773
Validation loss = 0.04912573844194412
Validation loss = 0.05012879520654678
Validation loss = 0.04948784038424492
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05353982746601105
Validation loss = 0.04934057593345642
Validation loss = 0.049760621041059494
Validation loss = 0.050634004175662994
Validation loss = 0.05022754520177841
Validation loss = 0.05052179843187332
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.052476804703474045
Validation loss = 0.0509655587375164
Validation loss = 0.048921920359134674
Validation loss = 0.049353308975696564
Validation loss = 0.050323136150836945
Validation loss = 0.04945474490523338
Validation loss = 0.05030110478401184
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.055116601288318634
Validation loss = 0.04888878017663956
Validation loss = 0.04963865131139755
Validation loss = 0.049819979816675186
Validation loss = 0.048761121928691864
Validation loss = 0.049850303679704666
Validation loss = 0.049428265541791916
Validation loss = 0.05002641677856445
Validation loss = 0.04843474552035332
Validation loss = 0.05024614930152893
Validation loss = 0.04977380111813545
Validation loss = 0.04856035113334656
Validation loss = 0.05016624182462692
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.052992161363363266
Validation loss = 0.05003133416175842
Validation loss = 0.050185125321149826
Validation loss = 0.04958485811948776
Validation loss = 0.04993485286831856
Validation loss = 0.04991856589913368
Validation loss = 0.05049164593219757
Validation loss = 0.049207430332899094
Validation loss = 0.05026837810873985
Validation loss = 0.05036967247724533
Validation loss = 0.05061734840273857
Validation loss = 0.05017494410276413
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 975      |
| Iteration     | 20       |
| MaximumReturn | 1.47e+03 |
| MinimumReturn | -340     |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05196584761142731
Validation loss = 0.050057265907526016
Validation loss = 0.049928031861782074
Validation loss = 0.051018599420785904
Validation loss = 0.04996616020798683
Validation loss = 0.050093866884708405
Validation loss = 0.048824094235897064
Validation loss = 0.0504843108355999
Validation loss = 0.04937049373984337
Validation loss = 0.05001578480005264
Validation loss = 0.04902904853224754
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.052466750144958496
Validation loss = 0.04959777742624283
Validation loss = 0.049401428550481796
Validation loss = 0.04997849091887474
Validation loss = 0.0499078631401062
Validation loss = 0.049980781972408295
Validation loss = 0.050931546837091446
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05159119516611099
Validation loss = 0.04861125722527504
Validation loss = 0.0499456487596035
Validation loss = 0.049242738634347916
Validation loss = 0.04963105544447899
Validation loss = 0.050273597240448
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05041274055838585
Validation loss = 0.049504995346069336
Validation loss = 0.0502755381166935
Validation loss = 0.04959066957235336
Validation loss = 0.049888692796230316
Validation loss = 0.050001438707113266
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05180387571454048
Validation loss = 0.04919556528329849
Validation loss = 0.05014890059828758
Validation loss = 0.050314005464315414
Validation loss = 0.049946609884500504
Validation loss = 0.05003303661942482
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 440      |
| Iteration     | 21       |
| MaximumReturn | 1.28e+03 |
| MinimumReturn | -428     |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.051000140607357025
Validation loss = 0.04889654740691185
Validation loss = 0.0484207384288311
Validation loss = 0.048574674874544144
Validation loss = 0.04901649057865143
Validation loss = 0.049158427864313126
Validation loss = 0.0490528903901577
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.052911173552274704
Validation loss = 0.050216883420944214
Validation loss = 0.04948136582970619
Validation loss = 0.05001422017812729
Validation loss = 0.050371382385492325
Validation loss = 0.049752987921237946
Validation loss = 0.05048900097608566
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05178957059979439
Validation loss = 0.04947466030716896
Validation loss = 0.04978891462087631
Validation loss = 0.049189817160367966
Validation loss = 0.05066397041082382
Validation loss = 0.04898089915513992
Validation loss = 0.05015120282769203
Validation loss = 0.049754880368709564
Validation loss = 0.04877421259880066
Validation loss = 0.0498366616666317
Validation loss = 0.05076183006167412
Validation loss = 0.04933664947748184
Validation loss = 0.05010506883263588
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05085638910531998
Validation loss = 0.04896513745188713
Validation loss = 0.04923703521490097
Validation loss = 0.049007002264261246
Validation loss = 0.05039583519101143
Validation loss = 0.049472685903310776
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.053000714629888535
Validation loss = 0.05051582679152489
Validation loss = 0.04921264201402664
Validation loss = 0.05054692551493645
Validation loss = 0.0512068048119545
Validation loss = 0.05049127712845802
Validation loss = 0.049180153757333755
Validation loss = 0.05058356374502182
Validation loss = 0.049339111894369125
Validation loss = 0.04907115176320076
Validation loss = 0.049123357981443405
Validation loss = 0.04964922368526459
Validation loss = 0.04857797548174858
Validation loss = 0.04953498765826225
Validation loss = 0.0501917228102684
Validation loss = 0.04961680993437767
Validation loss = 0.04968876391649246
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 471      |
| Iteration     | 22       |
| MaximumReturn | 1.64e+03 |
| MinimumReturn | -330     |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05069083347916603
Validation loss = 0.04906770586967468
Validation loss = 0.04974567890167236
Validation loss = 0.04900447651743889
Validation loss = 0.04917318746447563
Validation loss = 0.04998530074954033
Validation loss = 0.048452381044626236
Validation loss = 0.049919139593839645
Validation loss = 0.0495731346309185
Validation loss = 0.04940301179885864
Validation loss = 0.05025697872042656
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05061924830079079
Validation loss = 0.05000677704811096
Validation loss = 0.04902338981628418
Validation loss = 0.050161391496658325
Validation loss = 0.05003736913204193
Validation loss = 0.049588218331336975
Validation loss = 0.05035265162587166
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05024020001292229
Validation loss = 0.04870682954788208
Validation loss = 0.04929841682314873
Validation loss = 0.04928883537650108
Validation loss = 0.04972777143120766
Validation loss = 0.04991236329078674
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.051231373101472855
Validation loss = 0.04874876141548157
Validation loss = 0.04866577312350273
Validation loss = 0.04913048446178436
Validation loss = 0.04946918413043022
Validation loss = 0.04912036657333374
Validation loss = 0.04885921999812126
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05287928506731987
Validation loss = 0.04886956140398979
Validation loss = 0.04924428462982178
Validation loss = 0.04963172599673271
Validation loss = 0.04932019114494324
Validation loss = 0.04902002215385437
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.22e+03 |
| Iteration     | 23       |
| MaximumReturn | 1.74e+03 |
| MinimumReturn | 193      |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.051106587052345276
Validation loss = 0.04866259917616844
Validation loss = 0.04894150793552399
Validation loss = 0.049382906407117844
Validation loss = 0.04893288016319275
Validation loss = 0.04841223731637001
Validation loss = 0.04918227717280388
Validation loss = 0.048846565186977386
Validation loss = 0.049530647695064545
Validation loss = 0.04882090911269188
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05175112560391426
Validation loss = 0.048768483102321625
Validation loss = 0.04985984414815903
Validation loss = 0.04971734434366226
Validation loss = 0.049394331872463226
Validation loss = 0.04950634017586708
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05102105438709259
Validation loss = 0.04942763224244118
Validation loss = 0.05039134994149208
Validation loss = 0.04946621507406235
Validation loss = 0.049072980880737305
Validation loss = 0.049543652683496475
Validation loss = 0.04860910400748253
Validation loss = 0.04891500622034073
Validation loss = 0.05004078894853592
Validation loss = 0.04931725561618805
Validation loss = 0.049987927079200745
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05053257942199707
Validation loss = 0.04877592250704765
Validation loss = 0.04776879772543907
Validation loss = 0.04860963672399521
Validation loss = 0.04837848246097565
Validation loss = 0.04939787834882736
Validation loss = 0.049132417887449265
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05162806063890457
Validation loss = 0.0488392598927021
Validation loss = 0.04942622780799866
Validation loss = 0.04818398132920265
Validation loss = 0.04853764921426773
Validation loss = 0.04881908372044563
Validation loss = 0.04902113601565361
Validation loss = 0.05030994862318039
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -52.8    |
| Iteration     | 24       |
| MaximumReturn | 421      |
| MinimumReturn | -474     |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.050806328654289246
Validation loss = 0.0490526407957077
Validation loss = 0.048973552882671356
Validation loss = 0.04934314265847206
Validation loss = 0.049520671367645264
Validation loss = 0.04884554445743561
Validation loss = 0.04964802414178848
Validation loss = 0.04881351813673973
Validation loss = 0.049714501947164536
Validation loss = 0.049244675785303116
Validation loss = 0.05035059154033661
Validation loss = 0.04842201992869377
Validation loss = 0.04918600991368294
Validation loss = 0.04941944405436516
Validation loss = 0.049754898995161057
Validation loss = 0.04899102449417114
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05125914141535759
Validation loss = 0.04920343682169914
Validation loss = 0.04978185519576073
Validation loss = 0.04958907887339592
Validation loss = 0.04922100529074669
Validation loss = 0.05024762824177742
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.050548817962408066
Validation loss = 0.04873797670006752
Validation loss = 0.048616401851177216
Validation loss = 0.04851388931274414
Validation loss = 0.04760356619954109
Validation loss = 0.048509977757930756
Validation loss = 0.04915962368249893
Validation loss = 0.049253419041633606
Validation loss = 0.049120865762233734
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05039931833744049
Validation loss = 0.04820038750767708
Validation loss = 0.04779285565018654
Validation loss = 0.04907914251089096
Validation loss = 0.04907945916056633
Validation loss = 0.04831266775727272
Validation loss = 0.04952194169163704
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04954596608877182
Validation loss = 0.04874303564429283
Validation loss = 0.048875365406274796
Validation loss = 0.04899626225233078
Validation loss = 0.04921495541930199
Validation loss = 0.04945913702249527
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 4.92     |
| Iteration     | 25       |
| MaximumReturn | 1.64e+03 |
| MinimumReturn | -625     |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05054092034697533
Validation loss = 0.04858648404479027
Validation loss = 0.04945845156908035
Validation loss = 0.048547130078077316
Validation loss = 0.05025032162666321
Validation loss = 0.04903934895992279
Validation loss = 0.048507291823625565
Validation loss = 0.048895906656980515
Validation loss = 0.04907691478729248
Validation loss = 0.04893434792757034
Validation loss = 0.048727843910455704
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05019722506403923
Validation loss = 0.04933164641261101
Validation loss = 0.049550581723451614
Validation loss = 0.04931327700614929
Validation loss = 0.05038917064666748
Validation loss = 0.0487644337117672
Validation loss = 0.048982731997966766
Validation loss = 0.04986043646931648
Validation loss = 0.04998640716075897
Validation loss = 0.04919581860303879
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05005686730146408
Validation loss = 0.04960809648036957
Validation loss = 0.048924293369054794
Validation loss = 0.0494009293615818
Validation loss = 0.04865282028913498
Validation loss = 0.04815593361854553
Validation loss = 0.04878898337483406
Validation loss = 0.04860052838921547
Validation loss = 0.05041076987981796
Validation loss = 0.04822854697704315
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0495108999311924
Validation loss = 0.04799162968993187
Validation loss = 0.04949488118290901
Validation loss = 0.049204256385564804
Validation loss = 0.04932194575667381
Validation loss = 0.04925885796546936
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.050197094678878784
Validation loss = 0.04868527874350548
Validation loss = 0.04845438525080681
Validation loss = 0.04958733916282654
Validation loss = 0.04928869754076004
Validation loss = 0.04917940869927406
Validation loss = 0.04863809794187546
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 341      |
| Iteration     | 26       |
| MaximumReturn | 1.56e+03 |
| MinimumReturn | -403     |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.049781471490859985
Validation loss = 0.0483846589922905
Validation loss = 0.04813294857740402
Validation loss = 0.048580050468444824
Validation loss = 0.04979337379336357
Validation loss = 0.048659417778253555
Validation loss = 0.04849362000823021
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.050359826534986496
Validation loss = 0.048736147582530975
Validation loss = 0.04873029515147209
Validation loss = 0.04850887879729271
Validation loss = 0.04924137145280838
Validation loss = 0.049072809517383575
Validation loss = 0.04820847883820534
Validation loss = 0.04869961366057396
Validation loss = 0.04841877892613411
Validation loss = 0.048909422010183334
Validation loss = 0.04884277656674385
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04984676465392113
Validation loss = 0.048286207020282745
Validation loss = 0.04786859080195427
Validation loss = 0.0485125370323658
Validation loss = 0.0480375662446022
Validation loss = 0.04933028668165207
Validation loss = 0.04817938804626465
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.049917638301849365
Validation loss = 0.04907156899571419
Validation loss = 0.04755384847521782
Validation loss = 0.04867257550358772
Validation loss = 0.048623520880937576
Validation loss = 0.04772268980741501
Validation loss = 0.04833376407623291
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04907025769352913
Validation loss = 0.04896615073084831
Validation loss = 0.04800980165600777
Validation loss = 0.047856591641902924
Validation loss = 0.049027808010578156
Validation loss = 0.04867164418101311
Validation loss = 0.048784248530864716
Validation loss = 0.048327285796403885
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 138      |
| Iteration     | 27       |
| MaximumReturn | 897      |
| MinimumReturn | -439     |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05059199407696724
Validation loss = 0.04860743507742882
Validation loss = 0.04816773533821106
Validation loss = 0.04875621572136879
Validation loss = 0.048338815569877625
Validation loss = 0.04903877526521683
Validation loss = 0.04925335943698883
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04958733171224594
Validation loss = 0.04802020266652107
Validation loss = 0.04840406775474548
Validation loss = 0.0488034226000309
Validation loss = 0.0496101938188076
Validation loss = 0.04906943067908287
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.050330955535173416
Validation loss = 0.048093196004629135
Validation loss = 0.04863050952553749
Validation loss = 0.048331137746572495
Validation loss = 0.04782548546791077
Validation loss = 0.04868507385253906
Validation loss = 0.047723572701215744
Validation loss = 0.04885876923799515
Validation loss = 0.04824114590883255
Validation loss = 0.0476074181497097
Validation loss = 0.0493159145116806
Validation loss = 0.048580601811409
Validation loss = 0.04802630841732025
Validation loss = 0.04791056364774704
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04829263687133789
Validation loss = 0.04821319133043289
Validation loss = 0.047742776572704315
Validation loss = 0.04788908362388611
Validation loss = 0.04880461096763611
Validation loss = 0.04815758392214775
Validation loss = 0.048821281641721725
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0497320182621479
Validation loss = 0.04813612625002861
Validation loss = 0.048631228506565094
Validation loss = 0.04849011078476906
Validation loss = 0.0482197180390358
Validation loss = 0.049356408417224884
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 441      |
| Iteration     | 28       |
| MaximumReturn | 1.86e+03 |
| MinimumReturn | -262     |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04964444786310196
Validation loss = 0.048165660351514816
Validation loss = 0.048042599111795425
Validation loss = 0.04784877225756645
Validation loss = 0.04765617474913597
Validation loss = 0.048082947731018066
Validation loss = 0.047760479152202606
Validation loss = 0.04862628132104874
Validation loss = 0.04824299365282059
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0492098331451416
Validation loss = 0.04858207330107689
Validation loss = 0.048798561096191406
Validation loss = 0.047759704291820526
Validation loss = 0.04850437864661217
Validation loss = 0.048318345099687576
Validation loss = 0.04873708635568619
Validation loss = 0.04854192957282066
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04974282905459404
Validation loss = 0.04758258908987045
Validation loss = 0.04825932905077934
Validation loss = 0.0485856756567955
Validation loss = 0.0491609126329422
Validation loss = 0.04825718700885773
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05043063312768936
Validation loss = 0.047771621495485306
Validation loss = 0.048027388751506805
Validation loss = 0.048923127353191376
Validation loss = 0.047739285975694656
Validation loss = 0.04836813732981682
Validation loss = 0.04916638508439064
Validation loss = 0.04777325689792633
Validation loss = 0.04852423816919327
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.049548618495464325
Validation loss = 0.04788893088698387
Validation loss = 0.04794052615761757
Validation loss = 0.0476347841322422
Validation loss = 0.048203084617853165
Validation loss = 0.048519160598516464
Validation loss = 0.0479726642370224
Validation loss = 0.04825090989470482
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 134      |
| Iteration     | 29       |
| MaximumReturn | 1.78e+03 |
| MinimumReturn | -544     |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.22766278684139252
Validation loss = 0.2148059457540512
Validation loss = 0.2401551604270935
Validation loss = 0.15867270529270172
Validation loss = 0.19787843525409698
Validation loss = 0.19570772349834442
Validation loss = 0.24005305767059326
Validation loss = 0.21638093888759613
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.28910210728645325
Validation loss = 0.24247102439403534
Validation loss = 0.22848954796791077
Validation loss = 0.2994821071624756
Validation loss = 0.20458312332630157
Validation loss = 0.2069198340177536
Validation loss = 0.21961545944213867
Validation loss = 0.26873892545700073
Validation loss = 0.21713745594024658
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.28685227036476135
Validation loss = 0.23012161254882812
Validation loss = 0.21507051587104797
Validation loss = 0.1953851282596588
Validation loss = 0.2059212476015091
Validation loss = 0.24141579866409302
Validation loss = 0.2524397075176239
Validation loss = 0.24152116477489471
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2351829707622528
Validation loss = 0.255420058965683
Validation loss = 0.28321579098701477
Validation loss = 0.3025435209274292
Validation loss = 0.2592967748641968
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3297331631183624
Validation loss = 0.30163756012916565
Validation loss = 0.30535170435905457
Validation loss = 0.34080180525779724
Validation loss = 0.34904971718788147
Validation loss = 0.3670192062854767
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 278      |
| Iteration     | 30       |
| MaximumReturn | 1.13e+03 |
| MinimumReturn | -724     |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2018144130706787
Validation loss = 0.16573213040828705
Validation loss = 0.2176971435546875
Validation loss = 0.22024017572402954
Validation loss = 0.21014252305030823
Validation loss = 0.21432849764823914
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.20335394144058228
Validation loss = 0.18511584401130676
Validation loss = 0.2460295557975769
Validation loss = 0.24117563664913177
Validation loss = 0.19865682721138
Validation loss = 0.2625758647918701
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2604392170906067
Validation loss = 0.24553470313549042
Validation loss = 0.2132655829191208
Validation loss = 0.23676389455795288
Validation loss = 0.2744278907775879
Validation loss = 0.2264343500137329
Validation loss = 0.2134050726890564
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2072235345840454
Validation loss = 0.22083628177642822
Validation loss = 0.24387836456298828
Validation loss = 0.280477911233902
Validation loss = 0.2513726055622101
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.33105266094207764
Validation loss = 0.3575383126735687
Validation loss = 0.26962313055992126
Validation loss = 0.29560407996177673
Validation loss = 0.33257728815078735
Validation loss = 0.32871848344802856
Validation loss = 0.33317485451698303
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.35e+03 |
| Iteration     | 31       |
| MaximumReturn | 2.18e+03 |
| MinimumReturn | 319      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.19230249524116516
Validation loss = 0.19302241504192352
Validation loss = 0.17837807536125183
Validation loss = 0.20784492790699005
Validation loss = 0.2174040973186493
Validation loss = 0.21139241755008698
Validation loss = 0.22954943776130676
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.24430720508098602
Validation loss = 0.22807300090789795
Validation loss = 0.23074783384799957
Validation loss = 0.26674550771713257
Validation loss = 0.2621757388114929
Validation loss = 0.1867118626832962
Validation loss = 0.19951863586902618
Validation loss = 0.2018875926733017
Validation loss = 0.24188397824764252
Validation loss = 0.24110309779644012
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.19661512970924377
Validation loss = 0.22900700569152832
Validation loss = 0.26723286509513855
Validation loss = 0.2082996368408203
Validation loss = 0.25137680768966675
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.26442575454711914
Validation loss = 0.2440674751996994
Validation loss = 0.3127668797969818
Validation loss = 0.27248862385749817
Validation loss = 0.2566811144351959
Validation loss = 0.28668832778930664
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3249276876449585
Validation loss = 0.35930395126342773
Validation loss = 0.4084744155406952
Validation loss = 0.29510095715522766
Validation loss = 0.304985910654068
Validation loss = 0.2750079929828644
Validation loss = 0.2670612335205078
Validation loss = 0.30656418204307556
Validation loss = 0.2931259572505951
Validation loss = 0.26118430495262146
Validation loss = 0.2998915910720825
Validation loss = 0.27838313579559326
Validation loss = 0.27050161361694336
Validation loss = 0.25176647305488586
Validation loss = 0.268921822309494
Validation loss = 0.29476040601730347
Validation loss = 0.2928159236907959
Validation loss = 0.32744672894477844
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 833      |
| Iteration     | 32       |
| MaximumReturn | 1.92e+03 |
| MinimumReturn | -701     |
| TotalSamples  | 136000   |
----------------------------
