Logging to experiments/gym_cheetahA01/gym_cheetahA01/Fri-28-Oct-2022-03-06-00-PM-CDT_gym_cheetahA01_trpo_iteration_20_seed4321
Print configuration .....
{'env_name': 'gym_cheetahA01', 'random_seeds': [4321, 2314, 2341, 3421], 'save_variables': False, 'model_save_dir': '/tmp/gym_cheetahA01_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39598608016967773
Validation loss = 0.15533006191253662
Validation loss = 0.11147357523441315
Validation loss = 0.08842272311449051
Validation loss = 0.07915785163640976
Validation loss = 0.07814157009124756
Validation loss = 0.07836902886629105
Validation loss = 0.08464354276657104
Validation loss = 0.08272747695446014
Validation loss = 0.08807998895645142
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.846684455871582
Validation loss = 0.14972877502441406
Validation loss = 0.10712345689535141
Validation loss = 0.08727972209453583
Validation loss = 0.07814866304397583
Validation loss = 0.07489769160747528
Validation loss = 0.07769548892974854
Validation loss = 0.07764080911874771
Validation loss = 0.0954902321100235
Validation loss = 0.08946986496448517
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5613178014755249
Validation loss = 0.1553196907043457
Validation loss = 0.10991869866847992
Validation loss = 0.08714427053928375
Validation loss = 0.07800227403640747
Validation loss = 0.08249974995851517
Validation loss = 0.08425255119800568
Validation loss = 0.10461565852165222
Validation loss = 0.10599519312381744
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.47138479351997375
Validation loss = 0.14875748753547668
Validation loss = 0.11136766523122787
Validation loss = 0.09251855313777924
Validation loss = 0.0802629142999649
Validation loss = 0.08344921469688416
Validation loss = 0.07589425146579742
Validation loss = 0.08501997590065002
Validation loss = 0.0806889533996582
Validation loss = 0.07850223779678345
Validation loss = 0.08171655237674713
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.42794233560562134
Validation loss = 0.1556776911020279
Validation loss = 0.11316999793052673
Validation loss = 0.09238050878047943
Validation loss = 0.08075939118862152
Validation loss = 0.10004746913909912
Validation loss = 0.07955870032310486
Validation loss = 0.08447925001382828
Validation loss = 0.10641195625066757
Validation loss = 0.10791563242673874
Validation loss = 0.10320563614368439
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -430     |
| Iteration     | 0        |
| MaximumReturn | -325     |
| MinimumReturn | -554     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1420058161020279
Validation loss = 0.09031661599874496
Validation loss = 0.09517891705036163
Validation loss = 0.07281845808029175
Validation loss = 0.07749983668327332
Validation loss = 0.06798714399337769
Validation loss = 0.06882493942975998
Validation loss = 0.08488884568214417
Validation loss = 0.06529122591018677
Validation loss = 0.06672099977731705
Validation loss = 0.06564082205295563
Validation loss = 0.06299324333667755
Validation loss = 0.06281811743974686
Validation loss = 0.07201344519853592
Validation loss = 0.06450234353542328
Validation loss = 0.06273075938224792
Validation loss = 0.06807882338762283
Validation loss = 0.061522845178842545
Validation loss = 0.061429981142282486
Validation loss = 0.06077133119106293
Validation loss = 0.06654970347881317
Validation loss = 0.060591623187065125
Validation loss = 0.061958588659763336
Validation loss = 0.060971587896347046
Validation loss = 0.06002466380596161
Validation loss = 0.061944156885147095
Validation loss = 0.06035591661930084
Validation loss = 0.06629309058189392
Validation loss = 0.060246728360652924
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.143772155046463
Validation loss = 0.08919109404087067
Validation loss = 0.08059099316596985
Validation loss = 0.08239724487066269
Validation loss = 0.08892174810171127
Validation loss = 0.06850573420524597
Validation loss = 0.07552802562713623
Validation loss = 0.06972900032997131
Validation loss = 0.06594576686620712
Validation loss = 0.06560762971639633
Validation loss = 0.06506331264972687
Validation loss = 0.06506402045488358
Validation loss = 0.06560066342353821
Validation loss = 0.06619071215391159
Validation loss = 0.0734567940235138
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14330938458442688
Validation loss = 0.10038354992866516
Validation loss = 0.08770842850208282
Validation loss = 0.07818298041820526
Validation loss = 0.08600969612598419
Validation loss = 0.07446806877851486
Validation loss = 0.09095996618270874
Validation loss = 0.0709986686706543
Validation loss = 0.08276224136352539
Validation loss = 0.06437446177005768
Validation loss = 0.06304792314767838
Validation loss = 0.0825408399105072
Validation loss = 0.06520029902458191
Validation loss = 0.06446132063865662
Validation loss = 0.0761050209403038
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.13431861996650696
Validation loss = 0.0917338877916336
Validation loss = 0.08615415543317795
Validation loss = 0.07535410672426224
Validation loss = 0.07253329455852509
Validation loss = 0.0692247524857521
Validation loss = 0.07130786031484604
Validation loss = 0.06834201514720917
Validation loss = 0.0693691074848175
Validation loss = 0.06232298165559769
Validation loss = 0.06474332511425018
Validation loss = 0.0772174596786499
Validation loss = 0.061119552701711655
Validation loss = 0.06457700580358505
Validation loss = 0.06741105020046234
Validation loss = 0.061599135398864746
Validation loss = 0.06517581641674042
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14335541427135468
Validation loss = 0.09207697212696075
Validation loss = 0.0921962708234787
Validation loss = 0.08764594793319702
Validation loss = 0.0764542743563652
Validation loss = 0.07173649966716766
Validation loss = 0.06985227763652802
Validation loss = 0.07385306060314178
Validation loss = 0.0708349347114563
Validation loss = 0.06576555967330933
Validation loss = 0.06321752071380615
Validation loss = 0.06493619084358215
Validation loss = 0.06446840614080429
Validation loss = 0.06446540355682373
Validation loss = 0.06233439967036247
Validation loss = 0.06203846633434296
Validation loss = 0.07195974886417389
Validation loss = 0.06441479921340942
Validation loss = 0.06389071047306061
Validation loss = 0.06319665163755417
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -312     |
| Iteration     | 1        |
| MaximumReturn | -250     |
| MinimumReturn | -358     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08857817202806473
Validation loss = 0.06056421995162964
Validation loss = 0.06328430026769638
Validation loss = 0.058101192116737366
Validation loss = 0.06294333934783936
Validation loss = 0.058941591531038284
Validation loss = 0.05610080063343048
Validation loss = 0.05889667943120003
Validation loss = 0.061518874019384384
Validation loss = 0.06056253984570503
Validation loss = 0.06541728228330612
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09559794515371323
Validation loss = 0.06781882047653198
Validation loss = 0.06620562076568604
Validation loss = 0.06340084224939346
Validation loss = 0.058628007769584656
Validation loss = 0.060277026146650314
Validation loss = 0.05805754289031029
Validation loss = 0.05948064103722572
Validation loss = 0.05789588391780853
Validation loss = 0.0575612373650074
Validation loss = 0.05711619183421135
Validation loss = 0.060196131467819214
Validation loss = 0.06594984978437424
Validation loss = 0.06355922669172287
Validation loss = 0.0570821650326252
Validation loss = 0.054703887552022934
Validation loss = 0.05598350241780281
Validation loss = 0.06285388767719269
Validation loss = 0.06418590247631073
Validation loss = 0.054690275341272354
Validation loss = 0.058504506945610046
Validation loss = 0.055383455008268356
Validation loss = 0.05377081036567688
Validation loss = 0.05703084170818329
Validation loss = 0.05603856220841408
Validation loss = 0.05538481846451759
Validation loss = 0.05605093017220497
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08756926655769348
Validation loss = 0.06525005400180817
Validation loss = 0.06610678881406784
Validation loss = 0.0641426220536232
Validation loss = 0.06371593475341797
Validation loss = 0.0601261667907238
Validation loss = 0.058571260422468185
Validation loss = 0.06452486664056778
Validation loss = 0.06375723332166672
Validation loss = 0.060030195862054825
Validation loss = 0.05736244097352028
Validation loss = 0.0593089759349823
Validation loss = 0.05992380902171135
Validation loss = 0.05752637982368469
Validation loss = 0.05645820125937462
Validation loss = 0.057191599160432816
Validation loss = 0.05846785381436348
Validation loss = 0.06623753160238266
Validation loss = 0.055366456508636475
Validation loss = 0.054510608315467834
Validation loss = 0.05519227311015129
Validation loss = 0.053868431597948074
Validation loss = 0.05593984201550484
Validation loss = 0.058949679136276245
Validation loss = 0.05595479533076286
Validation loss = 0.061131060123443604
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08756989240646362
Validation loss = 0.0619547963142395
Validation loss = 0.06059731915593147
Validation loss = 0.07247588783502579
Validation loss = 0.05736606940627098
Validation loss = 0.060650989413261414
Validation loss = 0.06738416105508804
Validation loss = 0.05706363916397095
Validation loss = 0.06185893341898918
Validation loss = 0.06103212758898735
Validation loss = 0.05823317542672157
Validation loss = 0.05906461551785469
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0845978632569313
Validation loss = 0.06393492966890335
Validation loss = 0.061326224356889725
Validation loss = 0.05926470831036568
Validation loss = 0.06392738968133926
Validation loss = 0.06005890294909477
Validation loss = 0.06926821917295456
Validation loss = 0.057338450103998184
Validation loss = 0.057879384607076645
Validation loss = 0.057052452117204666
Validation loss = 0.05698445439338684
Validation loss = 0.05755188688635826
Validation loss = 0.059494972229003906
Validation loss = 0.0825754925608635
Validation loss = 0.059125158935785294
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -173     |
| Iteration     | 2        |
| MaximumReturn | -67.2    |
| MinimumReturn | -257     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06545531749725342
Validation loss = 0.05599231272935867
Validation loss = 0.055250801146030426
Validation loss = 0.05743677169084549
Validation loss = 0.054403822869062424
Validation loss = 0.055437564849853516
Validation loss = 0.05521151423454285
Validation loss = 0.054112136363983154
Validation loss = 0.06328506767749786
Validation loss = 0.053613848984241486
Validation loss = 0.053487733006477356
Validation loss = 0.05692282319068909
Validation loss = 0.05347385257482529
Validation loss = 0.052175816148519516
Validation loss = 0.055137619376182556
Validation loss = 0.05577554181218147
Validation loss = 0.05251243710517883
Validation loss = 0.059261251240968704
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06659416109323502
Validation loss = 0.0575825497508049
Validation loss = 0.058435894548892975
Validation loss = 0.05754096806049347
Validation loss = 0.05466883257031441
Validation loss = 0.0586886927485466
Validation loss = 0.054702699184417725
Validation loss = 0.05500701442360878
Validation loss = 0.05357865244150162
Validation loss = 0.06087721139192581
Validation loss = 0.05274064093828201
Validation loss = 0.05246073007583618
Validation loss = 0.05163908004760742
Validation loss = 0.05309513211250305
Validation loss = 0.051823217421770096
Validation loss = 0.050830766558647156
Validation loss = 0.05087354779243469
Validation loss = 0.052769776433706284
Validation loss = 0.05578409507870674
Validation loss = 0.058664530515670776
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0697496086359024
Validation loss = 0.062493931502103806
Validation loss = 0.054905906319618225
Validation loss = 0.05577012896537781
Validation loss = 0.05608389526605606
Validation loss = 0.056275125592947006
Validation loss = 0.05525907874107361
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06347968429327011
Validation loss = 0.058020368218421936
Validation loss = 0.056496571749448776
Validation loss = 0.05458473414182663
Validation loss = 0.05646372586488724
Validation loss = 0.05595962330698967
Validation loss = 0.05459919571876526
Validation loss = 0.05744308605790138
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06783419102430344
Validation loss = 0.06103936955332756
Validation loss = 0.05783184617757797
Validation loss = 0.055125053972005844
Validation loss = 0.05786515772342682
Validation loss = 0.0602254644036293
Validation loss = 0.057463981211185455
Validation loss = 0.05433213338255882
Validation loss = 0.061996739357709885
Validation loss = 0.05504453927278519
Validation loss = 0.052619218826293945
Validation loss = 0.053940508514642715
Validation loss = 0.05753786116838455
Validation loss = 0.05666998028755188
Validation loss = 0.06097593903541565
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 274      |
| Iteration     | 3        |
| MaximumReturn | 425      |
| MinimumReturn | 35.7     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07637621462345123
Validation loss = 0.06112536042928696
Validation loss = 0.05814790725708008
Validation loss = 0.05714137479662895
Validation loss = 0.058437664061784744
Validation loss = 0.06204710155725479
Validation loss = 0.05915641039609909
Validation loss = 0.05645793676376343
Validation loss = 0.05541352555155754
Validation loss = 0.055976103991270065
Validation loss = 0.056162405759096146
Validation loss = 0.05843828245997429
Validation loss = 0.05441506952047348
Validation loss = 0.05686189979314804
Validation loss = 0.05626797676086426
Validation loss = 0.05435352772474289
Validation loss = 0.05580120533704758
Validation loss = 0.057631004601716995
Validation loss = 0.05664343386888504
Validation loss = 0.05733758211135864
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0739356279373169
Validation loss = 0.05861612409353256
Validation loss = 0.05727582052350044
Validation loss = 0.0574687235057354
Validation loss = 0.05863545462489128
Validation loss = 0.06495489180088043
Validation loss = 0.05531274154782295
Validation loss = 0.0549258291721344
Validation loss = 0.059891484677791595
Validation loss = 0.05703514814376831
Validation loss = 0.05739027261734009
Validation loss = 0.058729421347379684
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07574374973773956
Validation loss = 0.06503602117300034
Validation loss = 0.058670639991760254
Validation loss = 0.05862177535891533
Validation loss = 0.06001724675297737
Validation loss = 0.061297811567783356
Validation loss = 0.05688551068305969
Validation loss = 0.0569579117000103
Validation loss = 0.059622518718242645
Validation loss = 0.05940818041563034
Validation loss = 0.057041771709918976
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0731341540813446
Validation loss = 0.06124448776245117
Validation loss = 0.05859243869781494
Validation loss = 0.05686958506703377
Validation loss = 0.05945976823568344
Validation loss = 0.05910320207476616
Validation loss = 0.056854408234357834
Validation loss = 0.05852416902780533
Validation loss = 0.05558948591351509
Validation loss = 0.05628800392150879
Validation loss = 0.05866573378443718
Validation loss = 0.05743752792477608
Validation loss = 0.05613412708044052
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07244779914617538
Validation loss = 0.06429128348827362
Validation loss = 0.06055770441889763
Validation loss = 0.06220334768295288
Validation loss = 0.06043628975749016
Validation loss = 0.05731982737779617
Validation loss = 0.05766535922884941
Validation loss = 0.05760693550109863
Validation loss = 0.06091571971774101
Validation loss = 0.06088525801897049
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 308      |
| Iteration     | 4        |
| MaximumReturn | 423      |
| MinimumReturn | -53.4    |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05620206892490387
Validation loss = 0.04995128512382507
Validation loss = 0.05288539454340935
Validation loss = 0.04862039163708687
Validation loss = 0.050994873046875
Validation loss = 0.04838348925113678
Validation loss = 0.05010329186916351
Validation loss = 0.04934096336364746
Validation loss = 0.048033300787210464
Validation loss = 0.04921023175120354
Validation loss = 0.049954500049352646
Validation loss = 0.049315083771944046
Validation loss = 0.055183347314596176
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.056354861706495285
Validation loss = 0.053640395402908325
Validation loss = 0.0507250539958477
Validation loss = 0.05045904591679573
Validation loss = 0.050427019596099854
Validation loss = 0.04929899051785469
Validation loss = 0.049025777727365494
Validation loss = 0.04996887221932411
Validation loss = 0.05027918890118599
Validation loss = 0.04939870536327362
Validation loss = 0.04927557706832886
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05641821399331093
Validation loss = 0.05175471305847168
Validation loss = 0.05720771476626396
Validation loss = 0.051747437566518784
Validation loss = 0.05251821503043175
Validation loss = 0.051283299922943115
Validation loss = 0.050714656710624695
Validation loss = 0.04997923597693443
Validation loss = 0.050299692898988724
Validation loss = 0.05261741206049919
Validation loss = 0.0496615506708622
Validation loss = 0.0495435856282711
Validation loss = 0.0506906658411026
Validation loss = 0.04964024946093559
Validation loss = 0.04866725578904152
Validation loss = 0.049912918359041214
Validation loss = 0.048879604786634445
Validation loss = 0.05296032503247261
Validation loss = 0.04824190214276314
Validation loss = 0.05084145441651344
Validation loss = 0.04781915247440338
Validation loss = 0.0522732175886631
Validation loss = 0.048609793186187744
Validation loss = 0.04981958866119385
Validation loss = 0.049273084849119186
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.057952046394348145
Validation loss = 0.05112626031041145
Validation loss = 0.055986855179071426
Validation loss = 0.05118192732334137
Validation loss = 0.04922713711857796
Validation loss = 0.049350202083587646
Validation loss = 0.052740201354026794
Validation loss = 0.053687307983636856
Validation loss = 0.053482070565223694
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05657471716403961
Validation loss = 0.05277112126350403
Validation loss = 0.05287271365523338
Validation loss = 0.05209803581237793
Validation loss = 0.052529096603393555
Validation loss = 0.05056511238217354
Validation loss = 0.05261300876736641
Validation loss = 0.050466399639844894
Validation loss = 0.05257510766386986
Validation loss = 0.05134214088320732
Validation loss = 0.049435656517744064
Validation loss = 0.05029262229800224
Validation loss = 0.05176277086138725
Validation loss = 0.04934394732117653
Validation loss = 0.04995347559452057
Validation loss = 0.05006028711795807
Validation loss = 0.05233834683895111
Validation loss = 0.054405998438596725
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 422      |
| Iteration     | 5        |
| MaximumReturn | 615      |
| MinimumReturn | 25.2     |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05311879515647888
Validation loss = 0.049871183931827545
Validation loss = 0.045342233031988144
Validation loss = 0.04657648131251335
Validation loss = 0.04557904228568077
Validation loss = 0.04819459468126297
Validation loss = 0.046652622520923615
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.052925992757081985
Validation loss = 0.04767391085624695
Validation loss = 0.04788584262132645
Validation loss = 0.04630029574036598
Validation loss = 0.04558524116873741
Validation loss = 0.046658098697662354
Validation loss = 0.04587724432349205
Validation loss = 0.045342423021793365
Validation loss = 0.04569181799888611
Validation loss = 0.047572556883096695
Validation loss = 0.04453941062092781
Validation loss = 0.04761543124914169
Validation loss = 0.046408962458372116
Validation loss = 0.04555950686335564
Validation loss = 0.04534285143017769
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.054860737174749374
Validation loss = 0.04660014063119888
Validation loss = 0.046604666858911514
Validation loss = 0.04610370472073555
Validation loss = 0.04584025964140892
Validation loss = 0.04771328344941139
Validation loss = 0.04585399106144905
Validation loss = 0.045565903186798096
Validation loss = 0.044876787811517715
Validation loss = 0.04609804227948189
Validation loss = 0.043712545186281204
Validation loss = 0.046825509518384933
Validation loss = 0.044167615473270416
Validation loss = 0.04383883997797966
Validation loss = 0.0444258451461792
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.056626398116350174
Validation loss = 0.04774865880608559
Validation loss = 0.047621604055166245
Validation loss = 0.04815366119146347
Validation loss = 0.04630651697516441
Validation loss = 0.04598531126976013
Validation loss = 0.046458158642053604
Validation loss = 0.04656047374010086
Validation loss = 0.046462975442409515
Validation loss = 0.04405851289629936
Validation loss = 0.04937878996133804
Validation loss = 0.04834164306521416
Validation loss = 0.04413721337914467
Validation loss = 0.04539924114942551
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05491924658417702
Validation loss = 0.04704857990145683
Validation loss = 0.047491807490587234
Validation loss = 0.050450172275304794
Validation loss = 0.04659821465611458
Validation loss = 0.04598008468747139
Validation loss = 0.04865698143839836
Validation loss = 0.04740266874432564
Validation loss = 0.04746798053383827
Validation loss = 0.046459224075078964
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 772      |
| Iteration     | 6        |
| MaximumReturn | 850      |
| MinimumReturn | 712      |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04330896586179733
Validation loss = 0.04319886118173599
Validation loss = 0.042292144149541855
Validation loss = 0.03984057158231735
Validation loss = 0.04308910295367241
Validation loss = 0.040283799171447754
Validation loss = 0.04088984429836273
Validation loss = 0.039419449865818024
Validation loss = 0.03910217806696892
Validation loss = 0.03832026571035385
Validation loss = 0.038806356489658356
Validation loss = 0.03806174546480179
Validation loss = 0.04131729155778885
Validation loss = 0.0417771115899086
Validation loss = 0.03827191889286041
Validation loss = 0.037349600344896317
Validation loss = 0.037241656333208084
Validation loss = 0.03799516707658768
Validation loss = 0.03796713799238205
Validation loss = 0.042942579835653305
Validation loss = 0.03654167801141739
Validation loss = 0.04008866846561432
Validation loss = 0.037672899663448334
Validation loss = 0.03607155382633209
Validation loss = 0.03540636599063873
Validation loss = 0.0374528132379055
Validation loss = 0.035872604697942734
Validation loss = 0.03692176938056946
Validation loss = 0.036898523569107056
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04427189379930496
Validation loss = 0.04249110817909241
Validation loss = 0.03993510454893112
Validation loss = 0.04004135727882385
Validation loss = 0.04132751375436783
Validation loss = 0.04071107506752014
Validation loss = 0.0394459143280983
Validation loss = 0.038698624819517136
Validation loss = 0.038683243095874786
Validation loss = 0.040381934493780136
Validation loss = 0.039404794573783875
Validation loss = 0.03881518542766571
Validation loss = 0.039211470633745193
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.046421442180871964
Validation loss = 0.0430915504693985
Validation loss = 0.03945181518793106
Validation loss = 0.03944360464811325
Validation loss = 0.03865775465965271
Validation loss = 0.04073156788945198
Validation loss = 0.0385279655456543
Validation loss = 0.038364775478839874
Validation loss = 0.038643911480903625
Validation loss = 0.03695128113031387
Validation loss = 0.03756037354469299
Validation loss = 0.03773723170161247
Validation loss = 0.03864486515522003
Validation loss = 0.037011951208114624
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04724077880382538
Validation loss = 0.04120878130197525
Validation loss = 0.04155883565545082
Validation loss = 0.0404851958155632
Validation loss = 0.03931713476777077
Validation loss = 0.03890277072787285
Validation loss = 0.039205171167850494
Validation loss = 0.03872611001133919
Validation loss = 0.03728862479329109
Validation loss = 0.03843752294778824
Validation loss = 0.03821022808551788
Validation loss = 0.037364087998867035
Validation loss = 0.03920877352356911
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.047555577009916306
Validation loss = 0.04349016398191452
Validation loss = 0.04388654977083206
Validation loss = 0.04127154499292374
Validation loss = 0.04249908775091171
Validation loss = 0.039829812943935394
Validation loss = 0.04037231579422951
Validation loss = 0.041686996817588806
Validation loss = 0.041918475180864334
Validation loss = 0.039599694311618805
Validation loss = 0.039640724658966064
Validation loss = 0.04242396727204323
Validation loss = 0.03868436440825462
Validation loss = 0.04256554692983627
Validation loss = 0.04009341076016426
Validation loss = 0.038717448711395264
Validation loss = 0.04215178266167641
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 710      |
| Iteration     | 7        |
| MaximumReturn | 1.07e+03 |
| MinimumReturn | -468     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.039848387241363525
Validation loss = 0.03529132902622223
Validation loss = 0.03255366533994675
Validation loss = 0.03213034197688103
Validation loss = 0.03196144104003906
Validation loss = 0.03224751725792885
Validation loss = 0.03226419910788536
Validation loss = 0.033013708889484406
Validation loss = 0.035150978714227676
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04079630970954895
Validation loss = 0.036671023815870285
Validation loss = 0.03548532724380493
Validation loss = 0.03419317677617073
Validation loss = 0.03453252464532852
Validation loss = 0.03398517146706581
Validation loss = 0.03446332365274429
Validation loss = 0.035056617110967636
Validation loss = 0.033547479659318924
Validation loss = 0.03345663473010063
Validation loss = 0.032953694462776184
Validation loss = 0.03296661004424095
Validation loss = 0.03401261940598488
Validation loss = 0.03357832506299019
Validation loss = 0.03337729722261429
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.040397364646196365
Validation loss = 0.034505780786275864
Validation loss = 0.03350462019443512
Validation loss = 0.034964669495821
Validation loss = 0.03403721749782562
Validation loss = 0.03250886872410774
Validation loss = 0.03211786225438118
Validation loss = 0.032075610011816025
Validation loss = 0.03165696561336517
Validation loss = 0.03348157927393913
Validation loss = 0.03140817582607269
Validation loss = 0.031237768009305
Validation loss = 0.0296725332736969
Validation loss = 0.03053772822022438
Validation loss = 0.03074868582189083
Validation loss = 0.03278343379497528
Validation loss = 0.031038813292980194
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.041451554745435715
Validation loss = 0.03450711816549301
Validation loss = 0.03450334444642067
Validation loss = 0.036829624325037
Validation loss = 0.03609703481197357
Validation loss = 0.03336898609995842
Validation loss = 0.03298049047589302
Validation loss = 0.032997310161590576
Validation loss = 0.03310305252671242
Validation loss = 0.032189831137657166
Validation loss = 0.032353028655052185
Validation loss = 0.031069424003362656
Validation loss = 0.03080318681895733
Validation loss = 0.0315636582672596
Validation loss = 0.032418061047792435
Validation loss = 0.03381204232573509
Validation loss = 0.030284442007541656
Validation loss = 0.0322805792093277
Validation loss = 0.03021768480539322
Validation loss = 0.03001462109386921
Validation loss = 0.0297451950609684
Validation loss = 0.03281097114086151
Validation loss = 0.02944980375468731
Validation loss = 0.02866884134709835
Validation loss = 0.029456688091158867
Validation loss = 0.03043319471180439
Validation loss = 0.029811697080731392
Validation loss = 0.028458036482334137
Validation loss = 0.029810378327965736
Validation loss = 0.028523312881588936
Validation loss = 0.027815185487270355
Validation loss = 0.028658710420131683
Validation loss = 0.02823231928050518
Validation loss = 0.02805952914059162
Validation loss = 0.027602441608905792
Validation loss = 0.028923699632287025
Validation loss = 0.027557067573070526
Validation loss = 0.026337571442127228
Validation loss = 0.028133699670433998
Validation loss = 0.02753281407058239
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04122023656964302
Validation loss = 0.03707076236605644
Validation loss = 0.035909075289964676
Validation loss = 0.04034904018044472
Validation loss = 0.03648756444454193
Validation loss = 0.03523580729961395
Validation loss = 0.035340771079063416
Validation loss = 0.03424888476729393
Validation loss = 0.034621380269527435
Validation loss = 0.03445596620440483
Validation loss = 0.0343744158744812
Validation loss = 0.035512860864400864
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 887      |
| Iteration     | 8        |
| MaximumReturn | 973      |
| MinimumReturn | 790      |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03276306018233299
Validation loss = 0.029048144817352295
Validation loss = 0.028663918375968933
Validation loss = 0.028489261865615845
Validation loss = 0.028689924627542496
Validation loss = 0.028526807203888893
Validation loss = 0.029209474101662636
Validation loss = 0.02870924212038517
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03373141214251518
Validation loss = 0.02979782596230507
Validation loss = 0.028654813766479492
Validation loss = 0.028518658131361008
Validation loss = 0.027412274852395058
Validation loss = 0.027470245957374573
Validation loss = 0.0282448623329401
Validation loss = 0.02749570831656456
Validation loss = 0.02627127803862095
Validation loss = 0.026182562112808228
Validation loss = 0.02810472622513771
Validation loss = 0.027035599574446678
Validation loss = 0.026984259486198425
Validation loss = 0.029684901237487793
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.031039705500006676
Validation loss = 0.027326339855790138
Validation loss = 0.025993864983320236
Validation loss = 0.025891613215208054
Validation loss = 0.025964444503188133
Validation loss = 0.02470008097589016
Validation loss = 0.025952229276299477
Validation loss = 0.02480209991335869
Validation loss = 0.02568521536886692
Validation loss = 0.025754770264029503
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02900380827486515
Validation loss = 0.02434682846069336
Validation loss = 0.023600753396749496
Validation loss = 0.02406770922243595
Validation loss = 0.025028973817825317
Validation loss = 0.023621436208486557
Validation loss = 0.023358479142189026
Validation loss = 0.022819316014647484
Validation loss = 0.023922331631183624
Validation loss = 0.023518431931734085
Validation loss = 0.02347010187804699
Validation loss = 0.023171572014689445
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.035562995821237564
Validation loss = 0.030143480747938156
Validation loss = 0.03221648186445236
Validation loss = 0.029257962480187416
Validation loss = 0.028611432760953903
Validation loss = 0.030547067523002625
Validation loss = 0.031137000769376755
Validation loss = 0.0278057511895895
Validation loss = 0.027937058359384537
Validation loss = 0.027318770065903664
Validation loss = 0.025943104177713394
Validation loss = 0.02974764071404934
Validation loss = 0.027044599875807762
Validation loss = 0.02698882296681404
Validation loss = 0.025767389684915543
Validation loss = 0.027241701260209084
Validation loss = 0.02520838752388954
Validation loss = 0.026008570566773415
Validation loss = 0.025561723858118057
Validation loss = 0.024898532778024673
Validation loss = 0.027068238705396652
Validation loss = 0.024389145895838737
Validation loss = 0.02704966627061367
Validation loss = 0.024479355663061142
Validation loss = 0.024183744564652443
Validation loss = 0.027202913537621498
Validation loss = 0.023683417588472366
Validation loss = 0.02554035186767578
Validation loss = 0.023171763867139816
Validation loss = 0.024846766144037247
Validation loss = 0.024136608466506004
Validation loss = 0.026067251339554787
Validation loss = 0.023715440183877945
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 567      |
| Iteration     | 9        |
| MaximumReturn | 872      |
| MinimumReturn | -504     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03132379427552223
Validation loss = 0.0255186315625906
Validation loss = 0.02542349509894848
Validation loss = 0.02530921809375286
Validation loss = 0.023593442514538765
Validation loss = 0.02362978644669056
Validation loss = 0.02382044680416584
Validation loss = 0.024281656369566917
Validation loss = 0.02412610501050949
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02696778066456318
Validation loss = 0.026034535840153694
Validation loss = 0.02262754924595356
Validation loss = 0.023206066340208054
Validation loss = 0.022820448502898216
Validation loss = 0.023652292788028717
Validation loss = 0.023746544495224953
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0257854163646698
Validation loss = 0.022185174748301506
Validation loss = 0.023157238960266113
Validation loss = 0.022907597944140434
Validation loss = 0.02192441001534462
Validation loss = 0.02326536737382412
Validation loss = 0.02119353972375393
Validation loss = 0.02130867727100849
Validation loss = 0.021198224276304245
Validation loss = 0.021608324721455574
Validation loss = 0.022803789004683495
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025404231622815132
Validation loss = 0.021264227107167244
Validation loss = 0.021649254485964775
Validation loss = 0.020833734422922134
Validation loss = 0.020742012187838554
Validation loss = 0.02072087675333023
Validation loss = 0.020677713677287102
Validation loss = 0.021637728437781334
Validation loss = 0.023079954087734222
Validation loss = 0.020617766305804253
Validation loss = 0.021095572039484978
Validation loss = 0.020855380222201347
Validation loss = 0.020000489428639412
Validation loss = 0.0201412420719862
Validation loss = 0.02349197119474411
Validation loss = 0.01884632557630539
Validation loss = 0.019526338204741478
Validation loss = 0.020468464121222496
Validation loss = 0.018897302448749542
Validation loss = 0.020819462835788727
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024213707074522972
Validation loss = 0.02153540775179863
Validation loss = 0.021076824516057968
Validation loss = 0.022164657711982727
Validation loss = 0.021419255062937737
Validation loss = 0.02135218307375908
Validation loss = 0.02163473330438137
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 993      |
| Iteration     | 10       |
| MaximumReturn | 1.12e+03 |
| MinimumReturn | 785      |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025701725855469704
Validation loss = 0.0228746235370636
Validation loss = 0.02202228643000126
Validation loss = 0.02294393815100193
Validation loss = 0.020663417875766754
Validation loss = 0.024566182866692543
Validation loss = 0.020967459306120872
Validation loss = 0.020669689401984215
Validation loss = 0.021768882870674133
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02290923334658146
Validation loss = 0.021506384015083313
Validation loss = 0.021997669711709023
Validation loss = 0.021313101053237915
Validation loss = 0.02136058174073696
Validation loss = 0.02061093971133232
Validation loss = 0.020021310076117516
Validation loss = 0.02142784558236599
Validation loss = 0.01935645565390587
Validation loss = 0.020870720967650414
Validation loss = 0.018913792446255684
Validation loss = 0.019228508695960045
Validation loss = 0.020900225266814232
Validation loss = 0.01923668570816517
Validation loss = 0.01934427209198475
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023983662948012352
Validation loss = 0.019907450303435326
Validation loss = 0.020872140303254128
Validation loss = 0.02059401012957096
Validation loss = 0.02121012471616268
Validation loss = 0.02036222070455551
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021743105724453926
Validation loss = 0.018562346696853638
Validation loss = 0.018297774717211723
Validation loss = 0.018772510811686516
Validation loss = 0.018730251118540764
Validation loss = 0.018409043550491333
Validation loss = 0.018854839727282524
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022937746718525887
Validation loss = 0.020609481260180473
Validation loss = 0.021818270906805992
Validation loss = 0.019510887563228607
Validation loss = 0.0186361875385046
Validation loss = 0.019515298306941986
Validation loss = 0.0191320963203907
Validation loss = 0.018715379759669304
Validation loss = 0.018900111317634583
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 706      |
| Iteration     | 11       |
| MaximumReturn | 1.13e+03 |
| MinimumReturn | -537     |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024785887449979782
Validation loss = 0.02048278972506523
Validation loss = 0.019414184615015984
Validation loss = 0.020280135795474052
Validation loss = 0.019231000915169716
Validation loss = 0.01984507218003273
Validation loss = 0.019934194162487984
Validation loss = 0.019741395488381386
Validation loss = 0.020461708307266235
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022639978677034378
Validation loss = 0.01899617165327072
Validation loss = 0.01864093728363514
Validation loss = 0.01824985072016716
Validation loss = 0.018389342352747917
Validation loss = 0.018725579604506493
Validation loss = 0.01765286549925804
Validation loss = 0.018804265186190605
Validation loss = 0.018008992075920105
Validation loss = 0.018559712916612625
Validation loss = 0.017735078930854797
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023087028414011
Validation loss = 0.01833876222372055
Validation loss = 0.018394744023680687
Validation loss = 0.019406236708164215
Validation loss = 0.01931329257786274
Validation loss = 0.01752324588596821
Validation loss = 0.01825805753469467
Validation loss = 0.019742855802178383
Validation loss = 0.01804327592253685
Validation loss = 0.01825009100139141
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02089657261967659
Validation loss = 0.01767677254974842
Validation loss = 0.017626982182264328
Validation loss = 0.0179903544485569
Validation loss = 0.017373891547322273
Validation loss = 0.017420295625925064
Validation loss = 0.01696285605430603
Validation loss = 0.016841426491737366
Validation loss = 0.01682630553841591
Validation loss = 0.01654750481247902
Validation loss = 0.017554713413119316
Validation loss = 0.016715839505195618
Validation loss = 0.01688644476234913
Validation loss = 0.017143424600362778
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022735493257641792
Validation loss = 0.01888144202530384
Validation loss = 0.01844688132405281
Validation loss = 0.01853591576218605
Validation loss = 0.01765257678925991
Validation loss = 0.01763274520635605
Validation loss = 0.01896689087152481
Validation loss = 0.01863851770758629
Validation loss = 0.01790875941514969
Validation loss = 0.018686547875404358
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.04e+03 |
| Iteration     | 12       |
| MaximumReturn | 1.13e+03 |
| MinimumReturn | 830      |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023417457938194275
Validation loss = 0.017592428252100945
Validation loss = 0.01794448308646679
Validation loss = 0.018096936866641045
Validation loss = 0.01737787388265133
Validation loss = 0.01808725856244564
Validation loss = 0.018552839756011963
Validation loss = 0.017614776268601418
Validation loss = 0.017287952825427055
Validation loss = 0.017242107540369034
Validation loss = 0.01746659353375435
Validation loss = 0.01953599415719509
Validation loss = 0.017153922468423843
Validation loss = 0.01735537499189377
Validation loss = 0.017085151746869087
Validation loss = 0.017322208732366562
Validation loss = 0.017857780680060387
Validation loss = 0.01774638146162033
Validation loss = 0.01625042036175728
Validation loss = 0.016918232664465904
Validation loss = 0.01628517545759678
Validation loss = 0.017083652317523956
Validation loss = 0.017482943832874298
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01918622851371765
Validation loss = 0.017634417861700058
Validation loss = 0.01732724718749523
Validation loss = 0.016636881977319717
Validation loss = 0.01759091392159462
Validation loss = 0.019314201548695564
Validation loss = 0.016008509323000908
Validation loss = 0.01947571523487568
Validation loss = 0.017792342230677605
Validation loss = 0.01592313125729561
Validation loss = 0.016970571130514145
Validation loss = 0.01767480932176113
Validation loss = 0.016056127846240997
Validation loss = 0.01625991053879261
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019136523827910423
Validation loss = 0.017931589856743813
Validation loss = 0.018296467140316963
Validation loss = 0.01725492998957634
Validation loss = 0.01891384646296501
Validation loss = 0.016912024468183517
Validation loss = 0.017124682664871216
Validation loss = 0.017082197591662407
Validation loss = 0.017308807000517845
Validation loss = 0.017461024224758148
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017240110784769058
Validation loss = 0.016353348270058632
Validation loss = 0.016011865809559822
Validation loss = 0.01612240821123123
Validation loss = 0.018648339435458183
Validation loss = 0.015783848240971565
Validation loss = 0.016705883666872978
Validation loss = 0.01679718866944313
Validation loss = 0.015581267885863781
Validation loss = 0.016516070812940598
Validation loss = 0.01530696451663971
Validation loss = 0.015868324786424637
Validation loss = 0.015368728898465633
Validation loss = 0.015125720761716366
Validation loss = 0.01624455861747265
Validation loss = 0.014925345778465271
Validation loss = 0.015820344910025597
Validation loss = 0.015212601982057095
Validation loss = 0.01562451757490635
Validation loss = 0.0165802501142025
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0196676068007946
Validation loss = 0.017258014529943466
Validation loss = 0.016610585153102875
Validation loss = 0.017596350982785225
Validation loss = 0.016317246481776237
Validation loss = 0.01708107255399227
Validation loss = 0.0167059488594532
Validation loss = 0.016780877485871315
Validation loss = 0.01750180497765541
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.03e+03 |
| Iteration     | 13       |
| MaximumReturn | 1.11e+03 |
| MinimumReturn | 917      |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017746804282069206
Validation loss = 0.015323624014854431
Validation loss = 0.01605592481791973
Validation loss = 0.01549901906400919
Validation loss = 0.015809135511517525
Validation loss = 0.015520349144935608
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01731468364596367
Validation loss = 0.01572420634329319
Validation loss = 0.015738362446427345
Validation loss = 0.016453517600893974
Validation loss = 0.01568697579205036
Validation loss = 0.016194326803088188
Validation loss = 0.016063643619418144
Validation loss = 0.01564544439315796
Validation loss = 0.016154639422893524
Validation loss = 0.015716658905148506
Validation loss = 0.015568469651043415
Validation loss = 0.014961093664169312
Validation loss = 0.01451642345637083
Validation loss = 0.0155342947691679
Validation loss = 0.01482770498842001
Validation loss = 0.016032304614782333
Validation loss = 0.014649695716798306
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018382303416728973
Validation loss = 0.017051521688699722
Validation loss = 0.01663190871477127
Validation loss = 0.015838736668229103
Validation loss = 0.015771519392728806
Validation loss = 0.015537681989371777
Validation loss = 0.016062749549746513
Validation loss = 0.015320422127842903
Validation loss = 0.015125539153814316
Validation loss = 0.015560656785964966
Validation loss = 0.01662389189004898
Validation loss = 0.015610733069479465
Validation loss = 0.015073179267346859
Validation loss = 0.01568666473031044
Validation loss = 0.015515444800257683
Validation loss = 0.015674570575356483
Validation loss = 0.0156534593552351
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01682623289525509
Validation loss = 0.015137321315705776
Validation loss = 0.015378241427242756
Validation loss = 0.014717833139002323
Validation loss = 0.014043235220015049
Validation loss = 0.015088937245309353
Validation loss = 0.015246883034706116
Validation loss = 0.01476738415658474
Validation loss = 0.014750326052308083
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017838243395090103
Validation loss = 0.016399597749114037
Validation loss = 0.015468685887753963
Validation loss = 0.01715683937072754
Validation loss = 0.015344856306910515
Validation loss = 0.016042284667491913
Validation loss = 0.016172342002391815
Validation loss = 0.014955624006688595
Validation loss = 0.014811939559876919
Validation loss = 0.015890561044216156
Validation loss = 0.01589726097881794
Validation loss = 0.015065548941493034
Validation loss = 0.015158182941377163
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.09e+03 |
| Iteration     | 14       |
| MaximumReturn | 1.2e+03  |
| MinimumReturn | 938      |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016663458198308945
Validation loss = 0.016266942024230957
Validation loss = 0.015315458178520203
Validation loss = 0.015857387334108353
Validation loss = 0.014823485165834427
Validation loss = 0.015344091691076756
Validation loss = 0.015034837648272514
Validation loss = 0.01551300473511219
Validation loss = 0.014822844415903091
Validation loss = 0.014668331481516361
Validation loss = 0.014563968405127525
Validation loss = 0.014737937599420547
Validation loss = 0.014479856938123703
Validation loss = 0.01492680236697197
Validation loss = 0.015683943405747414
Validation loss = 0.014858522452414036
Validation loss = 0.014153365045785904
Validation loss = 0.014457572251558304
Validation loss = 0.015079047530889511
Validation loss = 0.014769177883863449
Validation loss = 0.015059500932693481
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016091346740722656
Validation loss = 0.014585165306925774
Validation loss = 0.014348959550261497
Validation loss = 0.01555687841027975
Validation loss = 0.014580488204956055
Validation loss = 0.014237643219530582
Validation loss = 0.013939151540398598
Validation loss = 0.01514546386897564
Validation loss = 0.01463257148861885
Validation loss = 0.015379801392555237
Validation loss = 0.014837783761322498
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016045456752181053
Validation loss = 0.015519361011683941
Validation loss = 0.015584571287035942
Validation loss = 0.01553254947066307
Validation loss = 0.016320563852787018
Validation loss = 0.014674191363155842
Validation loss = 0.014926599338650703
Validation loss = 0.014280613511800766
Validation loss = 0.01500664185732603
Validation loss = 0.014347529038786888
Validation loss = 0.014545260928571224
Validation loss = 0.01424841582775116
Validation loss = 0.014199282042682171
Validation loss = 0.014615604653954506
Validation loss = 0.01409130822867155
Validation loss = 0.014880916103720665
Validation loss = 0.014086891897022724
Validation loss = 0.01473386213183403
Validation loss = 0.014866058714687824
Validation loss = 0.014417183585464954
Validation loss = 0.01375920232385397
Validation loss = 0.014001613482832909
Validation loss = 0.01453479565680027
Validation loss = 0.01374340895563364
Validation loss = 0.014365709386765957
Validation loss = 0.014050351455807686
Validation loss = 0.014866489917039871
Validation loss = 0.013828347437083721
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016246609389781952
Validation loss = 0.013745836913585663
Validation loss = 0.013673720881342888
Validation loss = 0.013751835562288761
Validation loss = 0.014776544645428658
Validation loss = 0.013945169746875763
Validation loss = 0.014899730682373047
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016570985317230225
Validation loss = 0.014719175174832344
Validation loss = 0.015824750065803528
Validation loss = 0.01463028322905302
Validation loss = 0.014498473145067692
Validation loss = 0.014343230053782463
Validation loss = 0.014437943696975708
Validation loss = 0.014198219403624535
Validation loss = 0.01438923366367817
Validation loss = 0.015900898724794388
Validation loss = 0.014147518202662468
Validation loss = 0.01409176830202341
Validation loss = 0.014319362118840218
Validation loss = 0.01461717113852501
Validation loss = 0.014195572584867477
Validation loss = 0.013782701455056667
Validation loss = 0.014199243858456612
Validation loss = 0.014242907986044884
Validation loss = 0.014175642281770706
Validation loss = 0.014063418842852116
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.08e+03 |
| Iteration     | 15       |
| MaximumReturn | 1.18e+03 |
| MinimumReturn | 973      |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015231690369546413
Validation loss = 0.014120875857770443
Validation loss = 0.013830102980136871
Validation loss = 0.014073538593947887
Validation loss = 0.013346530497074127
Validation loss = 0.014192987233400345
Validation loss = 0.014371148310601711
Validation loss = 0.013052411377429962
Validation loss = 0.013455506414175034
Validation loss = 0.013789000920951366
Validation loss = 0.013023816049098969
Validation loss = 0.013732592575252056
Validation loss = 0.013091746717691422
Validation loss = 0.014686087146401405
Validation loss = 0.013635856099426746
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014593008905649185
Validation loss = 0.014446428045630455
Validation loss = 0.013472714461386204
Validation loss = 0.01398082822561264
Validation loss = 0.012972734868526459
Validation loss = 0.01471103634685278
Validation loss = 0.013419190421700478
Validation loss = 0.013026503846049309
Validation loss = 0.013386647216975689
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014933844096958637
Validation loss = 0.013096757233142853
Validation loss = 0.013585299253463745
Validation loss = 0.014038658700883389
Validation loss = 0.013729979284107685
Validation loss = 0.013177793473005295
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01427278108894825
Validation loss = 0.013152837753295898
Validation loss = 0.01360915694385767
Validation loss = 0.013224281370639801
Validation loss = 0.013849464245140553
Validation loss = 0.013542013242840767
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014765151776373386
Validation loss = 0.013602250255644321
Validation loss = 0.013518399558961391
Validation loss = 0.01307948213070631
Validation loss = 0.014109314419329166
Validation loss = 0.01341680996119976
Validation loss = 0.013415678404271603
Validation loss = 0.012885835953056812
Validation loss = 0.014031625352799892
Validation loss = 0.013944321312010288
Validation loss = 0.012833567336201668
Validation loss = 0.013183362782001495
Validation loss = 0.01308014988899231
Validation loss = 0.014343278482556343
Validation loss = 0.01298485416918993
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.04e+03 |
| Iteration     | 16       |
| MaximumReturn | 1.15e+03 |
| MinimumReturn | 971      |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014309823513031006
Validation loss = 0.013565246015787125
Validation loss = 0.012350298464298248
Validation loss = 0.0131113575771451
Validation loss = 0.012849807739257812
Validation loss = 0.0131538026034832
Validation loss = 0.01305859163403511
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015489445999264717
Validation loss = 0.013491795398294926
Validation loss = 0.013251692056655884
Validation loss = 0.01249483972787857
Validation loss = 0.013085776939988136
Validation loss = 0.012648864649236202
Validation loss = 0.013524402864277363
Validation loss = 0.013297534547746181
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01436745747923851
Validation loss = 0.012823901139199734
Validation loss = 0.013560265302658081
Validation loss = 0.012981528416275978
Validation loss = 0.013027617707848549
Validation loss = 0.013528134673833847
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015682101249694824
Validation loss = 0.013021637685596943
Validation loss = 0.013036571443080902
Validation loss = 0.013104395009577274
Validation loss = 0.012769694440066814
Validation loss = 0.013376062735915184
Validation loss = 0.013101537711918354
Validation loss = 0.012862218543887138
Validation loss = 0.013625959865748882
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013392441906034946
Validation loss = 0.012806093320250511
Validation loss = 0.012577186338603497
Validation loss = 0.012800129130482674
Validation loss = 0.012538452632725239
Validation loss = 0.012865387834608555
Validation loss = 0.012871168553829193
Validation loss = 0.012423443607985973
Validation loss = 0.012547791935503483
Validation loss = 0.013064200058579445
Validation loss = 0.012676623649895191
Validation loss = 0.011913146823644638
Validation loss = 0.012555962428450584
Validation loss = 0.012791827321052551
Validation loss = 0.01308906078338623
Validation loss = 0.012344448827207088
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 855      |
| Iteration     | 17       |
| MaximumReturn | 1.25e+03 |
| MinimumReturn | -312     |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014748780056834221
Validation loss = 0.01269681565463543
Validation loss = 0.012618274427950382
Validation loss = 0.013746705837547779
Validation loss = 0.012211364693939686
Validation loss = 0.01275396067649126
Validation loss = 0.01232328824698925
Validation loss = 0.012065966613590717
Validation loss = 0.012034378945827484
Validation loss = 0.01293954998254776
Validation loss = 0.012442141771316528
Validation loss = 0.011760571040213108
Validation loss = 0.012047716416418552
Validation loss = 0.01257161982357502
Validation loss = 0.01178715843707323
Validation loss = 0.011640784330666065
Validation loss = 0.012313303537666798
Validation loss = 0.012299003079533577
Validation loss = 0.013479012064635754
Validation loss = 0.012107818387448788
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014645968563854694
Validation loss = 0.012997527606785297
Validation loss = 0.012262057512998581
Validation loss = 0.012993989512324333
Validation loss = 0.011767895892262459
Validation loss = 0.012169711291790009
Validation loss = 0.011854832991957664
Validation loss = 0.012579116970300674
Validation loss = 0.012386147864162922
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013768771663308144
Validation loss = 0.012441311031579971
Validation loss = 0.012394165620207787
Validation loss = 0.013046705164015293
Validation loss = 0.012358222156763077
Validation loss = 0.012619647197425365
Validation loss = 0.012533625587821007
Validation loss = 0.0119759701192379
Validation loss = 0.012920665554702282
Validation loss = 0.011772424913942814
Validation loss = 0.012050124816596508
Validation loss = 0.011845259927213192
Validation loss = 0.011636059731245041
Validation loss = 0.01240642461925745
Validation loss = 0.012134530581533909
Validation loss = 0.011651341803371906
Validation loss = 0.012885135598480701
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01425978820770979
Validation loss = 0.01257838774472475
Validation loss = 0.011999237351119518
Validation loss = 0.012943306006491184
Validation loss = 0.01274721696972847
Validation loss = 0.012467180378735065
Validation loss = 0.01232951134443283
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014262935146689415
Validation loss = 0.011980261653661728
Validation loss = 0.011986292898654938
Validation loss = 0.011798735707998276
Validation loss = 0.011784439906477928
Validation loss = 0.011301231570541859
Validation loss = 0.011773884296417236
Validation loss = 0.011541268788278103
Validation loss = 0.011923248879611492
Validation loss = 0.01137753576040268
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.13e+03 |
| Iteration     | 18       |
| MaximumReturn | 1.27e+03 |
| MinimumReturn | 1.05e+03 |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01310834288597107
Validation loss = 0.011735843494534492
Validation loss = 0.01144549623131752
Validation loss = 0.011874291114509106
Validation loss = 0.01177259162068367
Validation loss = 0.011325456202030182
Validation loss = 0.011154609732329845
Validation loss = 0.011736462824046612
Validation loss = 0.011332543566823006
Validation loss = 0.013356429524719715
Validation loss = 0.011334666982293129
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012658996507525444
Validation loss = 0.011676657013595104
Validation loss = 0.012257037684321404
Validation loss = 0.011583268642425537
Validation loss = 0.012346120551228523
Validation loss = 0.011892089620232582
Validation loss = 0.012043334543704987
Validation loss = 0.011601057834923267
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012105635367333889
Validation loss = 0.01216665469110012
Validation loss = 0.011495404876768589
Validation loss = 0.011583038605749607
Validation loss = 0.012379243969917297
Validation loss = 0.011957479640841484
Validation loss = 0.011266718618571758
Validation loss = 0.011324738152325153
Validation loss = 0.012056615203619003
Validation loss = 0.01105815265327692
Validation loss = 0.011516569182276726
Validation loss = 0.011241043917834759
Validation loss = 0.011718407273292542
Validation loss = 0.011202674359083176
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012637150473892689
Validation loss = 0.01209279429167509
Validation loss = 0.012443318963050842
Validation loss = 0.012095851823687553
Validation loss = 0.012305374257266521
Validation loss = 0.011636080220341682
Validation loss = 0.012555603869259357
Validation loss = 0.01190844178199768
Validation loss = 0.011542028747498989
Validation loss = 0.011415132321417332
Validation loss = 0.011460808105766773
Validation loss = 0.01146378368139267
Validation loss = 0.011976359412074089
Validation loss = 0.011563405394554138
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012317285872995853
Validation loss = 0.01177862100303173
Validation loss = 0.01173699926584959
Validation loss = 0.011117463931441307
Validation loss = 0.011175107210874557
Validation loss = 0.011296135373413563
Validation loss = 0.011937522329390049
Validation loss = 0.011163132265210152
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 911      |
| Iteration     | 19       |
| MaximumReturn | 1.27e+03 |
| MinimumReturn | -317     |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012350604869425297
Validation loss = 0.011036614887416363
Validation loss = 0.011321616359055042
Validation loss = 0.010814545676112175
Validation loss = 0.010969452559947968
Validation loss = 0.011001607403159142
Validation loss = 0.010704782791435719
Validation loss = 0.010806790553033352
Validation loss = 0.01117798499763012
Validation loss = 0.010940395295619965
Validation loss = 0.010916815139353275
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012199665419757366
Validation loss = 0.011463540606200695
Validation loss = 0.011550597846508026
Validation loss = 0.011083816178143024
Validation loss = 0.01107032597064972
Validation loss = 0.01098939124494791
Validation loss = 0.011321376077830791
Validation loss = 0.011610527522861958
Validation loss = 0.011300559155642986
Validation loss = 0.011828974820673466
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012738273479044437
Validation loss = 0.010771690867841244
Validation loss = 0.010771606117486954
Validation loss = 0.011561509221792221
Validation loss = 0.010491975583136082
Validation loss = 0.010788281448185444
Validation loss = 0.010495847091078758
Validation loss = 0.011654330417513847
Validation loss = 0.010802011936903
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011929653584957123
Validation loss = 0.01134341862052679
Validation loss = 0.010750793851912022
Validation loss = 0.010951909236609936
Validation loss = 0.010738345794379711
Validation loss = 0.010918140411376953
Validation loss = 0.010977904312312603
Validation loss = 0.011803559958934784
Validation loss = 0.01130978949368
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012055388651788235
Validation loss = 0.010798302479088306
Validation loss = 0.010945796966552734
Validation loss = 0.010978074744343758
Validation loss = 0.011026309803128242
Validation loss = 0.010780686512589455
Validation loss = 0.01147291250526905
Validation loss = 0.011317811906337738
Validation loss = 0.010979309678077698
Validation loss = 0.010730786249041557
Validation loss = 0.010112874209880829
Validation loss = 0.010308469645678997
Validation loss = 0.01075773872435093
Validation loss = 0.010418117977678776
Validation loss = 0.01146693341434002
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.14e+03 |
| Iteration     | 20       |
| MaximumReturn | 1.29e+03 |
| MinimumReturn | 970      |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010705852881073952
Validation loss = 0.010432769544422626
Validation loss = 0.010606463998556137
Validation loss = 0.010649686679244041
Validation loss = 0.010712513700127602
Validation loss = 0.010340500622987747
Validation loss = 0.010651400312781334
Validation loss = 0.011205594055354595
Validation loss = 0.010742307640612125
Validation loss = 0.010500616393983364
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011020905338227749
Validation loss = 0.011071941815316677
Validation loss = 0.010671673342585564
Validation loss = 0.011562054976820946
Validation loss = 0.011000948026776314
Validation loss = 0.01077178493142128
Validation loss = 0.01043707225471735
Validation loss = 0.010466986335814
Validation loss = 0.010554094798862934
Validation loss = 0.010530197992920876
Validation loss = 0.010572093538939953
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010726475156843662
Validation loss = 0.010537541471421719
Validation loss = 0.010776140727102757
Validation loss = 0.010156068950891495
Validation loss = 0.010390552692115307
Validation loss = 0.010386246256530285
Validation loss = 0.01017607282847166
Validation loss = 0.010165197774767876
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010927938856184483
Validation loss = 0.010745310224592686
Validation loss = 0.010271179489791393
Validation loss = 0.010428003035485744
Validation loss = 0.010826809331774712
Validation loss = 0.010678804479539394
Validation loss = 0.010516873560845852
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011011450551450253
Validation loss = 0.010404028929769993
Validation loss = 0.010381183587014675
Validation loss = 0.010452180169522762
Validation loss = 0.00977485254406929
Validation loss = 0.010287566110491753
Validation loss = 0.01024990901350975
Validation loss = 0.009693333879113197
Validation loss = 0.01019278820604086
Validation loss = 0.010390044189989567
Validation loss = 0.00984465517103672
Validation loss = 0.01151109579950571
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.09e+03 |
| Iteration     | 21       |
| MaximumReturn | 1.26e+03 |
| MinimumReturn | 975      |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01051719393581152
Validation loss = 0.010047716088593006
Validation loss = 0.010989096015691757
Validation loss = 0.00968972034752369
Validation loss = 0.00971057265996933
Validation loss = 0.009744888171553612
Validation loss = 0.009550794959068298
Validation loss = 0.009541803039610386
Validation loss = 0.009976095519959927
Validation loss = 0.010062850080430508
Validation loss = 0.009530285373330116
Validation loss = 0.010092517361044884
Validation loss = 0.010135937482118607
Validation loss = 0.009874553419649601
Validation loss = 0.01019627507776022
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0107521191239357
Validation loss = 0.010902497917413712
Validation loss = 0.010431202128529549
Validation loss = 0.010502560064196587
Validation loss = 0.01009115856140852
Validation loss = 0.010458224453032017
Validation loss = 0.009670785628259182
Validation loss = 0.009625655598938465
Validation loss = 0.009970301762223244
Validation loss = 0.010018154047429562
Validation loss = 0.009802520275115967
Validation loss = 0.00985442753881216
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010421760380268097
Validation loss = 0.009844598360359669
Validation loss = 0.009724737145006657
Validation loss = 0.009833714924752712
Validation loss = 0.009553888812661171
Validation loss = 0.009920294396579266
Validation loss = 0.01019633375108242
Validation loss = 0.00995881948620081
Validation loss = 0.009667991660535336
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010942763648927212
Validation loss = 0.010963321663439274
Validation loss = 0.010282495990395546
Validation loss = 0.010116376914083958
Validation loss = 0.010902156122028828
Validation loss = 0.010206817649304867
Validation loss = 0.010571787133812904
Validation loss = 0.010241582058370113
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010585165582597256
Validation loss = 0.00948190689086914
Validation loss = 0.009916289709508419
Validation loss = 0.00966106541454792
Validation loss = 0.009671884588897228
Validation loss = 0.010500766336917877
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.2e+03  |
| Iteration     | 22       |
| MaximumReturn | 1.32e+03 |
| MinimumReturn | 1.05e+03 |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010709994472563267
Validation loss = 0.009594671428203583
Validation loss = 0.01016212347894907
Validation loss = 0.00990621279925108
Validation loss = 0.009345605038106441
Validation loss = 0.009523951448500156
Validation loss = 0.009592405520379543
Validation loss = 0.009157904423773289
Validation loss = 0.009639316238462925
Validation loss = 0.009667170234024525
Validation loss = 0.00977045577019453
Validation loss = 0.009311756119132042
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010302883572876453
Validation loss = 0.009205030277371407
Validation loss = 0.009814003482460976
Validation loss = 0.009767073206603527
Validation loss = 0.01050237100571394
Validation loss = 0.009719640016555786
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010286816395819187
Validation loss = 0.009766127914190292
Validation loss = 0.009888623841106892
Validation loss = 0.009604785591363907
Validation loss = 0.00945365708321333
Validation loss = 0.010116485878825188
Validation loss = 0.00977353099733591
Validation loss = 0.00978139415383339
Validation loss = 0.009626810438930988
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010455877520143986
Validation loss = 0.010078485123813152
Validation loss = 0.00985549483448267
Validation loss = 0.010618488304316998
Validation loss = 0.009647190570831299
Validation loss = 0.010310865007340908
Validation loss = 0.009859784506261349
Validation loss = 0.010143651627004147
Validation loss = 0.009977743960916996
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011163483373820782
Validation loss = 0.009860535152256489
Validation loss = 0.009291796013712883
Validation loss = 0.010023645125329494
Validation loss = 0.009535255841910839
Validation loss = 0.009764297865331173
Validation loss = 0.00900379940867424
Validation loss = 0.009855235926806927
Validation loss = 0.009302902966737747
Validation loss = 0.009143233299255371
Validation loss = 0.008953067474067211
Validation loss = 0.009634683839976788
Validation loss = 0.009431286714971066
Validation loss = 0.009282256476581097
Validation loss = 0.009424518793821335
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.1e+03  |
| Iteration     | 23       |
| MaximumReturn | 1.16e+03 |
| MinimumReturn | 945      |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010462877340614796
Validation loss = 0.008959462866187096
Validation loss = 0.009912554174661636
Validation loss = 0.009145106188952923
Validation loss = 0.009370087645947933
Validation loss = 0.010246774181723595
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010176599957048893
Validation loss = 0.010063770227134228
Validation loss = 0.009131387807428837
Validation loss = 0.009545885026454926
Validation loss = 0.009787374176084995
Validation loss = 0.009299240075051785
Validation loss = 0.009297857992351055
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010067974217236042
Validation loss = 0.008985470980405807
Validation loss = 0.009634818881750107
Validation loss = 0.009756592102348804
Validation loss = 0.009375721216201782
Validation loss = 0.009672821499407291
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009747573174536228
Validation loss = 0.009769720956683159
Validation loss = 0.00998876616358757
Validation loss = 0.00948390830308199
Validation loss = 0.010049977339804173
Validation loss = 0.00940839946269989
Validation loss = 0.00926203466951847
Validation loss = 0.009714780375361443
Validation loss = 0.009240081533789635
Validation loss = 0.009836587123572826
Validation loss = 0.009391258470714092
Validation loss = 0.009311181493103504
Validation loss = 0.009398754686117172
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009388045407831669
Validation loss = 0.009026680141687393
Validation loss = 0.008830946870148182
Validation loss = 0.009174151346087456
Validation loss = 0.009505977854132652
Validation loss = 0.010020583868026733
Validation loss = 0.00929045956581831
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.09e+03 |
| Iteration     | 24       |
| MaximumReturn | 1.22e+03 |
| MinimumReturn | 936      |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009513070806860924
Validation loss = 0.009155956096947193
Validation loss = 0.009396810084581375
Validation loss = 0.008749600499868393
Validation loss = 0.009114766493439674
Validation loss = 0.008789273910224438
Validation loss = 0.009263603016734123
Validation loss = 0.0089860949665308
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009972384199500084
Validation loss = 0.009080804884433746
Validation loss = 0.00923837348818779
Validation loss = 0.009255020879209042
Validation loss = 0.009068617597222328
Validation loss = 0.009064930491149426
Validation loss = 0.00939480122178793
Validation loss = 0.009007605724036694
Validation loss = 0.009936382994055748
Validation loss = 0.008579417131841183
Validation loss = 0.008958266116678715
Validation loss = 0.008982561528682709
Validation loss = 0.008836441673338413
Validation loss = 0.00918580498546362
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010567869991064072
Validation loss = 0.00916594173759222
Validation loss = 0.009110669605433941
Validation loss = 0.009229355491697788
Validation loss = 0.009027399122714996
Validation loss = 0.009247008711099625
Validation loss = 0.008846231736242771
Validation loss = 0.009340321645140648
Validation loss = 0.008969919756054878
Validation loss = 0.00926453061401844
Validation loss = 0.009131046943366528
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009761134162545204
Validation loss = 0.00922312680631876
Validation loss = 0.009463503025472164
Validation loss = 0.00929499976336956
Validation loss = 0.010015169158577919
Validation loss = 0.009156370535492897
Validation loss = 0.00933001097291708
Validation loss = 0.00953226163983345
Validation loss = 0.009261742234230042
Validation loss = 0.00916216243058443
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009295772761106491
Validation loss = 0.009044351056218147
Validation loss = 0.008793780580163002
Validation loss = 0.00852789543569088
Validation loss = 0.009199642576277256
Validation loss = 0.009694220498204231
Validation loss = 0.008947995491325855
Validation loss = 0.009130777791142464
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.12e+03 |
| Iteration     | 25       |
| MaximumReturn | 1.19e+03 |
| MinimumReturn | 1.01e+03 |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009054898284375668
Validation loss = 0.00855353381484747
Validation loss = 0.009267904795706272
Validation loss = 0.009049436077475548
Validation loss = 0.008759732358157635
Validation loss = 0.00878759566694498
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009027168154716492
Validation loss = 0.00910315290093422
Validation loss = 0.009245587512850761
Validation loss = 0.00892635341733694
Validation loss = 0.008771413005888462
Validation loss = 0.008795746602118015
Validation loss = 0.008581473492085934
Validation loss = 0.00862195435911417
Validation loss = 0.009022055193781853
Validation loss = 0.008825050666928291
Validation loss = 0.008872329257428646
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009178276173770428
Validation loss = 0.008865602314472198
Validation loss = 0.008892214857041836
Validation loss = 0.009068062528967857
Validation loss = 0.00883369892835617
Validation loss = 0.009153719060122967
Validation loss = 0.008919890969991684
Validation loss = 0.009457395412027836
Validation loss = 0.009662366472184658
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008821839466691017
Validation loss = 0.009340249001979828
Validation loss = 0.00932877417653799
Validation loss = 0.008600208908319473
Validation loss = 0.008333257399499416
Validation loss = 0.008944503031671047
Validation loss = 0.009529461152851582
Validation loss = 0.008917628787457943
Validation loss = 0.008525358512997627
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00905113760381937
Validation loss = 0.00855737179517746
Validation loss = 0.008756226859986782
Validation loss = 0.00941429752856493
Validation loss = 0.008405343629419804
Validation loss = 0.008492776192724705
Validation loss = 0.008748968131840229
Validation loss = 0.008506820537149906
Validation loss = 0.008427671156823635
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.2e+03  |
| Iteration     | 26       |
| MaximumReturn | 1.23e+03 |
| MinimumReturn | 1.16e+03 |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009240984916687012
Validation loss = 0.00874664168804884
Validation loss = 0.00897134281694889
Validation loss = 0.008224544115364552
Validation loss = 0.008823735639452934
Validation loss = 0.008066997863352299
Validation loss = 0.008555525913834572
Validation loss = 0.008527704514563084
Validation loss = 0.009197273291647434
Validation loss = 0.008409331552684307
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008864901028573513
Validation loss = 0.00865965522825718
Validation loss = 0.008498131297528744
Validation loss = 0.008454887196421623
Validation loss = 0.00845976360142231
Validation loss = 0.008576767519116402
Validation loss = 0.008549157530069351
Validation loss = 0.008419893682003021
Validation loss = 0.008435972034931183
Validation loss = 0.008606666699051857
Validation loss = 0.008108505979180336
Validation loss = 0.008312354795634747
Validation loss = 0.008465911261737347
Validation loss = 0.008077798411250114
Validation loss = 0.008573800325393677
Validation loss = 0.008218281902372837
Validation loss = 0.008101665414869785
Validation loss = 0.008360428735613823
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009137014858424664
Validation loss = 0.009272662922739983
Validation loss = 0.008396884426474571
Validation loss = 0.008571862243115902
Validation loss = 0.008459236472845078
Validation loss = 0.008746153675019741
Validation loss = 0.00852169282734394
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009105033241212368
Validation loss = 0.00846106093376875
Validation loss = 0.008938394486904144
Validation loss = 0.00859567616134882
Validation loss = 0.008728758431971073
Validation loss = 0.008744110353291035
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009377628564834595
Validation loss = 0.008916071616113186
Validation loss = 0.008594579063355923
Validation loss = 0.008873479440808296
Validation loss = 0.008447025902569294
Validation loss = 0.008281656540930271
Validation loss = 0.008706788532435894
Validation loss = 0.008443155325949192
Validation loss = 0.008734260685741901
Validation loss = 0.008020998910069466
Validation loss = 0.00792220514267683
Validation loss = 0.008436543866991997
Validation loss = 0.008443593047559261
Validation loss = 0.008398660458624363
Validation loss = 0.008304431103169918
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.18e+03 |
| Iteration     | 27       |
| MaximumReturn | 1.25e+03 |
| MinimumReturn | 1.14e+03 |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008621112443506718
Validation loss = 0.008771346881985664
Validation loss = 0.008585053496062756
Validation loss = 0.008445722982287407
Validation loss = 0.008814721368253231
Validation loss = 0.008510252460837364
Validation loss = 0.008202902972698212
Validation loss = 0.008901458233594894
Validation loss = 0.007888929918408394
Validation loss = 0.008913441561162472
Validation loss = 0.00790114514529705
Validation loss = 0.008811281062662601
Validation loss = 0.008136626332998276
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008588298223912716
Validation loss = 0.00816694088280201
Validation loss = 0.008167200721800327
Validation loss = 0.008088392205536366
Validation loss = 0.007992398925125599
Validation loss = 0.007958355359733105
Validation loss = 0.008316618390381336
Validation loss = 0.008091404102742672
Validation loss = 0.008878200314939022
Validation loss = 0.00816242303699255
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008341877721250057
Validation loss = 0.008556346409022808
Validation loss = 0.008645780384540558
Validation loss = 0.008486395701766014
Validation loss = 0.008575093001127243
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008400341495871544
Validation loss = 0.008842739276587963
Validation loss = 0.008323226124048233
Validation loss = 0.008197220973670483
Validation loss = 0.009009606204926968
Validation loss = 0.00894254632294178
Validation loss = 0.008479687385261059
Validation loss = 0.008302683010697365
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008539039641618729
Validation loss = 0.008215425536036491
Validation loss = 0.008196976035833359
Validation loss = 0.008285338059067726
Validation loss = 0.007918083108961582
Validation loss = 0.008231259882450104
Validation loss = 0.008558329194784164
Validation loss = 0.007974040694534779
Validation loss = 0.008392740041017532
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.11e+03 |
| Iteration     | 28       |
| MaximumReturn | 1.15e+03 |
| MinimumReturn | 1.05e+03 |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008225949481129646
Validation loss = 0.007961736992001534
Validation loss = 0.008505641482770443
Validation loss = 0.008089246228337288
Validation loss = 0.008334058336913586
Validation loss = 0.008393456228077412
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008046968840062618
Validation loss = 0.008179624564945698
Validation loss = 0.00857793353497982
Validation loss = 0.008066346868872643
Validation loss = 0.008341304026544094
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008485609665513039
Validation loss = 0.008607462979853153
Validation loss = 0.009629414416849613
Validation loss = 0.008151495829224586
Validation loss = 0.008178341202437878
Validation loss = 0.008106979541480541
Validation loss = 0.00849744863808155
Validation loss = 0.00823364406824112
Validation loss = 0.008113060146570206
Validation loss = 0.00800372939556837
Validation loss = 0.008313728496432304
Validation loss = 0.008480017073452473
Validation loss = 0.008210591971874237
Validation loss = 0.008309886790812016
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008645218797028065
Validation loss = 0.008318399079144001
Validation loss = 0.008654151111841202
Validation loss = 0.00802088063210249
Validation loss = 0.00809870008379221
Validation loss = 0.00812553334981203
Validation loss = 0.008845272473990917
Validation loss = 0.008373136632144451
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008623067289590836
Validation loss = 0.007944597862660885
Validation loss = 0.007897884584963322
Validation loss = 0.007975143380463123
Validation loss = 0.007970375008881092
Validation loss = 0.008410240523517132
Validation loss = 0.008322922512888908
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.14e+03 |
| Iteration     | 29       |
| MaximumReturn | 1.22e+03 |
| MinimumReturn | 950      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00846344605088234
Validation loss = 0.008127205073833466
Validation loss = 0.007969892583787441
Validation loss = 0.008054407313466072
Validation loss = 0.00823910441249609
Validation loss = 0.008428561501204967
Validation loss = 0.008151744492352009
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008321967907249928
Validation loss = 0.008140265010297298
Validation loss = 0.008178891614079475
Validation loss = 0.007925369776785374
Validation loss = 0.007645224221050739
Validation loss = 0.008001633919775486
Validation loss = 0.008277444168925285
Validation loss = 0.007524190470576286
Validation loss = 0.00790040660649538
Validation loss = 0.008062286302447319
Validation loss = 0.007652053143829107
Validation loss = 0.007969295606017113
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009079433977603912
Validation loss = 0.008820994757115841
Validation loss = 0.008121675811707973
Validation loss = 0.008402416482567787
Validation loss = 0.007994530722498894
Validation loss = 0.008589772507548332
Validation loss = 0.00839738268405199
Validation loss = 0.008479923941195011
Validation loss = 0.008087139576673508
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008873852901160717
Validation loss = 0.008071827702224255
Validation loss = 0.00818119291216135
Validation loss = 0.007799739018082619
Validation loss = 0.007891681976616383
Validation loss = 0.007990892045199871
Validation loss = 0.008195807226002216
Validation loss = 0.008310146629810333
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008411487564444542
Validation loss = 0.0080590033903718
Validation loss = 0.0074869864620268345
Validation loss = 0.007930577732622623
Validation loss = 0.00817744992673397
Validation loss = 0.007702990900725126
Validation loss = 0.008060031570494175
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.19e+03 |
| Iteration     | 30       |
| MaximumReturn | 1.25e+03 |
| MinimumReturn | 1.12e+03 |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008111163973808289
Validation loss = 0.007973989471793175
Validation loss = 0.00835278257727623
Validation loss = 0.008074872195720673
Validation loss = 0.00823446549475193
Validation loss = 0.007787293754518032
Validation loss = 0.007885511964559555
Validation loss = 0.007696988061070442
Validation loss = 0.0074658929370343685
Validation loss = 0.00803627260029316
Validation loss = 0.008170191198587418
Validation loss = 0.007585586979985237
Validation loss = 0.007502515334635973
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008046664297580719
Validation loss = 0.00781763345003128
Validation loss = 0.007653901819139719
Validation loss = 0.007543097250163555
Validation loss = 0.00791809894144535
Validation loss = 0.007862414233386517
Validation loss = 0.008151810616254807
Validation loss = 0.007946500554680824
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008699212223291397
Validation loss = 0.008056798949837685
Validation loss = 0.00780507130548358
Validation loss = 0.00828723143786192
Validation loss = 0.007971495389938354
Validation loss = 0.007763626053929329
Validation loss = 0.007933763787150383
Validation loss = 0.008132554590702057
Validation loss = 0.008039461448788643
Validation loss = 0.008295871317386627
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008512972854077816
Validation loss = 0.008229712024331093
Validation loss = 0.0077822888270020485
Validation loss = 0.007935280911624432
Validation loss = 0.00810931995511055
Validation loss = 0.007785391993820667
Validation loss = 0.007680065464228392
Validation loss = 0.007741185836493969
Validation loss = 0.0075781033374369144
Validation loss = 0.007794731296598911
Validation loss = 0.007776347920298576
Validation loss = 0.007919006049633026
Validation loss = 0.007537792436778545
Validation loss = 0.007768237963318825
Validation loss = 0.0075919185765087605
Validation loss = 0.00773710198700428
Validation loss = 0.007706076838076115
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007803357206285
Validation loss = 0.007795017212629318
Validation loss = 0.008210823871195316
Validation loss = 0.007911109365522861
Validation loss = 0.007890061475336552
Validation loss = 0.008120937272906303
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.19e+03 |
| Iteration     | 31       |
| MaximumReturn | 1.29e+03 |
| MinimumReturn | 1.08e+03 |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008264824748039246
Validation loss = 0.007490372750908136
Validation loss = 0.007847107946872711
Validation loss = 0.007749649696052074
Validation loss = 0.007683622185140848
Validation loss = 0.0077447351068258286
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007574304938316345
Validation loss = 0.007447386160492897
Validation loss = 0.007937866263091564
Validation loss = 0.00755689200013876
Validation loss = 0.007991572842001915
Validation loss = 0.008491035550832748
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00803071353584528
Validation loss = 0.007617555093020201
Validation loss = 0.008448347449302673
Validation loss = 0.00818542204797268
Validation loss = 0.00784215610474348
Validation loss = 0.007922591641545296
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007619386538863182
Validation loss = 0.007507578004151583
Validation loss = 0.007811582647264004
Validation loss = 0.007801556959748268
Validation loss = 0.007663952652364969
Validation loss = 0.0077600497752428055
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008048953488469124
Validation loss = 0.007848002947866917
Validation loss = 0.007624586578458548
Validation loss = 0.0075156898237764835
Validation loss = 0.008105031214654446
Validation loss = 0.0076677375473082066
Validation loss = 0.007662118412554264
Validation loss = 0.007731880992650986
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.22e+03 |
| Iteration     | 32       |
| MaximumReturn | 1.29e+03 |
| MinimumReturn | 1.14e+03 |
| TotalSamples  | 136000   |
----------------------------
