Logging to experiments/gym_fswimmer/S/Wed-02-Nov-2022-04-21-47-PM-CDT_gym_fswimmer_trpo_iteration_20_seed2312
Print configuration .....
{'env_name': 'gym_fswimmer', 'random_seeds': [2312, 1231, 2631, 5543], 'save_variables': False, 'model_save_dir': '/tmp/gym_fswimmer_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 200, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.42169123888015747
Validation loss = 0.1687835156917572
Validation loss = 0.11583241820335388
Validation loss = 0.09174427390098572
Validation loss = 0.07151280343532562
Validation loss = 0.0680202916264534
Validation loss = 0.06725679337978363
Validation loss = 0.0718962699174881
Validation loss = 0.08011339604854584
Validation loss = 0.060640715062618256
Validation loss = 0.0695711225271225
Validation loss = 0.06669117510318756
Validation loss = 0.06382398307323456
Validation loss = 0.06025451421737671
Validation loss = 0.05924898386001587
Validation loss = 0.06262124329805374
Validation loss = 0.061651624739170074
Validation loss = 0.059414997696876526
Validation loss = 0.06482472270727158
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.42212823033332825
Validation loss = 0.17228847742080688
Validation loss = 0.11006469279527664
Validation loss = 0.08425284177064896
Validation loss = 0.074641652405262
Validation loss = 0.06890144944190979
Validation loss = 0.0651588886976242
Validation loss = 0.06865571439266205
Validation loss = 0.06335628032684326
Validation loss = 0.06682319939136505
Validation loss = 0.06908387690782547
Validation loss = 0.0685681402683258
Validation loss = 0.06068947911262512
Validation loss = 0.0588398277759552
Validation loss = 0.07037019729614258
Validation loss = 0.05894576758146286
Validation loss = 0.06994056701660156
Validation loss = 0.06240961700677872
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3723387122154236
Validation loss = 0.16418126225471497
Validation loss = 0.10522596538066864
Validation loss = 0.0820855051279068
Validation loss = 0.07425558567047119
Validation loss = 0.07014533877372742
Validation loss = 0.07294220477342606
Validation loss = 0.06623408198356628
Validation loss = 0.06347152590751648
Validation loss = 0.06239601969718933
Validation loss = 0.061940748244524
Validation loss = 0.0593067929148674
Validation loss = 0.06380725651979446
Validation loss = 0.06302032619714737
Validation loss = 0.07252290844917297
Validation loss = 0.06227752938866615
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3750009536743164
Validation loss = 0.16336506605148315
Validation loss = 0.10090085864067078
Validation loss = 0.0803937166929245
Validation loss = 0.07217484712600708
Validation loss = 0.07090841233730316
Validation loss = 0.06920009851455688
Validation loss = 0.06232088804244995
Validation loss = 0.06369494646787643
Validation loss = 0.06082150712609291
Validation loss = 0.06498764455318451
Validation loss = 0.061458006501197815
Validation loss = 0.05796850472688675
Validation loss = 0.06097998097538948
Validation loss = 0.062253594398498535
Validation loss = 0.059992119669914246
Validation loss = 0.05585799738764763
Validation loss = 0.06768602877855301
Validation loss = 0.06042037904262543
Validation loss = 0.06747680902481079
Validation loss = 0.060095690190792084
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3322365880012512
Validation loss = 0.156558096408844
Validation loss = 0.09348537027835846
Validation loss = 0.07548531144857407
Validation loss = 0.07203522324562073
Validation loss = 0.06606022268533707
Validation loss = 0.06829078495502472
Validation loss = 0.06502166390419006
Validation loss = 0.06131865084171295
Validation loss = 0.0694662481546402
Validation loss = 0.0631881058216095
Validation loss = 0.06160816550254822
Validation loss = 0.06099782511591911
Validation loss = 0.06247361749410629
Validation loss = 0.06019626930356026
Validation loss = 0.05704222992062569
Validation loss = 0.060164742171764374
Validation loss = 0.0620446503162384
Validation loss = 0.05706140398979187
Validation loss = 0.05624189227819443
Validation loss = 0.0636139065027237
Validation loss = 0.0693412572145462
Validation loss = 0.0562167763710022
Validation loss = 0.0588834322988987
Validation loss = 0.05570532754063606
Validation loss = 0.05581129342317581
Validation loss = 0.05463748425245285
Validation loss = 0.05332775041460991
Validation loss = 0.058563970029354095
Validation loss = 0.06421968340873718
Validation loss = 0.052458979189395905
Validation loss = 0.05416146293282509
Validation loss = 0.053558968007564545
Validation loss = 0.05786411464214325
Validation loss = 0.05965518206357956
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 82.7     |
| Iteration     | 0        |
| MaximumReturn | 94.1     |
| MinimumReturn | 71.6     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12409226596355438
Validation loss = 0.03525564819574356
Validation loss = 0.02707703411579132
Validation loss = 0.024175945669412613
Validation loss = 0.023514680564403534
Validation loss = 0.021506953984498978
Validation loss = 0.021923378109931946
Validation loss = 0.02038237266242504
Validation loss = 0.024031680077314377
Validation loss = 0.019986359402537346
Validation loss = 0.01863926649093628
Validation loss = 0.019828330725431442
Validation loss = 0.020267849788069725
Validation loss = 0.01873236708343029
Validation loss = 0.01983536407351494
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09048549085855484
Validation loss = 0.03350859880447388
Validation loss = 0.02802327834069729
Validation loss = 0.0248740054666996
Validation loss = 0.021370356902480125
Validation loss = 0.025570042431354523
Validation loss = 0.02197570540010929
Validation loss = 0.02166145294904709
Validation loss = 0.020005052909255028
Validation loss = 0.020211897790431976
Validation loss = 0.02029603160917759
Validation loss = 0.020265163853764534
Validation loss = 0.020843056961894035
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09221260249614716
Validation loss = 0.032511062920093536
Validation loss = 0.02688063308596611
Validation loss = 0.024496670812368393
Validation loss = 0.023134322836995125
Validation loss = 0.0265949834138155
Validation loss = 0.021999098360538483
Validation loss = 0.02319430001080036
Validation loss = 0.023583583533763885
Validation loss = 0.02137460745871067
Validation loss = 0.021649951115250587
Validation loss = 0.021094880998134613
Validation loss = 0.01953536830842495
Validation loss = 0.019784966483712196
Validation loss = 0.019491370767354965
Validation loss = 0.022817935794591904
Validation loss = 0.018647225573658943
Validation loss = 0.01841776631772518
Validation loss = 0.018171587958931923
Validation loss = 0.02353951707482338
Validation loss = 0.0192050002515316
Validation loss = 0.017283985391259193
Validation loss = 0.01846960373222828
Validation loss = 0.017695270478725433
Validation loss = 0.018396703526377678
Validation loss = 0.01773560792207718
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1135861873626709
Validation loss = 0.03211922198534012
Validation loss = 0.02598905935883522
Validation loss = 0.025320924818515778
Validation loss = 0.022611089050769806
Validation loss = 0.02101769484579563
Validation loss = 0.025041703134775162
Validation loss = 0.0202512014657259
Validation loss = 0.021015267819166183
Validation loss = 0.019887616857886314
Validation loss = 0.024910561740398407
Validation loss = 0.018360059708356857
Validation loss = 0.019641760736703873
Validation loss = 0.019187815487384796
Validation loss = 0.01929423213005066
Validation loss = 0.01905176416039467
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09842835366725922
Validation loss = 0.030626017600297928
Validation loss = 0.025404641404747963
Validation loss = 0.022623181343078613
Validation loss = 0.020769622176885605
Validation loss = 0.02067405916750431
Validation loss = 0.02106860652565956
Validation loss = 0.021410439163446426
Validation loss = 0.020982349291443825
Validation loss = 0.02368733286857605
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 193      |
| Iteration     | 1        |
| MaximumReturn | 197      |
| MinimumReturn | 188      |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02711843140423298
Validation loss = 0.012629386968910694
Validation loss = 0.012229067273437977
Validation loss = 0.012466903775930405
Validation loss = 0.01193808764219284
Validation loss = 0.012491732835769653
Validation loss = 0.01715720258653164
Validation loss = 0.01243132259696722
Validation loss = 0.011597797274589539
Validation loss = 0.011775908060371876
Validation loss = 0.01187038142234087
Validation loss = 0.013168477453291416
Validation loss = 0.011854205280542374
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03694237396121025
Validation loss = 0.013156970031559467
Validation loss = 0.011966713704168797
Validation loss = 0.011831319890916348
Validation loss = 0.014621441252529621
Validation loss = 0.012451059184968472
Validation loss = 0.014571390114724636
Validation loss = 0.011848542839288712
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024945465847849846
Validation loss = 0.01258272584527731
Validation loss = 0.01274606678634882
Validation loss = 0.012133070267736912
Validation loss = 0.011845110915601254
Validation loss = 0.011331426911056042
Validation loss = 0.011735636740922928
Validation loss = 0.013010541908442974
Validation loss = 0.011074117384850979
Validation loss = 0.011615753173828125
Validation loss = 0.011435332708060741
Validation loss = 0.019810715690255165
Validation loss = 0.011376791633665562
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024722130969166756
Validation loss = 0.012639620341360569
Validation loss = 0.012239824049174786
Validation loss = 0.012469208799302578
Validation loss = 0.013018753379583359
Validation loss = 0.01314589474350214
Validation loss = 0.014899492263793945
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020721666514873505
Validation loss = 0.012896145693957806
Validation loss = 0.014896159060299397
Validation loss = 0.014800559729337692
Validation loss = 0.013882850296795368
Validation loss = 0.01627443917095661
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 216      |
| Iteration     | 2        |
| MaximumReturn | 220      |
| MinimumReturn | 209      |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016783148050308228
Validation loss = 0.009397691115736961
Validation loss = 0.008810197934508324
Validation loss = 0.00909650418907404
Validation loss = 0.010061550885438919
Validation loss = 0.01061644684523344
Validation loss = 0.010126795619726181
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018190352246165276
Validation loss = 0.0097971735522151
Validation loss = 0.010732898488640785
Validation loss = 0.009481451474130154
Validation loss = 0.010609359480440617
Validation loss = 0.010869636200368404
Validation loss = 0.00881966296583414
Validation loss = 0.009284690953791142
Validation loss = 0.009338246658444405
Validation loss = 0.009524394758045673
Validation loss = 0.012381535954773426
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01993066631257534
Validation loss = 0.008300547488033772
Validation loss = 0.00963281374424696
Validation loss = 0.008731480687856674
Validation loss = 0.008915385231375694
Validation loss = 0.008651657029986382
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018329307436943054
Validation loss = 0.008762766607105732
Validation loss = 0.0090969642624259
Validation loss = 0.010513663291931152
Validation loss = 0.008605931885540485
Validation loss = 0.01036466471850872
Validation loss = 0.009396588429808617
Validation loss = 0.01213047280907631
Validation loss = 0.011177709326148033
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015814032405614853
Validation loss = 0.010294337756931782
Validation loss = 0.009677749127149582
Validation loss = 0.009214743971824646
Validation loss = 0.012921803630888462
Validation loss = 0.009466119110584259
Validation loss = 0.010068530216813087
Validation loss = 0.009481925517320633
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 212      |
| Iteration     | 3        |
| MaximumReturn | 217      |
| MinimumReturn | 208      |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008822480216622353
Validation loss = 0.00815268699079752
Validation loss = 0.007952236570417881
Validation loss = 0.008149432949721813
Validation loss = 0.00678801815956831
Validation loss = 0.007091847714036703
Validation loss = 0.008153151720762253
Validation loss = 0.0064484067261219025
Validation loss = 0.007880360819399357
Validation loss = 0.007733053062111139
Validation loss = 0.007691769860684872
Validation loss = 0.009060631506145
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011418286710977554
Validation loss = 0.0071080951020121574
Validation loss = 0.007586723659187555
Validation loss = 0.007670760154724121
Validation loss = 0.007000441662967205
Validation loss = 0.008482194505631924
Validation loss = 0.007185193710029125
Validation loss = 0.008832158520817757
Validation loss = 0.007301262114197016
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01323466282337904
Validation loss = 0.007881415076553822
Validation loss = 0.008495290763676167
Validation loss = 0.0074316514655947685
Validation loss = 0.00865087192505598
Validation loss = 0.008231870830059052
Validation loss = 0.007281499914824963
Validation loss = 0.006994152907282114
Validation loss = 0.00822632759809494
Validation loss = 0.007262921426445246
Validation loss = 0.007897535338997841
Validation loss = 0.007678103633224964
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011333966627717018
Validation loss = 0.007711066864430904
Validation loss = 0.00841638445854187
Validation loss = 0.008569639176130295
Validation loss = 0.01046687364578247
Validation loss = 0.007490742020308971
Validation loss = 0.007308623753488064
Validation loss = 0.007628811988979578
Validation loss = 0.007763420231640339
Validation loss = 0.007965920493006706
Validation loss = 0.006993906106799841
Validation loss = 0.008583199232816696
Validation loss = 0.007655817084014416
Validation loss = 0.00821098405867815
Validation loss = 0.00791174080222845
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011462047696113586
Validation loss = 0.010333556681871414
Validation loss = 0.007726685609668493
Validation loss = 0.008101229555904865
Validation loss = 0.007656185422092676
Validation loss = 0.00857790932059288
Validation loss = 0.008990393951535225
Validation loss = 0.008006533607840538
Validation loss = 0.007796579506248236
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 253      |
| Iteration     | 4        |
| MaximumReturn | 255      |
| MinimumReturn | 251      |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0065018776804208755
Validation loss = 0.00585295632481575
Validation loss = 0.008369196206331253
Validation loss = 0.005829402711242437
Validation loss = 0.006275747437030077
Validation loss = 0.005479064304381609
Validation loss = 0.006245298311114311
Validation loss = 0.005666214507073164
Validation loss = 0.005559902172535658
Validation loss = 0.006180793046951294
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006255191285163164
Validation loss = 0.008757334202528
Validation loss = 0.006090259645134211
Validation loss = 0.00656174635514617
Validation loss = 0.005974122788757086
Validation loss = 0.00582140265032649
Validation loss = 0.005637235939502716
Validation loss = 0.006503841374069452
Validation loss = 0.005870815832167864
Validation loss = 0.006488789338618517
Validation loss = 0.005762210115790367
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007995317690074444
Validation loss = 0.005714112892746925
Validation loss = 0.006764556746929884
Validation loss = 0.005771928932517767
Validation loss = 0.005954094231128693
Validation loss = 0.005427882075309753
Validation loss = 0.006688686087727547
Validation loss = 0.00701209856197238
Validation loss = 0.006176972761750221
Validation loss = 0.0061707329005002975
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006759431678801775
Validation loss = 0.006177206058055162
Validation loss = 0.0062824091874063015
Validation loss = 0.006904760375618935
Validation loss = 0.007250494789332151
Validation loss = 0.006118317600339651
Validation loss = 0.006664695218205452
Validation loss = 0.005458803381770849
Validation loss = 0.006416501943022013
Validation loss = 0.007858588360249996
Validation loss = 0.006324378773570061
Validation loss = 0.006160999182611704
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006856827065348625
Validation loss = 0.006888435687869787
Validation loss = 0.00592607818543911
Validation loss = 0.008420814760029316
Validation loss = 0.007562937214970589
Validation loss = 0.005912222433835268
Validation loss = 0.00575672322884202
Validation loss = 0.009157602675259113
Validation loss = 0.005647440906614065
Validation loss = 0.006545349955558777
Validation loss = 0.005550250876694918
Validation loss = 0.006514927837997675
Validation loss = 0.006278588902205229
Validation loss = 0.007006113883107901
Validation loss = 0.006823012605309486
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 262      |
| Iteration     | 5        |
| MaximumReturn | 266      |
| MinimumReturn | 259      |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0064287190325558186
Validation loss = 0.004906523507088423
Validation loss = 0.004711870104074478
Validation loss = 0.005454630590975285
Validation loss = 0.005042817909270525
Validation loss = 0.0050204298458993435
Validation loss = 0.0050277067348361015
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005493897013366222
Validation loss = 0.0062255822122097015
Validation loss = 0.005625411868095398
Validation loss = 0.005064703524112701
Validation loss = 0.005177432205528021
Validation loss = 0.005503136198967695
Validation loss = 0.005055165383964777
Validation loss = 0.004583257716149092
Validation loss = 0.0057287439703941345
Validation loss = 0.005738584790378809
Validation loss = 0.005708231590688229
Validation loss = 0.0057847811840474606
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005705034825950861
Validation loss = 0.005066052079200745
Validation loss = 0.005775696597993374
Validation loss = 0.0058327773585915565
Validation loss = 0.00682948250323534
Validation loss = 0.004274404142051935
Validation loss = 0.004737939685583115
Validation loss = 0.005036614835262299
Validation loss = 0.005090479273349047
Validation loss = 0.005172900855541229
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005808647722005844
Validation loss = 0.005019334610551596
Validation loss = 0.004856594372540712
Validation loss = 0.006377516780048609
Validation loss = 0.004940065089613199
Validation loss = 0.005568929016590118
Validation loss = 0.005102890077978373
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005242520477622747
Validation loss = 0.005882870405912399
Validation loss = 0.005382518749684095
Validation loss = 0.005815199110656977
Validation loss = 0.005243617109954357
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 284      |
| Iteration     | 6        |
| MaximumReturn | 287      |
| MinimumReturn | 280      |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004109456669539213
Validation loss = 0.003926081117242575
Validation loss = 0.004240028560161591
Validation loss = 0.00429571233689785
Validation loss = 0.007159267086535692
Validation loss = 0.007531147450208664
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006470853462815285
Validation loss = 0.004596638958901167
Validation loss = 0.003990802448242903
Validation loss = 0.00458518648520112
Validation loss = 0.004141123034060001
Validation loss = 0.00432652747258544
Validation loss = 0.004131703171879053
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004562172573059797
Validation loss = 0.004837825894355774
Validation loss = 0.004012643825262785
Validation loss = 0.004922240972518921
Validation loss = 0.004097884986549616
Validation loss = 0.004243292845785618
Validation loss = 0.004739316180348396
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004892381839454174
Validation loss = 0.005217364989221096
Validation loss = 0.005111283157020807
Validation loss = 0.004577012732625008
Validation loss = 0.004051650874316692
Validation loss = 0.0048700422048568726
Validation loss = 0.004920182283967733
Validation loss = 0.004327600821852684
Validation loss = 0.004356214310973883
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005037739872932434
Validation loss = 0.004434187430888414
Validation loss = 0.004625823348760605
Validation loss = 0.003984143491834402
Validation loss = 0.008818148635327816
Validation loss = 0.0047484831884503365
Validation loss = 0.004309061914682388
Validation loss = 0.004524522926658392
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 278      |
| Iteration     | 7        |
| MaximumReturn | 282      |
| MinimumReturn | 273      |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004141421057283878
Validation loss = 0.0034567746333777905
Validation loss = 0.003661210648715496
Validation loss = 0.003949249163269997
Validation loss = 0.004517662804573774
Validation loss = 0.004607927519828081
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0036614350974559784
Validation loss = 0.004314703866839409
Validation loss = 0.004128741100430489
Validation loss = 0.0036366793792694807
Validation loss = 0.0045930733904242516
Validation loss = 0.0037700666580349207
Validation loss = 0.004307590890675783
Validation loss = 0.004008346702903509
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004383197985589504
Validation loss = 0.0036230299156159163
Validation loss = 0.0037910693790763617
Validation loss = 0.0032981345430016518
Validation loss = 0.003167030168697238
Validation loss = 0.003392196958884597
Validation loss = 0.0038406220264732838
Validation loss = 0.0037557976320385933
Validation loss = 0.0043840184807777405
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004527214448899031
Validation loss = 0.0038884426467120647
Validation loss = 0.0038082587998360395
Validation loss = 0.004001148510724306
Validation loss = 0.0037553226575255394
Validation loss = 0.003279902972280979
Validation loss = 0.003860351862385869
Validation loss = 0.0046087163500487804
Validation loss = 0.004516562446951866
Validation loss = 0.003972734324634075
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003417268628254533
Validation loss = 0.005369487218558788
Validation loss = 0.0032762212213128805
Validation loss = 0.0043492126278579235
Validation loss = 0.0037200707010924816
Validation loss = 0.0038181671407073736
Validation loss = 0.003580779302865267
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 274      |
| Iteration     | 8        |
| MaximumReturn | 276      |
| MinimumReturn | 271      |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0036995895206928253
Validation loss = 0.003320761024951935
Validation loss = 0.0035492151509970427
Validation loss = 0.0034802905283868313
Validation loss = 0.0039580524899065495
Validation loss = 0.003946457989513874
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003765834029763937
Validation loss = 0.0034122844226658344
Validation loss = 0.00446844007819891
Validation loss = 0.0033361990936100483
Validation loss = 0.0031859143637120724
Validation loss = 0.003099463414400816
Validation loss = 0.004123953636735678
Validation loss = 0.0034450560342520475
Validation loss = 0.003423805581405759
Validation loss = 0.0029209640342742205
Validation loss = 0.002959778066724539
Validation loss = 0.0032848534174263477
Validation loss = 0.00407294649630785
Validation loss = 0.0029589678160846233
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004227310419082642
Validation loss = 0.0031140868086367846
Validation loss = 0.003947099205106497
Validation loss = 0.0036269400734454393
Validation loss = 0.0029133628122508526
Validation loss = 0.003685942618176341
Validation loss = 0.003248856868594885
Validation loss = 0.0038915430195629597
Validation loss = 0.00318910856731236
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0038860677741467953
Validation loss = 0.002983971033245325
Validation loss = 0.005121358670294285
Validation loss = 0.0031916373409330845
Validation loss = 0.004490500781685114
Validation loss = 0.003614528104662895
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00322996499016881
Validation loss = 0.0033533666282892227
Validation loss = 0.003333085449412465
Validation loss = 0.003287407336756587
Validation loss = 0.0033549717627465725
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 295      |
| Iteration     | 9        |
| MaximumReturn | 297      |
| MinimumReturn | 291      |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0032264860346913338
Validation loss = 0.0030672429129481316
Validation loss = 0.003597482806071639
Validation loss = 0.003158437553793192
Validation loss = 0.002974951406940818
Validation loss = 0.0029293172992765903
Validation loss = 0.003395184176042676
Validation loss = 0.003563910024240613
Validation loss = 0.002572504570707679
Validation loss = 0.0028168405406177044
Validation loss = 0.00357162207365036
Validation loss = 0.0027764977421611547
Validation loss = 0.002568664029240608
Validation loss = 0.002851826837286353
Validation loss = 0.0033779828809201717
Validation loss = 0.003018319373950362
Validation loss = 0.0032833563163876534
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0029971017502248287
Validation loss = 0.003829285502433777
Validation loss = 0.0029361441265791655
Validation loss = 0.002997509902343154
Validation loss = 0.0026951348409056664
Validation loss = 0.003924975171685219
Validation loss = 0.003004479454830289
Validation loss = 0.002717076102271676
Validation loss = 0.0034910833928734064
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0027057102415710688
Validation loss = 0.003233082825317979
Validation loss = 0.0029911876190453768
Validation loss = 0.0032362649217247963
Validation loss = 0.002782264957204461
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003137225052341819
Validation loss = 0.003077014349400997
Validation loss = 0.00614506471902132
Validation loss = 0.0029076181817799807
Validation loss = 0.002813042141497135
Validation loss = 0.0034554230514913797
Validation loss = 0.003461659885942936
Validation loss = 0.0029453495517373085
Validation loss = 0.0025984898675233126
Validation loss = 0.0032698095310479403
Validation loss = 0.0031205930281430483
Validation loss = 0.0027347265277057886
Validation loss = 0.002938388381153345
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0032049743458628654
Validation loss = 0.002748766215518117
Validation loss = 0.003185729030519724
Validation loss = 0.0033640943001955748
Validation loss = 0.0033387853763997555
Validation loss = 0.0032146298326551914
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 319      |
| Iteration     | 10       |
| MaximumReturn | 324      |
| MinimumReturn | 315      |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0030494891107082367
Validation loss = 0.0023596244864165783
Validation loss = 0.003007955150678754
Validation loss = 0.003020426956936717
Validation loss = 0.002535491017624736
Validation loss = 0.0026835158932954073
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0038140863180160522
Validation loss = 0.00292805302888155
Validation loss = 0.002561713568866253
Validation loss = 0.002531458856537938
Validation loss = 0.0027898969128727913
Validation loss = 0.0027843781281262636
Validation loss = 0.0029210958164185286
Validation loss = 0.0027077964041382074
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002254665130749345
Validation loss = 0.0034822977613657713
Validation loss = 0.0028107587713748217
Validation loss = 0.0031903169583529234
Validation loss = 0.0035926494747400284
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0032009712886065245
Validation loss = 0.0024089303333312273
Validation loss = 0.0029238404240459204
Validation loss = 0.0026327448431402445
Validation loss = 0.002612553769722581
Validation loss = 0.0023463787510991096
Validation loss = 0.0023386513348668814
Validation loss = 0.00265513826161623
Validation loss = 0.002523075556382537
Validation loss = 0.0022782988380640745
Validation loss = 0.0028400802984833717
Validation loss = 0.003880815813317895
Validation loss = 0.002949892310425639
Validation loss = 0.0024350511375814676
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003397752298042178
Validation loss = 0.0029299540910869837
Validation loss = 0.002767768921330571
Validation loss = 0.0032912276219576597
Validation loss = 0.0028321535792201757
Validation loss = 0.0025022069457918406
Validation loss = 0.0025346626061946154
Validation loss = 0.002671782858669758
Validation loss = 0.0035777047742158175
Validation loss = 0.003004353493452072
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 325      |
| Iteration     | 11       |
| MaximumReturn | 329      |
| MinimumReturn | 323      |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004859107546508312
Validation loss = 0.0026462299283593893
Validation loss = 0.002424753736704588
Validation loss = 0.002616588491946459
Validation loss = 0.002320260740816593
Validation loss = 0.002524584298953414
Validation loss = 0.0026815582532435656
Validation loss = 0.00260607386007905
Validation loss = 0.0022330041974782944
Validation loss = 0.002233474049717188
Validation loss = 0.0027015318628400564
Validation loss = 0.002393659669905901
Validation loss = 0.003031499683856964
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002990495413541794
Validation loss = 0.0023133226204663515
Validation loss = 0.0024931568186730146
Validation loss = 0.002292347140610218
Validation loss = 0.002932925708591938
Validation loss = 0.0024397852830588818
Validation loss = 0.0021193616557866335
Validation loss = 0.0027690199203789234
Validation loss = 0.00258226552978158
Validation loss = 0.002353405114263296
Validation loss = 0.002110831905156374
Validation loss = 0.002260665874928236
Validation loss = 0.0023360291961580515
Validation loss = 0.002981539350003004
Validation loss = 0.0023375467862933874
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002469998085871339
Validation loss = 0.0027172837872058153
Validation loss = 0.0024313481990247965
Validation loss = 0.002623525680974126
Validation loss = 0.002516644774004817
Validation loss = 0.0024572887923568487
Validation loss = 0.002342821331694722
Validation loss = 0.002338320016860962
Validation loss = 0.0023689433000981808
Validation loss = 0.0022879671305418015
Validation loss = 0.0024439143016934395
Validation loss = 0.002687019295990467
Validation loss = 0.002364581683650613
Validation loss = 0.0034252600744366646
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00266860774718225
Validation loss = 0.002505313837900758
Validation loss = 0.0037817766424268484
Validation loss = 0.0025816496927291155
Validation loss = 0.00233651464805007
Validation loss = 0.002054465701803565
Validation loss = 0.0021416249219328165
Validation loss = 0.0021530843805521727
Validation loss = 0.0024635482113808393
Validation loss = 0.0024700069334357977
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003836591262370348
Validation loss = 0.002394511131569743
Validation loss = 0.0030056152027100325
Validation loss = 0.002627279842272401
Validation loss = 0.0019881767220795155
Validation loss = 0.0025897203013300896
Validation loss = 0.0024787122383713722
Validation loss = 0.0024925193283706903
Validation loss = 0.0024817867670208216
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 322      |
| Iteration     | 12       |
| MaximumReturn | 326      |
| MinimumReturn | 319      |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002592042088508606
Validation loss = 0.0024637875612825155
Validation loss = 0.002282937755808234
Validation loss = 0.00198201066814363
Validation loss = 0.002945670858025551
Validation loss = 0.002302292501553893
Validation loss = 0.0024422924034297466
Validation loss = 0.0024608068633824587
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001955122919753194
Validation loss = 0.0019460030598565936
Validation loss = 0.0022590176668018103
Validation loss = 0.0022793824318796396
Validation loss = 0.0023636992555111647
Validation loss = 0.0020445233676582575
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018894057720899582
Validation loss = 0.002409897278994322
Validation loss = 0.0022232350893318653
Validation loss = 0.002231760649010539
Validation loss = 0.0021149388048797846
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0021456785034388304
Validation loss = 0.002341547980904579
Validation loss = 0.0022739910054951906
Validation loss = 0.0021647808607667685
Validation loss = 0.0019968252163380384
Validation loss = 0.0020657856948673725
Validation loss = 0.0019485291559249163
Validation loss = 0.0022393916733562946
Validation loss = 0.00295804045163095
Validation loss = 0.0019492999417707324
Validation loss = 0.001988281961530447
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0025644043926149607
Validation loss = 0.0030250311829149723
Validation loss = 0.00196934980340302
Validation loss = 0.0026815361343324184
Validation loss = 0.0028042204212397337
Validation loss = 0.0027737847995013
Validation loss = 0.0025726899039000273
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 326      |
| Iteration     | 13       |
| MaximumReturn | 330      |
| MinimumReturn | 322      |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021901424042880535
Validation loss = 0.0023791389539837837
Validation loss = 0.0022263864520937204
Validation loss = 0.0018402427667751908
Validation loss = 0.0023058122023940086
Validation loss = 0.001976882806047797
Validation loss = 0.0020793427247554064
Validation loss = 0.002365507185459137
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0024889507330954075
Validation loss = 0.0020963395945727825
Validation loss = 0.0022153109312057495
Validation loss = 0.002071845345199108
Validation loss = 0.001969544682651758
Validation loss = 0.0018598535098135471
Validation loss = 0.0023938342928886414
Validation loss = 0.0021036958787590265
Validation loss = 0.0018780648242682219
Validation loss = 0.0018739906372502446
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0027775682974606752
Validation loss = 0.002046211389824748
Validation loss = 0.0021461560390889645
Validation loss = 0.002061187755316496
Validation loss = 0.00238407077267766
Validation loss = 0.0018863828154280782
Validation loss = 0.0016976926708593965
Validation loss = 0.001833785674534738
Validation loss = 0.0021739755757153034
Validation loss = 0.002193018328398466
Validation loss = 0.0019195926142856479
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0019890631083399057
Validation loss = 0.002080790000036359
Validation loss = 0.002552522113546729
Validation loss = 0.00172516074962914
Validation loss = 0.0022560912184417248
Validation loss = 0.0030401411931961775
Validation loss = 0.001640091766603291
Validation loss = 0.002116267802193761
Validation loss = 0.002057853853330016
Validation loss = 0.0024896806571632624
Validation loss = 0.0019439681200310588
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0022840448655188084
Validation loss = 0.002181155839934945
Validation loss = 0.0018866807222366333
Validation loss = 0.002472222549840808
Validation loss = 0.0018744997214525938
Validation loss = 0.0022491973359137774
Validation loss = 0.0020663251634687185
Validation loss = 0.001848944346420467
Validation loss = 0.0021675669122487307
Validation loss = 0.0029953757766634226
Validation loss = 0.0019292128272354603
Validation loss = 0.002066217828541994
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 334      |
| Iteration     | 14       |
| MaximumReturn | 338      |
| MinimumReturn | 331      |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0020618089474737644
Validation loss = 0.0018500818405300379
Validation loss = 0.002039763843640685
Validation loss = 0.002339109545573592
Validation loss = 0.0022462275810539722
Validation loss = 0.002225602278485894
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0019794031977653503
Validation loss = 0.0025294674560427666
Validation loss = 0.002158713061362505
Validation loss = 0.0027996134012937546
Validation loss = 0.001998667139559984
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001797631150111556
Validation loss = 0.001855702605098486
Validation loss = 0.0019214882049709558
Validation loss = 0.001953318016603589
Validation loss = 0.002078681718558073
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0018991129472851753
Validation loss = 0.0023547678720206022
Validation loss = 0.0016085606766864657
Validation loss = 0.0021178964525461197
Validation loss = 0.0017121698474511504
Validation loss = 0.0018071222584694624
Validation loss = 0.0017892044270411134
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020273125264793634
Validation loss = 0.0018105037743225694
Validation loss = 0.0020146765746176243
Validation loss = 0.00221160426735878
Validation loss = 0.0022669394966214895
Validation loss = 0.001963110873475671
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 331      |
| Iteration     | 15       |
| MaximumReturn | 333      |
| MinimumReturn | 327      |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002575960475951433
Validation loss = 0.0018802868435159326
Validation loss = 0.002700546057894826
Validation loss = 0.0017813792219385505
Validation loss = 0.0019753919914364815
Validation loss = 0.0015897308476269245
Validation loss = 0.0019080761121585965
Validation loss = 0.0016600474482402205
Validation loss = 0.0020716905128210783
Validation loss = 0.0021751129534095526
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016921667847782373
Validation loss = 0.0026444629766047
Validation loss = 0.0015466168988496065
Validation loss = 0.0018194792792201042
Validation loss = 0.00164416350889951
Validation loss = 0.0017611293587833643
Validation loss = 0.0017313368152827024
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021148480009287596
Validation loss = 0.002581617794930935
Validation loss = 0.0016822549514472485
Validation loss = 0.002380599733442068
Validation loss = 0.0017037388170138001
Validation loss = 0.0020891204476356506
Validation loss = 0.0020346837118268013
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001517268130555749
Validation loss = 0.0017649257788434625
Validation loss = 0.001965919276699424
Validation loss = 0.0016437096055597067
Validation loss = 0.0018334632040932775
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001754556316882372
Validation loss = 0.0018886367324739695
Validation loss = 0.0017545795999467373
Validation loss = 0.002322214189916849
Validation loss = 0.002126022707670927
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 330      |
| Iteration     | 16       |
| MaximumReturn | 333      |
| MinimumReturn | 325      |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0020595877431333065
Validation loss = 0.0015764989657327533
Validation loss = 0.00177847221493721
Validation loss = 0.0017608641646802425
Validation loss = 0.0017757597379386425
Validation loss = 0.001673953258432448
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002050922252237797
Validation loss = 0.0020958927925676107
Validation loss = 0.001466088928282261
Validation loss = 0.0020389417186379433
Validation loss = 0.0022947490215301514
Validation loss = 0.001875156769528985
Validation loss = 0.0015891818329691887
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002455697162076831
Validation loss = 0.0017148912884294987
Validation loss = 0.0019018097082152963
Validation loss = 0.0021487988997250795
Validation loss = 0.0018957608845084906
Validation loss = 0.0016721963183954358
Validation loss = 0.0017543967114761472
Validation loss = 0.0016865719808265567
Validation loss = 0.002032197080552578
Validation loss = 0.0018542311154305935
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002030925126746297
Validation loss = 0.0016906379023566842
Validation loss = 0.0020401524379849434
Validation loss = 0.0021175809670239687
Validation loss = 0.0019284321460872889
Validation loss = 0.001554170623421669
Validation loss = 0.0017262697219848633
Validation loss = 0.0017280860338360071
Validation loss = 0.0016471123090013862
Validation loss = 0.0014459122903645039
Validation loss = 0.0015591797418892384
Validation loss = 0.0016955648316070437
Validation loss = 0.0015867145266383886
Validation loss = 0.0017100636614486575
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020392131991684437
Validation loss = 0.0017204554751515388
Validation loss = 0.0019535659812390804
Validation loss = 0.0015922068851068616
Validation loss = 0.0016625190619379282
Validation loss = 0.0018237379845231771
Validation loss = 0.0015079246368259192
Validation loss = 0.001624247757717967
Validation loss = 0.0023208470083773136
Validation loss = 0.0016531344735994935
Validation loss = 0.0017373580485582352
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 327      |
| Iteration     | 17       |
| MaximumReturn | 329      |
| MinimumReturn | 321      |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018254728056490421
Validation loss = 0.0016709687188267708
Validation loss = 0.0014412358868867159
Validation loss = 0.001652248203754425
Validation loss = 0.002073895651847124
Validation loss = 0.0015253962483257055
Validation loss = 0.0021886283066123724
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002075756900012493
Validation loss = 0.0016526589170098305
Validation loss = 0.001655260450206697
Validation loss = 0.0019279546104371548
Validation loss = 0.0015170160913839936
Validation loss = 0.0026259038131684065
Validation loss = 0.0014663472538813949
Validation loss = 0.0016027312958613038
Validation loss = 0.0016118367202579975
Validation loss = 0.0015812896890565753
Validation loss = 0.0015748515725135803
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015027947956696153
Validation loss = 0.0013536115875467658
Validation loss = 0.001636421657167375
Validation loss = 0.0015473911771550775
Validation loss = 0.0018233972368761897
Validation loss = 0.001863256678916514
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0018203454092144966
Validation loss = 0.0016065496020019054
Validation loss = 0.001372649916447699
Validation loss = 0.0015488866483792663
Validation loss = 0.0014755182201042771
Validation loss = 0.001930132508277893
Validation loss = 0.0016844533383846283
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014338217442855239
Validation loss = 0.001887340098619461
Validation loss = 0.0016366480849683285
Validation loss = 0.0016278235707432032
Validation loss = 0.001528618042357266
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 334      |
| Iteration     | 18       |
| MaximumReturn | 337      |
| MinimumReturn | 329      |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015165118966251612
Validation loss = 0.0015737578505650163
Validation loss = 0.0014916923828423023
Validation loss = 0.001329821185208857
Validation loss = 0.0015295761404559016
Validation loss = 0.001617410802282393
Validation loss = 0.0014910323079675436
Validation loss = 0.0013874454889446497
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002323666587471962
Validation loss = 0.001813122653402388
Validation loss = 0.001791378133930266
Validation loss = 0.0014782382640987635
Validation loss = 0.00157754251267761
Validation loss = 0.0019338462734594941
Validation loss = 0.0014830023283138871
Validation loss = 0.0018717872444540262
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0024293609894812107
Validation loss = 0.0017272888217121363
Validation loss = 0.0021711091976612806
Validation loss = 0.0017334442818537354
Validation loss = 0.0014355758903548121
Validation loss = 0.0015035525429993868
Validation loss = 0.0015429150080308318
Validation loss = 0.0015720274532213807
Validation loss = 0.0016038533067330718
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016258846735581756
Validation loss = 0.0018333354964852333
Validation loss = 0.0016432527918368578
Validation loss = 0.0017963811988011003
Validation loss = 0.001496998593211174
Validation loss = 0.0016367718344554305
Validation loss = 0.0014073506463319063
Validation loss = 0.0014126573223620653
Validation loss = 0.0014005672419443727
Validation loss = 0.0017599428538233042
Validation loss = 0.0013504663947969675
Validation loss = 0.0014716211007907987
Validation loss = 0.0014499147655442357
Validation loss = 0.0016601008828729391
Validation loss = 0.0012776164803653955
Validation loss = 0.001481946324929595
Validation loss = 0.0013721365248784423
Validation loss = 0.0019875208381563425
Validation loss = 0.001430812175385654
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001787975779734552
Validation loss = 0.0015934124821797013
Validation loss = 0.0015565699432045221
Validation loss = 0.002408707980066538
Validation loss = 0.0015086634084582329
Validation loss = 0.0015346620930358768
Validation loss = 0.001821485930122435
Validation loss = 0.0014735311269760132
Validation loss = 0.001459675026126206
Validation loss = 0.0017738022143021226
Validation loss = 0.0018348643789067864
Validation loss = 0.0015932079404592514
Validation loss = 0.001555203227326274
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 338      |
| Iteration     | 19       |
| MaximumReturn | 340      |
| MinimumReturn | 334      |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015271762385964394
Validation loss = 0.00204238248988986
Validation loss = 0.001687980955466628
Validation loss = 0.001375033869408071
Validation loss = 0.0017979382537305355
Validation loss = 0.0015362432459369302
Validation loss = 0.0013289423659443855
Validation loss = 0.0017697105649858713
Validation loss = 0.0013212430058047175
Validation loss = 0.0015245350077748299
Validation loss = 0.0012812482891604304
Validation loss = 0.0012955357087776065
Validation loss = 0.001660649781115353
Validation loss = 0.0014374542515724897
Validation loss = 0.0016581775853410363
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016655257204547524
Validation loss = 0.001583797624334693
Validation loss = 0.0017968883039429784
Validation loss = 0.0014766728272661567
Validation loss = 0.0015068290522322059
Validation loss = 0.0012516818242147565
Validation loss = 0.0015664336970075965
Validation loss = 0.0012288768775761127
Validation loss = 0.0014515413204208016
Validation loss = 0.00132953270804137
Validation loss = 0.001497267046943307
Validation loss = 0.0015681524528190494
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017077174270525575
Validation loss = 0.0014785906532779336
Validation loss = 0.0014793048612773418
Validation loss = 0.0014414918841794133
Validation loss = 0.0016110388096421957
Validation loss = 0.001907247002236545
Validation loss = 0.0017290115356445312
Validation loss = 0.0013405380304902792
Validation loss = 0.0013416146393865347
Validation loss = 0.0014144058804959059
Validation loss = 0.001552901929244399
Validation loss = 0.001569982385262847
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0018363924464210868
Validation loss = 0.001669728197157383
Validation loss = 0.0013969838619232178
Validation loss = 0.0019237161614000797
Validation loss = 0.0012953890254721045
Validation loss = 0.0014151493087410927
Validation loss = 0.001358931651338935
Validation loss = 0.0014134673401713371
Validation loss = 0.0014498529490083456
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0017172867665067315
Validation loss = 0.0014608922647312284
Validation loss = 0.0025881307665258646
Validation loss = 0.002112969756126404
Validation loss = 0.0013025058433413506
Validation loss = 0.0015908419154584408
Validation loss = 0.0012416815152391791
Validation loss = 0.001960680354386568
Validation loss = 0.002278630854561925
Validation loss = 0.0018254322931170464
Validation loss = 0.0013987274141982198
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 337      |
| Iteration     | 20       |
| MaximumReturn | 341      |
| MinimumReturn | 332      |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015976116992533207
Validation loss = 0.0014728927053511143
Validation loss = 0.001482709078118205
Validation loss = 0.0014338705223053694
Validation loss = 0.0015318128280341625
Validation loss = 0.0015775527572259307
Validation loss = 0.0015535198617726564
Validation loss = 0.0017328211106359959
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013535722391679883
Validation loss = 0.0013324794126674533
Validation loss = 0.0015098671428859234
Validation loss = 0.00194854277651757
Validation loss = 0.001314350520260632
Validation loss = 0.0012856167741119862
Validation loss = 0.0012809562031179667
Validation loss = 0.0012310291640460491
Validation loss = 0.0012038885615766048
Validation loss = 0.001495209289714694
Validation loss = 0.0017267238581553102
Validation loss = 0.0012486304622143507
Validation loss = 0.0016116424230858684
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012254673056304455
Validation loss = 0.0012819929979741573
Validation loss = 0.0013961580116301775
Validation loss = 0.0015942850150167942
Validation loss = 0.0015400470001623034
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001611049403436482
Validation loss = 0.0019087065011262894
Validation loss = 0.0014592031948268414
Validation loss = 0.0011745962547138333
Validation loss = 0.0011899572564288974
Validation loss = 0.0012638545595109463
Validation loss = 0.0012006767792627215
Validation loss = 0.0012881410075351596
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014355044113472104
Validation loss = 0.0013029836118221283
Validation loss = 0.0012995625147596002
Validation loss = 0.0015775974607095122
Validation loss = 0.0014210629742592573
Validation loss = 0.0018724951660260558
Validation loss = 0.0016426303191110492
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 339      |
| Iteration     | 21       |
| MaximumReturn | 340      |
| MinimumReturn | 337      |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014439969090744853
Validation loss = 0.0015333734918385744
Validation loss = 0.0012945798225700855
Validation loss = 0.0014450261369347572
Validation loss = 0.0012015268439427018
Validation loss = 0.0014558094553649426
Validation loss = 0.0017634270479902625
Validation loss = 0.0012982930056750774
Validation loss = 0.0014807123225182295
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015863829758018255
Validation loss = 0.0013853289419785142
Validation loss = 0.0011760148918256164
Validation loss = 0.001210337388329208
Validation loss = 0.0013559689978137612
Validation loss = 0.001273094560019672
Validation loss = 0.0011631232919171453
Validation loss = 0.0015805668663233519
Validation loss = 0.0011220378801226616
Validation loss = 0.0018465339671820402
Validation loss = 0.0017565522575750947
Validation loss = 0.001416682731360197
Validation loss = 0.0013304016320034862
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012745754793286324
Validation loss = 0.0013337561395019293
Validation loss = 0.0016359207220375538
Validation loss = 0.0014446796849370003
Validation loss = 0.0012280932860448956
Validation loss = 0.0013766902266070247
Validation loss = 0.0017698698211461306
Validation loss = 0.0011665902566164732
Validation loss = 0.0012604647781699896
Validation loss = 0.001213560812175274
Validation loss = 0.0017953424248844385
Validation loss = 0.0012112843105569482
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011256809812039137
Validation loss = 0.0017321084160357714
Validation loss = 0.0012345552677288651
Validation loss = 0.0011383488308638334
Validation loss = 0.0012451165821403265
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013669799081981182
Validation loss = 0.0013659108662977815
Validation loss = 0.001681043184362352
Validation loss = 0.0016275312518700957
Validation loss = 0.0012966300128027797
Validation loss = 0.0014554965309798717
Validation loss = 0.001579990959726274
Validation loss = 0.0012164452346041799
Validation loss = 0.001325477845966816
Validation loss = 0.0014256549766287208
Validation loss = 0.001429862342774868
Validation loss = 0.0015232078731060028
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 346      |
| Iteration     | 22       |
| MaximumReturn | 349      |
| MinimumReturn | 342      |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0020036918576806784
Validation loss = 0.0011799600906670094
Validation loss = 0.0014911368489265442
Validation loss = 0.001192235853523016
Validation loss = 0.0012006416218355298
Validation loss = 0.0012454360257834196
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013306617038324475
Validation loss = 0.0011713068233802915
Validation loss = 0.0015626884996891022
Validation loss = 0.0014588279882445931
Validation loss = 0.0012056853156536818
Validation loss = 0.0014003394171595573
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012432847870513797
Validation loss = 0.0012362722773104906
Validation loss = 0.0011859688675031066
Validation loss = 0.0010315856197848916
Validation loss = 0.0012320695677772164
Validation loss = 0.001127608702518046
Validation loss = 0.0012502000899985433
Validation loss = 0.001228382927365601
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013882518978789449
Validation loss = 0.0011426472337916493
Validation loss = 0.0016190867172554135
Validation loss = 0.0012237122282385826
Validation loss = 0.001320970244705677
Validation loss = 0.0012902667513117194
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012398260878399014
Validation loss = 0.0018123205518350005
Validation loss = 0.0011551359202712774
Validation loss = 0.0010793217225000262
Validation loss = 0.0013437120942398906
Validation loss = 0.0011340506607666612
Validation loss = 0.0011497299419716
Validation loss = 0.001241117250174284
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 342      |
| Iteration     | 23       |
| MaximumReturn | 343      |
| MinimumReturn | 341      |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015220452332869172
Validation loss = 0.0015839440748095512
Validation loss = 0.0012366886949166656
Validation loss = 0.0011628398206084967
Validation loss = 0.00124943139962852
Validation loss = 0.0015407272148877382
Validation loss = 0.001392326899804175
Validation loss = 0.0011582000879570842
Validation loss = 0.0012707333080470562
Validation loss = 0.0010097556514665484
Validation loss = 0.0011582791339606047
Validation loss = 0.0012564649805426598
Validation loss = 0.0013803248293697834
Validation loss = 0.0011002742685377598
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012765857391059399
Validation loss = 0.0013690622290596366
Validation loss = 0.0012033680686727166
Validation loss = 0.0011028207372874022
Validation loss = 0.0015385709702968597
Validation loss = 0.0013022094499319792
Validation loss = 0.0011920881224796176
Validation loss = 0.0012000050628557801
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010851791594177485
Validation loss = 0.0014146310277283192
Validation loss = 0.0012306164717301726
Validation loss = 0.001468417001888156
Validation loss = 0.001179381157271564
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011102169519290328
Validation loss = 0.0011322139762341976
Validation loss = 0.0012106405338272452
Validation loss = 0.0011560437269508839
Validation loss = 0.0013658092357218266
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011068020248785615
Validation loss = 0.0011549710761755705
Validation loss = 0.0014128140173852444
Validation loss = 0.0021530608646571636
Validation loss = 0.0010759328724816442
Validation loss = 0.0012207978870719671
Validation loss = 0.00134172011166811
Validation loss = 0.0014888173900544643
Validation loss = 0.0011806184193119407
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 342      |
| Iteration     | 24       |
| MaximumReturn | 347      |
| MinimumReturn | 336      |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011915491195395589
Validation loss = 0.0015680992510169744
Validation loss = 0.001030193641781807
Validation loss = 0.0011954802321270108
Validation loss = 0.0012445634929463267
Validation loss = 0.0011896418873220682
Validation loss = 0.0011894620256498456
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011475787032395601
Validation loss = 0.001301334472373128
Validation loss = 0.0011022650869563222
Validation loss = 0.0011953264474868774
Validation loss = 0.0011265315115451813
Validation loss = 0.0011766819516196847
Validation loss = 0.0012764128623530269
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018567534862086177
Validation loss = 0.0009933357359841466
Validation loss = 0.0010336694540455937
Validation loss = 0.0013841289328411222
Validation loss = 0.0013956102775409818
Validation loss = 0.0014768245164304972
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014172157971188426
Validation loss = 0.0013690529158338904
Validation loss = 0.0011010293383151293
Validation loss = 0.0014701009495183825
Validation loss = 0.0010227240854874253
Validation loss = 0.0014434781624004245
Validation loss = 0.0013873273273929954
Validation loss = 0.0010796253336593509
Validation loss = 0.0013609160669147968
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015153144486248493
Validation loss = 0.0013560198713093996
Validation loss = 0.001104671391658485
Validation loss = 0.0015374782960861921
Validation loss = 0.0010907032992690802
Validation loss = 0.0012682286323979497
Validation loss = 0.001028941827826202
Validation loss = 0.0012417121324688196
Validation loss = 0.0015041136648505926
Validation loss = 0.0011852980824187398
Validation loss = 0.0011333436705172062
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 344      |
| Iteration     | 25       |
| MaximumReturn | 348      |
| MinimumReturn | 341      |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010626493021845818
Validation loss = 0.0011930214241147041
Validation loss = 0.0012504778569564223
Validation loss = 0.0010080474894493818
Validation loss = 0.0016899789916351438
Validation loss = 0.0010711527429521084
Validation loss = 0.0011427149875089526
Validation loss = 0.0009676490444689989
Validation loss = 0.00176474719773978
Validation loss = 0.0010119957150891423
Validation loss = 0.001236417912878096
Validation loss = 0.0010226713493466377
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001376579632051289
Validation loss = 0.0009393469081260264
Validation loss = 0.0011888998560607433
Validation loss = 0.001109032891690731
Validation loss = 0.001049528713338077
Validation loss = 0.0011488718446344137
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001175986952148378
Validation loss = 0.001110327080823481
Validation loss = 0.0012339205713942647
Validation loss = 0.00114975415635854
Validation loss = 0.0012358457315713167
Validation loss = 0.001216211123391986
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012521134922280908
Validation loss = 0.0012047565542161465
Validation loss = 0.0012176071759313345
Validation loss = 0.0012479174183681607
Validation loss = 0.0011938282987102866
Validation loss = 0.0010072232689708471
Validation loss = 0.0010025008814409375
Validation loss = 0.0010928590781986713
Validation loss = 0.0011076924856752157
Validation loss = 0.0009883786551654339
Validation loss = 0.001090385252609849
Validation loss = 0.0009519168525002897
Validation loss = 0.0009844013256952167
Validation loss = 0.0012355083599686623
Validation loss = 0.0011095156660303473
Validation loss = 0.0012821980053558946
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010251628700643778
Validation loss = 0.0010813858825713396
Validation loss = 0.0011592170922085643
Validation loss = 0.001080058398656547
Validation loss = 0.001288842991925776
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 346      |
| Iteration     | 26       |
| MaximumReturn | 348      |
| MinimumReturn | 343      |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001091244164854288
Validation loss = 0.0010513380402699113
Validation loss = 0.0013763869646936655
Validation loss = 0.00156499445438385
Validation loss = 0.0010267238831147552
Validation loss = 0.00192288460675627
Validation loss = 0.0009974036365747452
Validation loss = 0.000994154717773199
Validation loss = 0.0011653080582618713
Validation loss = 0.0010712920920923352
Validation loss = 0.0009512562537565827
Validation loss = 0.0010917828185483813
Validation loss = 0.0010703594889491796
Validation loss = 0.0011118125403299928
Validation loss = 0.0009776124497875571
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010215367656201124
Validation loss = 0.0013128322316333652
Validation loss = 0.0009460056899115443
Validation loss = 0.001061099348589778
Validation loss = 0.00101132330019027
Validation loss = 0.0009631876018829644
Validation loss = 0.0012633315054699779
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001097564003430307
Validation loss = 0.0010731121292337775
Validation loss = 0.0011449501616880298
Validation loss = 0.001070088124834001
Validation loss = 0.0010593343758955598
Validation loss = 0.0009832029463723302
Validation loss = 0.001265058876015246
Validation loss = 0.0010001292685046792
Validation loss = 0.0014391012955456972
Validation loss = 0.0010524408426135778
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012680563377216458
Validation loss = 0.0009855389362201095
Validation loss = 0.0014499066164717078
Validation loss = 0.0009596420568414032
Validation loss = 0.0012941232416778803
Validation loss = 0.0011956571834161878
Validation loss = 0.0009010951616801322
Validation loss = 0.000967461324762553
Validation loss = 0.0010368026560172439
Validation loss = 0.0009030172950588167
Validation loss = 0.001032726257108152
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001098918030038476
Validation loss = 0.001095500192604959
Validation loss = 0.0014176800614222884
Validation loss = 0.0009447703487239778
Validation loss = 0.001083014882169664
Validation loss = 0.0013806660426780581
Validation loss = 0.0011280400212854147
Validation loss = 0.0013036137679591775
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 350      |
| Iteration     | 27       |
| MaximumReturn | 351      |
| MinimumReturn | 349      |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014036361826583743
Validation loss = 0.0009828833863139153
Validation loss = 0.0008486404549330473
Validation loss = 0.0011691271793097258
Validation loss = 0.000942409154959023
Validation loss = 0.0009895734256133437
Validation loss = 0.0010167433647438884
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011015549534931779
Validation loss = 0.0009515118435956538
Validation loss = 0.0012645384995266795
Validation loss = 0.0009655459434725344
Validation loss = 0.0011488695163279772
Validation loss = 0.0008861253736540675
Validation loss = 0.00224321405403316
Validation loss = 0.0009502972825430334
Validation loss = 0.0013252442004159093
Validation loss = 0.0010849416721612215
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012209410779178143
Validation loss = 0.0010846093064174056
Validation loss = 0.0011394491884857416
Validation loss = 0.001097261207178235
Validation loss = 0.0011487926822155714
Validation loss = 0.000944028957746923
Validation loss = 0.001119019347243011
Validation loss = 0.001232061069458723
Validation loss = 0.0013716592220589519
Validation loss = 0.001010458916425705
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010800710879266262
Validation loss = 0.0011997222900390625
Validation loss = 0.0009952681139111519
Validation loss = 0.0010982441017404199
Validation loss = 0.0009620572091080248
Validation loss = 0.0011420048540458083
Validation loss = 0.0008598236017860472
Validation loss = 0.001033630920574069
Validation loss = 0.0011188129428774118
Validation loss = 0.0011277706362307072
Validation loss = 0.0008894969359971583
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010124272666871548
Validation loss = 0.001138264429755509
Validation loss = 0.0011610598303377628
Validation loss = 0.0009110341779887676
Validation loss = 0.0013401382602751255
Validation loss = 0.0009223425877280533
Validation loss = 0.0013367680367082357
Validation loss = 0.001244427403435111
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 346      |
| Iteration     | 28       |
| MaximumReturn | 348      |
| MinimumReturn | 344      |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010107462294399738
Validation loss = 0.0012274169130250812
Validation loss = 0.001090228441171348
Validation loss = 0.0012033104430884123
Validation loss = 0.0013218952808529139
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010314780520275235
Validation loss = 0.0021883093286305666
Validation loss = 0.001043683267198503
Validation loss = 0.0009416528628207743
Validation loss = 0.000963280035648495
Validation loss = 0.0011006054701283574
Validation loss = 0.0011272819247096777
Validation loss = 0.0009509535739198327
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009802902350202203
Validation loss = 0.0011923194397240877
Validation loss = 0.0010712442453950644
Validation loss = 0.00119927863124758
Validation loss = 0.0009078572620637715
Validation loss = 0.001274587819352746
Validation loss = 0.001124123577028513
Validation loss = 0.0014576718676835299
Validation loss = 0.0009570507681928575
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001142658293247223
Validation loss = 0.001359107089228928
Validation loss = 0.0010711061768233776
Validation loss = 0.0009063233737833798
Validation loss = 0.0011143586598336697
Validation loss = 0.0012347811134532094
Validation loss = 0.0008861070382408798
Validation loss = 0.0011001084931194782
Validation loss = 0.0009289461304433644
Validation loss = 0.0011765060480684042
Validation loss = 0.0007959614158608019
Validation loss = 0.0008976304088719189
Validation loss = 0.0008427921566180885
Validation loss = 0.0008104289299808443
Validation loss = 0.0014928466407582164
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012514620320871472
Validation loss = 0.0010084868408739567
Validation loss = 0.0009805555455386639
Validation loss = 0.0008693839190527797
Validation loss = 0.001202032552100718
Validation loss = 0.0010258309775963426
Validation loss = 0.0009565475629642606
Validation loss = 0.0010661291889846325
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 345      |
| Iteration     | 29       |
| MaximumReturn | 348      |
| MinimumReturn | 343      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016558634815737605
Validation loss = 0.0009873375529423356
Validation loss = 0.0010744578903540969
Validation loss = 0.0016936341999098659
Validation loss = 0.001061755232512951
Validation loss = 0.0009581082849763334
Validation loss = 0.0009980874601751566
Validation loss = 0.0010257974499836564
Validation loss = 0.0011063613928854465
Validation loss = 0.0011283524800091982
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009657705086283386
Validation loss = 0.0009211183059960604
Validation loss = 0.0008937198435887694
Validation loss = 0.000978640397079289
Validation loss = 0.0009102945914492011
Validation loss = 0.0011175754480063915
Validation loss = 0.0008536685490980744
Validation loss = 0.0008889581076800823
Validation loss = 0.0008875179919414222
Validation loss = 0.001142192748375237
Validation loss = 0.00112637085840106
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009869829518720508
Validation loss = 0.0010863514617085457
Validation loss = 0.0009852235671132803
Validation loss = 0.0009061025339178741
Validation loss = 0.0008917864761315286
Validation loss = 0.001124637434259057
Validation loss = 0.0010230576153844595
Validation loss = 0.000928723078686744
Validation loss = 0.000980486162006855
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016176869394257665
Validation loss = 0.0008060362306423485
Validation loss = 0.0009668519487604499
Validation loss = 0.0010216936934739351
Validation loss = 0.001100457739084959
Validation loss = 0.000794821185991168
Validation loss = 0.0009009767090901732
Validation loss = 0.0007903764490038157
Validation loss = 0.0009925641352310777
Validation loss = 0.000919826328754425
Validation loss = 0.000857891864143312
Validation loss = 0.001163117471151054
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011847774730995297
Validation loss = 0.0012220997596159577
Validation loss = 0.0011807633563876152
Validation loss = 0.000992190558463335
Validation loss = 0.0010016774758696556
Validation loss = 0.0012375632068142295
Validation loss = 0.001384546747431159
Validation loss = 0.001105317729525268
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 337      |
| Iteration     | 30       |
| MaximumReturn | 340      |
| MinimumReturn | 335      |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011779654305428267
Validation loss = 0.0009722943650558591
Validation loss = 0.0014048849698156118
Validation loss = 0.001165690366178751
Validation loss = 0.0008363832021132112
Validation loss = 0.0008695037104189396
Validation loss = 0.0010353828547522426
Validation loss = 0.0008343890076503158
Validation loss = 0.0012461661826819181
Validation loss = 0.001071175909601152
Validation loss = 0.0016255686059594154
Validation loss = 0.0008376067853532732
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010840333998203278
Validation loss = 0.0007992188329808414
Validation loss = 0.0009760414832271636
Validation loss = 0.001297280192375183
Validation loss = 0.000846640847157687
Validation loss = 0.0014341413043439388
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008669592207297683
Validation loss = 0.0009933251421898603
Validation loss = 0.0009362366399727762
Validation loss = 0.0008417082717642188
Validation loss = 0.0008492840570397675
Validation loss = 0.0009485214250162244
Validation loss = 0.0009611924760974944
Validation loss = 0.0009178096079267561
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001234384486451745
Validation loss = 0.001495888689532876
Validation loss = 0.0009532716358080506
Validation loss = 0.0007943315431475639
Validation loss = 0.000998052186332643
Validation loss = 0.0009003753657452762
Validation loss = 0.0007697875844314694
Validation loss = 0.0008793974411673844
Validation loss = 0.0012351259356364608
Validation loss = 0.001229841378517449
Validation loss = 0.0007980398368090391
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000941461359616369
Validation loss = 0.0008328162948600948
Validation loss = 0.0009238222846761346
Validation loss = 0.0010566647397354245
Validation loss = 0.0008305836818180978
Validation loss = 0.0010435018921270967
Validation loss = 0.0011735655134543777
Validation loss = 0.0009031525114551187
Validation loss = 0.000841484172269702
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 340      |
| Iteration     | 31       |
| MaximumReturn | 341      |
| MinimumReturn | 338      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009895209223031998
Validation loss = 0.0008237231522798538
Validation loss = 0.0008764645317569375
Validation loss = 0.0012222814839333296
Validation loss = 0.001008415361866355
Validation loss = 0.0008329011034220457
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001277108327485621
Validation loss = 0.0009872537339106202
Validation loss = 0.0011861907551065087
Validation loss = 0.0008510022307746112
Validation loss = 0.0011681698961183429
Validation loss = 0.0008190091466531157
Validation loss = 0.0007828092784620821
Validation loss = 0.0014826311962679029
Validation loss = 0.0007321556331589818
Validation loss = 0.0009456134866923094
Validation loss = 0.0008455676143057644
Validation loss = 0.0008802172378636897
Validation loss = 0.0007998617365956306
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009044703328981996
Validation loss = 0.0009242863743565977
Validation loss = 0.0008540452108718455
Validation loss = 0.0007660964620299637
Validation loss = 0.0009318274678662419
Validation loss = 0.0012989669339731336
Validation loss = 0.0008022349211387336
Validation loss = 0.0011276457225903869
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007227053865790367
Validation loss = 0.0007718781707808375
Validation loss = 0.0010073806624859571
Validation loss = 0.0008691176190041006
Validation loss = 0.0008886460564099252
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008616963168606162
Validation loss = 0.0012088577495887876
Validation loss = 0.000990382512100041
Validation loss = 0.0009988658130168915
Validation loss = 0.0008904472924768925
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 344      |
| Iteration     | 32       |
| MaximumReturn | 346      |
| MinimumReturn | 340      |
| TotalSamples  | 136000   |
----------------------------
