Logging to experiments/gym_cheetahA01/gym_cheetahA01/Fri-28-Oct-2022-03-06-00-PM-CDT_gym_cheetahA01_trpo_iteration_20_seed2341
Print configuration .....
{'env_name': 'gym_cheetahA01', 'random_seeds': [4321, 2314, 2341, 3421], 'save_variables': False, 'model_save_dir': '/tmp/gym_cheetahA01_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7772402763366699
Validation loss = 0.1847214698791504
Validation loss = 0.09835736453533173
Validation loss = 0.08882872760295868
Validation loss = 0.08126182854175568
Validation loss = 0.08004678785800934
Validation loss = 0.07740040123462677
Validation loss = 0.07700581848621368
Validation loss = 0.07527876645326614
Validation loss = 0.07058893144130707
Validation loss = 0.07823270559310913
Validation loss = 0.07268285751342773
Validation loss = 0.06969614326953888
Validation loss = 0.11254376173019409
Validation loss = 0.07185646891593933
Validation loss = 0.06938601285219193
Validation loss = 0.0714479386806488
Validation loss = 0.06885538250207901
Validation loss = 0.07261347770690918
Validation loss = 0.0705142468214035
Validation loss = 0.07486305385828018
Validation loss = 0.08122745156288147
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5402370691299438
Validation loss = 0.18549305200576782
Validation loss = 0.10154932737350464
Validation loss = 0.08724573254585266
Validation loss = 0.08857373148202896
Validation loss = 0.08032643795013428
Validation loss = 0.07949477434158325
Validation loss = 0.07312086224555969
Validation loss = 0.08051497489213943
Validation loss = 0.0717838704586029
Validation loss = 0.07563260942697525
Validation loss = 0.07384602725505829
Validation loss = 0.07780821621417999
Validation loss = 0.07798890769481659
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49337372183799744
Validation loss = 0.18340688943862915
Validation loss = 0.10107101500034332
Validation loss = 0.09337182343006134
Validation loss = 0.0835242047905922
Validation loss = 0.07746442407369614
Validation loss = 0.08063696324825287
Validation loss = 0.07644963264465332
Validation loss = 0.07599649578332901
Validation loss = 0.07298049330711365
Validation loss = 0.07787109166383743
Validation loss = 0.07467722147703171
Validation loss = 0.07191498577594757
Validation loss = 0.0792921632528305
Validation loss = 0.07423092424869537
Validation loss = 0.06764118373394012
Validation loss = 0.0765782967209816
Validation loss = 0.06829489767551422
Validation loss = 0.06758482754230499
Validation loss = 0.09564714133739471
Validation loss = 0.07257051765918732
Validation loss = 0.06683342158794403
Validation loss = 0.07941330969333649
Validation loss = 0.0674915760755539
Validation loss = 0.06859008967876434
Validation loss = 0.07822476327419281
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6631736159324646
Validation loss = 0.18403276801109314
Validation loss = 0.10286921262741089
Validation loss = 0.08838918060064316
Validation loss = 0.08295483887195587
Validation loss = 0.08714424073696136
Validation loss = 0.08055265247821808
Validation loss = 0.07879753410816193
Validation loss = 0.08104969561100006
Validation loss = 0.07827398180961609
Validation loss = 0.07464636117219925
Validation loss = 0.07366975396871567
Validation loss = 0.0707835927605629
Validation loss = 0.07370348274707794
Validation loss = 0.06861267238855362
Validation loss = 0.08024906367063522
Validation loss = 0.07920853048563004
Validation loss = 0.08789625763893127
Validation loss = 0.0670803040266037
Validation loss = 0.08516612648963928
Validation loss = 0.06725139170885086
Validation loss = 0.06727956235408783
Validation loss = 0.06858053803443909
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6885250806808472
Validation loss = 0.1718595325946808
Validation loss = 0.09818863868713379
Validation loss = 0.08856306970119476
Validation loss = 0.08393287658691406
Validation loss = 0.09074417501688004
Validation loss = 0.07611684501171112
Validation loss = 0.07574515044689178
Validation loss = 0.07565853744745255
Validation loss = 0.07181917130947113
Validation loss = 0.08280286192893982
Validation loss = 0.07059545069932938
Validation loss = 0.07050290703773499
Validation loss = 0.08044004440307617
Validation loss = 0.10969908535480499
Validation loss = 0.07074581831693649
Validation loss = 0.07176875323057175
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -384     |
| Iteration     | 0        |
| MaximumReturn | -310     |
| MinimumReturn | -448     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.13717429339885712
Validation loss = 0.09586017578840256
Validation loss = 0.08637088537216187
Validation loss = 0.07933200895786285
Validation loss = 0.0761602520942688
Validation loss = 0.08213123679161072
Validation loss = 0.07239887118339539
Validation loss = 0.06916381418704987
Validation loss = 0.07709574699401855
Validation loss = 0.06788864731788635
Validation loss = 0.09079311788082123
Validation loss = 0.06614819169044495
Validation loss = 0.06721808761358261
Validation loss = 0.06581299751996994
Validation loss = 0.06570969521999359
Validation loss = 0.06605435907840729
Validation loss = 0.07270278036594391
Validation loss = 0.06526721268892288
Validation loss = 0.06785658746957779
Validation loss = 0.06692886352539062
Validation loss = 0.0659116804599762
Validation loss = 0.06555549055337906
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14427325129508972
Validation loss = 0.09611254185438156
Validation loss = 0.0908420979976654
Validation loss = 0.0814509466290474
Validation loss = 0.07957697659730911
Validation loss = 0.07558386027812958
Validation loss = 0.07218840718269348
Validation loss = 0.07310474663972855
Validation loss = 0.07131227105855942
Validation loss = 0.06929993629455566
Validation loss = 0.07331633567810059
Validation loss = 0.06817322969436646
Validation loss = 0.09451780468225479
Validation loss = 0.06945249438285828
Validation loss = 0.06842389702796936
Validation loss = 0.06748365610837936
Validation loss = 0.0706222876906395
Validation loss = 0.06990858912467957
Validation loss = 0.07421550154685974
Validation loss = 0.06561624258756638
Validation loss = 0.06673184037208557
Validation loss = 0.07875173538923264
Validation loss = 0.06758097559213638
Validation loss = 0.06575680524110794
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.13901132345199585
Validation loss = 0.09426893293857574
Validation loss = 0.08607172966003418
Validation loss = 0.0789320170879364
Validation loss = 0.07586538791656494
Validation loss = 0.0770692527294159
Validation loss = 0.07693776488304138
Validation loss = 0.07224512100219727
Validation loss = 0.07038755714893341
Validation loss = 0.0721442699432373
Validation loss = 0.06934560835361481
Validation loss = 0.06905452907085419
Validation loss = 0.06848392635583878
Validation loss = 0.07879143208265305
Validation loss = 0.06956996023654938
Validation loss = 0.06957396119832993
Validation loss = 0.06788929551839828
Validation loss = 0.07037224620580673
Validation loss = 0.0726562887430191
Validation loss = 0.07238025963306427
Validation loss = 0.07023950666189194
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15891632437705994
Validation loss = 0.09505318850278854
Validation loss = 0.08907336741685867
Validation loss = 0.08212968707084656
Validation loss = 0.07909157872200012
Validation loss = 0.08182011544704437
Validation loss = 0.07764101028442383
Validation loss = 0.07155407220125198
Validation loss = 0.07173888385295868
Validation loss = 0.07389684021472931
Validation loss = 0.07498148083686829
Validation loss = 0.06917678564786911
Validation loss = 0.06944601982831955
Validation loss = 0.0704600065946579
Validation loss = 0.06914207339286804
Validation loss = 0.06871416419744492
Validation loss = 0.0764101892709732
Validation loss = 0.06802225857973099
Validation loss = 0.0699947327375412
Validation loss = 0.0697028636932373
Validation loss = 0.06674899160861969
Validation loss = 0.07130211591720581
Validation loss = 0.07110494375228882
Validation loss = 0.07005273550748825
Validation loss = 0.06866933405399323
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14215317368507385
Validation loss = 0.09454195201396942
Validation loss = 0.08915825188159943
Validation loss = 0.08495770394802094
Validation loss = 0.09254227578639984
Validation loss = 0.08250107616186142
Validation loss = 0.07743185758590698
Validation loss = 0.08164505660533905
Validation loss = 0.09445247054100037
Validation loss = 0.07053935527801514
Validation loss = 0.0721321776509285
Validation loss = 0.06977757066488266
Validation loss = 0.07058413326740265
Validation loss = 0.06975241005420685
Validation loss = 0.06934943795204163
Validation loss = 0.06913565844297409
Validation loss = 0.0699482411146164
Validation loss = 0.06902296841144562
Validation loss = 0.07064489275217056
Validation loss = 0.06718277186155319
Validation loss = 0.07424335181713104
Validation loss = 0.08374179154634476
Validation loss = 0.06853826344013214
Validation loss = 0.06945868581533432
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.9    |
| Iteration     | 1        |
| MaximumReturn | 65       |
| MinimumReturn | -165     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07633311301469803
Validation loss = 0.06534776091575623
Validation loss = 0.0633489191532135
Validation loss = 0.06428411602973938
Validation loss = 0.06129692867398262
Validation loss = 0.06101725995540619
Validation loss = 0.059524793177843094
Validation loss = 0.05790555849671364
Validation loss = 0.057816118001937866
Validation loss = 0.05841338634490967
Validation loss = 0.059038180857896805
Validation loss = 0.05893504619598389
Validation loss = 0.05695825442671776
Validation loss = 0.0579521618783474
Validation loss = 0.057482779026031494
Validation loss = 0.06078706681728363
Validation loss = 0.06142229959368706
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07390958815813065
Validation loss = 0.06547630578279495
Validation loss = 0.06371555477380753
Validation loss = 0.06187347695231438
Validation loss = 0.06407087296247482
Validation loss = 0.0577201209962368
Validation loss = 0.05938270688056946
Validation loss = 0.0618966780602932
Validation loss = 0.05813649669289589
Validation loss = 0.060411352664232254
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0897153913974762
Validation loss = 0.06574327498674393
Validation loss = 0.06191644072532654
Validation loss = 0.06382222473621368
Validation loss = 0.061033207923173904
Validation loss = 0.07119964808225632
Validation loss = 0.05988301336765289
Validation loss = 0.06314540654420853
Validation loss = 0.060555219650268555
Validation loss = 0.0604197196662426
Validation loss = 0.06318686157464981
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07835652679204941
Validation loss = 0.06634499132633209
Validation loss = 0.06152726337313652
Validation loss = 0.06290650367736816
Validation loss = 0.058780401945114136
Validation loss = 0.06594505161046982
Validation loss = 0.06273660063743591
Validation loss = 0.06087518855929375
Validation loss = 0.05958956852555275
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08731355518102646
Validation loss = 0.06410956382751465
Validation loss = 0.06717710942029953
Validation loss = 0.07610896974802017
Validation loss = 0.06670507043600082
Validation loss = 0.0631343349814415
Validation loss = 0.058847904205322266
Validation loss = 0.06028975918889046
Validation loss = 0.05805721506476402
Validation loss = 0.0597861111164093
Validation loss = 0.05918971821665764
Validation loss = 0.061170417815446854
Validation loss = 0.05912094935774803
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 144      |
| Iteration     | 2        |
| MaximumReturn | 314      |
| MinimumReturn | -143     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06135392561554909
Validation loss = 0.053860142827034
Validation loss = 0.053872205317020416
Validation loss = 0.04801259562373161
Validation loss = 0.04871974512934685
Validation loss = 0.048072732985019684
Validation loss = 0.0485236756503582
Validation loss = 0.04798328876495361
Validation loss = 0.046248484402894974
Validation loss = 0.04978173226118088
Validation loss = 0.04703865945339203
Validation loss = 0.046730026602745056
Validation loss = 0.04764945060014725
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06276378035545349
Validation loss = 0.053651198744773865
Validation loss = 0.052063196897506714
Validation loss = 0.05068737268447876
Validation loss = 0.051583193242549896
Validation loss = 0.049767352640628815
Validation loss = 0.0542178675532341
Validation loss = 0.048184677958488464
Validation loss = 0.04969235882163048
Validation loss = 0.04935788735747337
Validation loss = 0.04792541265487671
Validation loss = 0.046385977417230606
Validation loss = 0.047563664615154266
Validation loss = 0.0475836805999279
Validation loss = 0.04834441840648651
Validation loss = 0.04920905828475952
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06154085695743561
Validation loss = 0.056645773351192474
Validation loss = 0.053450293838977814
Validation loss = 0.053835079073905945
Validation loss = 0.05008330196142197
Validation loss = 0.055086906999349594
Validation loss = 0.049940548837184906
Validation loss = 0.051440831273794174
Validation loss = 0.050720248371362686
Validation loss = 0.050500381737947464
Validation loss = 0.047893933951854706
Validation loss = 0.04941153526306152
Validation loss = 0.04855993762612343
Validation loss = 0.047914378345012665
Validation loss = 0.0498916432261467
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.062030091881752014
Validation loss = 0.05329585820436478
Validation loss = 0.05600614845752716
Validation loss = 0.05382633954286575
Validation loss = 0.05446716770529747
Validation loss = 0.05120956152677536
Validation loss = 0.05158650875091553
Validation loss = 0.04963481053709984
Validation loss = 0.048364438116550446
Validation loss = 0.05112551897764206
Validation loss = 0.04802872985601425
Validation loss = 0.049136221408843994
Validation loss = 0.05021114647388458
Validation loss = 0.04752272367477417
Validation loss = 0.04709894210100174
Validation loss = 0.04613387584686279
Validation loss = 0.04852558672428131
Validation loss = 0.04761244356632233
Validation loss = 0.04554561525583267
Validation loss = 0.04642144590616226
Validation loss = 0.04926890507340431
Validation loss = 0.051401808857917786
Validation loss = 0.04401586204767227
Validation loss = 0.04624573886394501
Validation loss = 0.05005112662911415
Validation loss = 0.04675153270363808
Validation loss = 0.0455077588558197
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06265558302402496
Validation loss = 0.05399371311068535
Validation loss = 0.05231441557407379
Validation loss = 0.051786985248327255
Validation loss = 0.04973362386226654
Validation loss = 0.0504007413983345
Validation loss = 0.04901549965143204
Validation loss = 0.049991004168987274
Validation loss = 0.04918777570128441
Validation loss = 0.055102091282606125
Validation loss = 0.053163520991802216
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 394      |
| Iteration     | 3        |
| MaximumReturn | 475      |
| MinimumReturn | 350      |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04488276317715645
Validation loss = 0.03910306841135025
Validation loss = 0.037192679941654205
Validation loss = 0.03738197684288025
Validation loss = 0.03955253213644028
Validation loss = 0.03546106070280075
Validation loss = 0.035996053367853165
Validation loss = 0.03694213181734085
Validation loss = 0.033871110528707504
Validation loss = 0.035325177013874054
Validation loss = 0.03798386827111244
Validation loss = 0.033019058406353
Validation loss = 0.0344650074839592
Validation loss = 0.03301919996738434
Validation loss = 0.03604697808623314
Validation loss = 0.034636177122592926
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04813387617468834
Validation loss = 0.038952697068452835
Validation loss = 0.036698922514915466
Validation loss = 0.036626771092414856
Validation loss = 0.03572046011686325
Validation loss = 0.03541820868849754
Validation loss = 0.03737917169928551
Validation loss = 0.03541596978902817
Validation loss = 0.03642886132001877
Validation loss = 0.03673482686281204
Validation loss = 0.03367151319980621
Validation loss = 0.03396749868988991
Validation loss = 0.034728895872831345
Validation loss = 0.0348006933927536
Validation loss = 0.03420209139585495
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.051974229514598846
Validation loss = 0.04224754124879837
Validation loss = 0.03988468274474144
Validation loss = 0.03786885365843773
Validation loss = 0.038688547909259796
Validation loss = 0.03851250186562538
Validation loss = 0.03629319369792938
Validation loss = 0.03824704512953758
Validation loss = 0.04095505550503731
Validation loss = 0.03567586839199066
Validation loss = 0.0361759178340435
Validation loss = 0.03653941676020622
Validation loss = 0.03995353728532791
Validation loss = 0.036903511732816696
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04741469770669937
Validation loss = 0.038586925715208054
Validation loss = 0.038090791553258896
Validation loss = 0.036695294082164764
Validation loss = 0.03483574092388153
Validation loss = 0.03727181628346443
Validation loss = 0.035659343004226685
Validation loss = 0.03521738946437836
Validation loss = 0.03775360807776451
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.045695267617702484
Validation loss = 0.04138047248125076
Validation loss = 0.03882372006773949
Validation loss = 0.039230555295944214
Validation loss = 0.03645458072423935
Validation loss = 0.036722492426633835
Validation loss = 0.03548239544034004
Validation loss = 0.03603731840848923
Validation loss = 0.036019302904605865
Validation loss = 0.03524518013000488
Validation loss = 0.0349278561770916
Validation loss = 0.03655770421028137
Validation loss = 0.03785014897584915
Validation loss = 0.03404384106397629
Validation loss = 0.036391038447618484
Validation loss = 0.03500639274716377
Validation loss = 0.03450397402048111
Validation loss = 0.03339296206831932
Validation loss = 0.03554190695285797
Validation loss = 0.032821305096149445
Validation loss = 0.03405461832880974
Validation loss = 0.03430582210421562
Validation loss = 0.03284324333071709
Validation loss = 0.0349319763481617
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 541      |
| Iteration     | 4        |
| MaximumReturn | 593      |
| MinimumReturn | 471      |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03526250645518303
Validation loss = 0.029160717502236366
Validation loss = 0.028576882556080818
Validation loss = 0.02811141312122345
Validation loss = 0.028073465451598167
Validation loss = 0.027243494987487793
Validation loss = 0.028444722294807434
Validation loss = 0.029198981821537018
Validation loss = 0.026145359501242638
Validation loss = 0.02591676451265812
Validation loss = 0.02529805153608322
Validation loss = 0.026423098519444466
Validation loss = 0.027145355939865112
Validation loss = 0.026051266118884087
Validation loss = 0.0254718828946352
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03616847097873688
Validation loss = 0.03052498959004879
Validation loss = 0.02775707095861435
Validation loss = 0.027658918872475624
Validation loss = 0.026744067668914795
Validation loss = 0.027300819754600525
Validation loss = 0.028048992156982422
Validation loss = 0.027067705988883972
Validation loss = 0.02503625862300396
Validation loss = 0.026578957214951515
Validation loss = 0.025771135464310646
Validation loss = 0.026350585743784904
Validation loss = 0.027852743864059448
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03518174961209297
Validation loss = 0.030122799798846245
Validation loss = 0.03247745335102081
Validation loss = 0.029980473220348358
Validation loss = 0.02978385239839554
Validation loss = 0.03186621889472008
Validation loss = 0.02870354615151882
Validation loss = 0.029249565675854683
Validation loss = 0.043400973081588745
Validation loss = 0.02850826270878315
Validation loss = 0.0289597287774086
Validation loss = 0.027941742911934853
Validation loss = 0.027037085965275764
Validation loss = 0.027934148907661438
Validation loss = 0.027514846995472908
Validation loss = 0.027303079143166542
Validation loss = 0.02791035734117031
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.036347996443510056
Validation loss = 0.030487729236483574
Validation loss = 0.028888048604130745
Validation loss = 0.02915267087519169
Validation loss = 0.03305414319038391
Validation loss = 0.027746791020035744
Validation loss = 0.03367120400071144
Validation loss = 0.027055790647864342
Validation loss = 0.027299776673316956
Validation loss = 0.02646329253911972
Validation loss = 0.028263233602046967
Validation loss = 0.027904847636818886
Validation loss = 0.026736408472061157
Validation loss = 0.02737773023545742
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0352921336889267
Validation loss = 0.028249097988009453
Validation loss = 0.03072119690477848
Validation loss = 0.02664817124605179
Validation loss = 0.027696633711457253
Validation loss = 0.028609076514840126
Validation loss = 0.02810424007475376
Validation loss = 0.030469151213765144
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 712      |
| Iteration     | 5        |
| MaximumReturn | 780      |
| MinimumReturn | 672      |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03077032044529915
Validation loss = 0.022359570488333702
Validation loss = 0.022876575589179993
Validation loss = 0.023402243852615356
Validation loss = 0.022801363840699196
Validation loss = 0.023011943325400352
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02799239382147789
Validation loss = 0.023743515834212303
Validation loss = 0.023603182286024094
Validation loss = 0.02160617709159851
Validation loss = 0.02249378152191639
Validation loss = 0.022475121542811394
Validation loss = 0.020900441333651543
Validation loss = 0.022608567029237747
Validation loss = 0.023439770564436913
Validation loss = 0.02040686272084713
Validation loss = 0.021598638966679573
Validation loss = 0.023338204249739647
Validation loss = 0.023426519706845284
Validation loss = 0.02183637209236622
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.031821321696043015
Validation loss = 0.023273155093193054
Validation loss = 0.022516924887895584
Validation loss = 0.025325100868940353
Validation loss = 0.023395871743559837
Validation loss = 0.023380862548947334
Validation loss = 0.024187257513403893
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030092518776655197
Validation loss = 0.022513093426823616
Validation loss = 0.02487119846045971
Validation loss = 0.021814020350575447
Validation loss = 0.02438238449394703
Validation loss = 0.024898771196603775
Validation loss = 0.024576712399721146
Validation loss = 0.022217947989702225
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.028959928080439568
Validation loss = 0.02418864145874977
Validation loss = 0.022180134430527687
Validation loss = 0.025077050551772118
Validation loss = 0.022906241938471794
Validation loss = 0.024609815329313278
Validation loss = 0.021822214126586914
Validation loss = 0.02229696325957775
Validation loss = 0.021114537492394447
Validation loss = 0.021854203194379807
Validation loss = 0.02176053449511528
Validation loss = 0.022818580269813538
Validation loss = 0.021229282021522522
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 718      |
| Iteration     | 6        |
| MaximumReturn | 887      |
| MinimumReturn | 158      |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06279035657644272
Validation loss = 0.03260267898440361
Validation loss = 0.027445469051599503
Validation loss = 0.025326818227767944
Validation loss = 0.024706240743398666
Validation loss = 0.025599531829357147
Validation loss = 0.023238258436322212
Validation loss = 0.023499786853790283
Validation loss = 0.02376316860318184
Validation loss = 0.022930098697543144
Validation loss = 0.022580064833164215
Validation loss = 0.023326298221945763
Validation loss = 0.02529480680823326
Validation loss = 0.022154174745082855
Validation loss = 0.021256180480122566
Validation loss = 0.02254083938896656
Validation loss = 0.023556336760520935
Validation loss = 0.021227223798632622
Validation loss = 0.02119707688689232
Validation loss = 0.024338820949196815
Validation loss = 0.021839942783117294
Validation loss = 0.020880743861198425
Validation loss = 0.02035895362496376
Validation loss = 0.020908381789922714
Validation loss = 0.020657803863286972
Validation loss = 0.021405017003417015
Validation loss = 0.020915787667036057
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.060698822140693665
Validation loss = 0.034293487668037415
Validation loss = 0.027109581977128983
Validation loss = 0.024889104068279266
Validation loss = 0.024549880996346474
Validation loss = 0.023650765419006348
Validation loss = 0.021686851978302002
Validation loss = 0.021585699170827866
Validation loss = 0.021430006250739098
Validation loss = 0.0214049331843853
Validation loss = 0.020816629752516747
Validation loss = 0.020311061292886734
Validation loss = 0.020881202071905136
Validation loss = 0.021106483414769173
Validation loss = 0.02078789286315441
Validation loss = 0.02026815339922905
Validation loss = 0.02019667439162731
Validation loss = 0.02057018131017685
Validation loss = 0.025767646729946136
Validation loss = 0.01885703019797802
Validation loss = 0.019818933680653572
Validation loss = 0.020345665514469147
Validation loss = 0.020020708441734314
Validation loss = 0.019688494503498077
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06400513648986816
Validation loss = 0.036064282059669495
Validation loss = 0.027824558317661285
Validation loss = 0.027225356549024582
Validation loss = 0.025187460705637932
Validation loss = 0.024360336363315582
Validation loss = 0.023204047232866287
Validation loss = 0.022685671225190163
Validation loss = 0.023153427988290787
Validation loss = 0.02304912731051445
Validation loss = 0.021900739520788193
Validation loss = 0.022272802889347076
Validation loss = 0.021714016795158386
Validation loss = 0.020959798246622086
Validation loss = 0.02317124232649803
Validation loss = 0.021085452288389206
Validation loss = 0.021383170038461685
Validation loss = 0.02225659228861332
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07223527133464813
Validation loss = 0.03792013227939606
Validation loss = 0.029386013746261597
Validation loss = 0.025528956204652786
Validation loss = 0.024076666682958603
Validation loss = 0.023671898990869522
Validation loss = 0.023250870406627655
Validation loss = 0.02328798733651638
Validation loss = 0.02226301282644272
Validation loss = 0.021374067291617393
Validation loss = 0.022358063608407974
Validation loss = 0.021195268258452415
Validation loss = 0.021706968545913696
Validation loss = 0.021063417196273804
Validation loss = 0.020935311913490295
Validation loss = 0.021254314109683037
Validation loss = 0.022852685302495956
Validation loss = 0.020735828205943108
Validation loss = 0.020284676924347878
Validation loss = 0.020352348685264587
Validation loss = 0.02101757563650608
Validation loss = 0.01946459338068962
Validation loss = 0.01988718844950199
Validation loss = 0.02151910960674286
Validation loss = 0.01939736306667328
Validation loss = 0.01945367455482483
Validation loss = 0.024730365723371506
Validation loss = 0.019652973860502243
Validation loss = 0.01904793456196785
Validation loss = 0.020379895344376564
Validation loss = 0.018718572333455086
Validation loss = 0.019920222461223602
Validation loss = 0.02106567844748497
Validation loss = 0.020441513508558273
Validation loss = 0.018494989722967148
Validation loss = 0.020335715264081955
Validation loss = 0.01864427700638771
Validation loss = 0.01820370927453041
Validation loss = 0.018952155485749245
Validation loss = 0.018172699958086014
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05824439972639084
Validation loss = 0.03180735185742378
Validation loss = 0.02846607193350792
Validation loss = 0.02457038313150406
Validation loss = 0.02412429451942444
Validation loss = 0.02341143786907196
Validation loss = 0.02345118671655655
Validation loss = 0.021845031529664993
Validation loss = 0.024945959448814392
Validation loss = 0.02715487778186798
Validation loss = 0.02079521119594574
Validation loss = 0.02295994758605957
Validation loss = 0.022263705730438232
Validation loss = 0.02123219333589077
Validation loss = 0.02128683403134346
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 687      |
| Iteration     | 7        |
| MaximumReturn | 843      |
| MinimumReturn | 36.7     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026406804099678993
Validation loss = 0.020758692175149918
Validation loss = 0.020610138773918152
Validation loss = 0.020820075646042824
Validation loss = 0.02118799462914467
Validation loss = 0.019285881891846657
Validation loss = 0.02091299369931221
Validation loss = 0.019015876576304436
Validation loss = 0.020595552399754524
Validation loss = 0.019133592024445534
Validation loss = 0.018793296068906784
Validation loss = 0.019482918083667755
Validation loss = 0.018651384860277176
Validation loss = 0.01939467526972294
Validation loss = 0.02134024351835251
Validation loss = 0.017650935798883438
Validation loss = 0.018243156373500824
Validation loss = 0.018467813730239868
Validation loss = 0.019692130386829376
Validation loss = 0.018258189782500267
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.027680519968271255
Validation loss = 0.019212894141674042
Validation loss = 0.018936898559331894
Validation loss = 0.020286526530981064
Validation loss = 0.018363822251558304
Validation loss = 0.018265359103679657
Validation loss = 0.018575184047222137
Validation loss = 0.01815096288919449
Validation loss = 0.021890856325626373
Validation loss = 0.01774519681930542
Validation loss = 0.01733502373099327
Validation loss = 0.018561601638793945
Validation loss = 0.01872558891773224
Validation loss = 0.018610117956995964
Validation loss = 0.018059952184557915
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.028757359832525253
Validation loss = 0.020885994657874107
Validation loss = 0.019467098638415337
Validation loss = 0.019830189645290375
Validation loss = 0.019588204100728035
Validation loss = 0.02047126740217209
Validation loss = 0.01952042430639267
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.029425688087940216
Validation loss = 0.01859591342508793
Validation loss = 0.017654934898018837
Validation loss = 0.017071319743990898
Validation loss = 0.016605284065008163
Validation loss = 0.017950961366295815
Validation loss = 0.017544712871313095
Validation loss = 0.016735337674617767
Validation loss = 0.01746388152241707
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.028952227905392647
Validation loss = 0.021593719720840454
Validation loss = 0.02017902210354805
Validation loss = 0.02032448537647724
Validation loss = 0.01911208964884281
Validation loss = 0.019758576527237892
Validation loss = 0.01962513104081154
Validation loss = 0.02034175582230091
Validation loss = 0.018459662795066833
Validation loss = 0.018597934395074844
Validation loss = 0.019503604620695114
Validation loss = 0.018867727369070053
Validation loss = 0.020235789939761162
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 686      |
| Iteration     | 8        |
| MaximumReturn | 965      |
| MinimumReturn | -425     |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026718828827142715
Validation loss = 0.018129874020814896
Validation loss = 0.01743088848888874
Validation loss = 0.0186158400028944
Validation loss = 0.01840706542134285
Validation loss = 0.017659753561019897
Validation loss = 0.017682624980807304
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.028104916214942932
Validation loss = 0.018169023096561432
Validation loss = 0.01710420846939087
Validation loss = 0.017960047349333763
Validation loss = 0.016656111925840378
Validation loss = 0.018731359392404556
Validation loss = 0.018035151064395905
Validation loss = 0.016492966562509537
Validation loss = 0.016056669875979424
Validation loss = 0.0170732531696558
Validation loss = 0.016953617334365845
Validation loss = 0.017407137900590897
Validation loss = 0.016661453992128372
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.028399575501680374
Validation loss = 0.020141832530498505
Validation loss = 0.019453171640634537
Validation loss = 0.020126167684793472
Validation loss = 0.01875654049217701
Validation loss = 0.01785421371459961
Validation loss = 0.01899607852101326
Validation loss = 0.019570443779230118
Validation loss = 0.018585458397865295
Validation loss = 0.017998110502958298
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023363146930933
Validation loss = 0.01859932206571102
Validation loss = 0.016976837068796158
Validation loss = 0.017001496627926826
Validation loss = 0.017033588141202927
Validation loss = 0.016618216410279274
Validation loss = 0.015494072809815407
Validation loss = 0.018395498394966125
Validation loss = 0.01555747352540493
Validation loss = 0.015885479748249054
Validation loss = 0.016687780618667603
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026579836383461952
Validation loss = 0.02009831927716732
Validation loss = 0.018345143646001816
Validation loss = 0.019126567989587784
Validation loss = 0.01856503263115883
Validation loss = 0.018266258761286736
Validation loss = 0.0177915096282959
Validation loss = 0.018014326691627502
Validation loss = 0.01798478700220585
Validation loss = 0.018155615776777267
Validation loss = 0.018363792449235916
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 914      |
| Iteration     | 9        |
| MaximumReturn | 955      |
| MinimumReturn | 887      |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019029704853892326
Validation loss = 0.017184503376483917
Validation loss = 0.01841413415968418
Validation loss = 0.01569700799882412
Validation loss = 0.015864141285419464
Validation loss = 0.016506778076291084
Validation loss = 0.016323478892445564
Validation loss = 0.016983430832624435
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018696073442697525
Validation loss = 0.016219986602663994
Validation loss = 0.015681643038988113
Validation loss = 0.015182268805801868
Validation loss = 0.019237665459513664
Validation loss = 0.014970948919653893
Validation loss = 0.01444404199719429
Validation loss = 0.015089651569724083
Validation loss = 0.01606491394340992
Validation loss = 0.014434587210416794
Validation loss = 0.015841970220208168
Validation loss = 0.014688655734062195
Validation loss = 0.015541484579443932
Validation loss = 0.015984149649739265
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021965278312563896
Validation loss = 0.017097393050789833
Validation loss = 0.017073845490813255
Validation loss = 0.016668401658535004
Validation loss = 0.016267525032162666
Validation loss = 0.016470782458782196
Validation loss = 0.016775794327259064
Validation loss = 0.016720185056328773
Validation loss = 0.018462272360920906
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01775546558201313
Validation loss = 0.015575003810226917
Validation loss = 0.015609007328748703
Validation loss = 0.015348697081208229
Validation loss = 0.014713261276483536
Validation loss = 0.014195635914802551
Validation loss = 0.014912446960806847
Validation loss = 0.014749255031347275
Validation loss = 0.016003983095288277
Validation loss = 0.016447128728032112
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02016141079366207
Validation loss = 0.016691675409674644
Validation loss = 0.016360728070139885
Validation loss = 0.01706082373857498
Validation loss = 0.01644108071923256
Validation loss = 0.01530527789145708
Validation loss = 0.015666544437408447
Validation loss = 0.017272917553782463
Validation loss = 0.01584714651107788
Validation loss = 0.015665702521800995
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 747      |
| Iteration     | 10       |
| MaximumReturn | 928      |
| MinimumReturn | 148      |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018557408824563026
Validation loss = 0.017088046297430992
Validation loss = 0.01666988432407379
Validation loss = 0.01760897785425186
Validation loss = 0.016724320128560066
Validation loss = 0.017162786796689034
Validation loss = 0.01762273721396923
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01728568971157074
Validation loss = 0.01538509875535965
Validation loss = 0.016630245372653008
Validation loss = 0.016129763796925545
Validation loss = 0.015346922911703587
Validation loss = 0.01542737241834402
Validation loss = 0.015607773326337337
Validation loss = 0.01573377288877964
Validation loss = 0.015407037921249866
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01888827420771122
Validation loss = 0.017075149342417717
Validation loss = 0.017153887078166008
Validation loss = 0.017549792304635048
Validation loss = 0.016896966844797134
Validation loss = 0.016631195321679115
Validation loss = 0.016856582835316658
Validation loss = 0.01829925924539566
Validation loss = 0.016653014346957207
Validation loss = 0.01675696112215519
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01716199703514576
Validation loss = 0.015776069834828377
Validation loss = 0.01496325433254242
Validation loss = 0.016478363424539566
Validation loss = 0.01593482494354248
Validation loss = 0.0160786435008049
Validation loss = 0.015326340682804585
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019602416083216667
Validation loss = 0.018719293177127838
Validation loss = 0.016680018976330757
Validation loss = 0.017099654302001
Validation loss = 0.016351746395230293
Validation loss = 0.019025282934308052
Validation loss = 0.016301609575748444
Validation loss = 0.018103698268532753
Validation loss = 0.017045078799128532
Validation loss = 0.016069961711764336
Validation loss = 0.015741104260087013
Validation loss = 0.01744350977241993
Validation loss = 0.01687362976372242
Validation loss = 0.015611332841217518
Validation loss = 0.016429251059889793
Validation loss = 0.01636926829814911
Validation loss = 0.01538296788930893
Validation loss = 0.016322584822773933
Validation loss = 0.01586066372692585
Validation loss = 0.01675790548324585
Validation loss = 0.015826888382434845
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 970      |
| Iteration     | 11       |
| MaximumReturn | 1.02e+03 |
| MinimumReturn | 902      |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018560240045189857
Validation loss = 0.01812170073390007
Validation loss = 0.015912648290395737
Validation loss = 0.017835788428783417
Validation loss = 0.015601813793182373
Validation loss = 0.017318041995167732
Validation loss = 0.017222167924046516
Validation loss = 0.016513550654053688
Validation loss = 0.01522259321063757
Validation loss = 0.014931202866137028
Validation loss = 0.015553532168269157
Validation loss = 0.015036324970424175
Validation loss = 0.015270505100488663
Validation loss = 0.015673745423555374
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01749914512038231
Validation loss = 0.015022440813481808
Validation loss = 0.015501677989959717
Validation loss = 0.015223151072859764
Validation loss = 0.01591496542096138
Validation loss = 0.015616984106600285
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019907359033823013
Validation loss = 0.01632828824222088
Validation loss = 0.016633983701467514
Validation loss = 0.015334153547883034
Validation loss = 0.017657814547419548
Validation loss = 0.01513539720326662
Validation loss = 0.016101403161883354
Validation loss = 0.015191031619906425
Validation loss = 0.015306573361158371
Validation loss = 0.014902714639902115
Validation loss = 0.015935996547341347
Validation loss = 0.015008036978542805
Validation loss = 0.015354203060269356
Validation loss = 0.015606229193508625
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016058694571256638
Validation loss = 0.016095243394374847
Validation loss = 0.01454323623329401
Validation loss = 0.014716248959302902
Validation loss = 0.015233766287565231
Validation loss = 0.01526914443820715
Validation loss = 0.01467632595449686
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017756912857294083
Validation loss = 0.014948801137506962
Validation loss = 0.014874208718538284
Validation loss = 0.015040991827845573
Validation loss = 0.016099246218800545
Validation loss = 0.01687137596309185
Validation loss = 0.01493917964398861
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 928      |
| Iteration     | 12       |
| MaximumReturn | 977      |
| MinimumReturn | 897      |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01579047180712223
Validation loss = 0.014247631654143333
Validation loss = 0.014652102254331112
Validation loss = 0.014357072301208973
Validation loss = 0.015436286106705666
Validation loss = 0.014361993409693241
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016715068370103836
Validation loss = 0.014948509633541107
Validation loss = 0.014132529497146606
Validation loss = 0.014590360224246979
Validation loss = 0.014465099200606346
Validation loss = 0.014791419729590416
Validation loss = 0.01369011402130127
Validation loss = 0.014632019214332104
Validation loss = 0.01415534783154726
Validation loss = 0.014031581580638885
Validation loss = 0.014399213716387749
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01558717805892229
Validation loss = 0.01833225227892399
Validation loss = 0.01527354959398508
Validation loss = 0.014811788685619831
Validation loss = 0.01570216938853264
Validation loss = 0.015500742010772228
Validation loss = 0.015909822657704353
Validation loss = 0.013746282085776329
Validation loss = 0.014428919181227684
Validation loss = 0.01441545132547617
Validation loss = 0.01473582349717617
Validation loss = 0.013500271365046501
Validation loss = 0.014477537013590336
Validation loss = 0.015500339679419994
Validation loss = 0.013672294095158577
Validation loss = 0.013538232073187828
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01733493246138096
Validation loss = 0.01379607804119587
Validation loss = 0.014314516447484493
Validation loss = 0.013870514929294586
Validation loss = 0.013981951400637627
Validation loss = 0.014335432089865208
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016725772991776466
Validation loss = 0.015131938271224499
Validation loss = 0.01386492419987917
Validation loss = 0.01433639694005251
Validation loss = 0.015140680596232414
Validation loss = 0.014090962707996368
Validation loss = 0.013844171538949013
Validation loss = 0.014428436756134033
Validation loss = 0.013649990782141685
Validation loss = 0.014930563978850842
Validation loss = 0.01354118250310421
Validation loss = 0.015263060107827187
Validation loss = 0.014136514626443386
Validation loss = 0.013515882194042206
Validation loss = 0.01442236639559269
Validation loss = 0.015025883913040161
Validation loss = 0.0138463843613863
Validation loss = 0.013581587933003902
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 907      |
| Iteration     | 13       |
| MaximumReturn | 986      |
| MinimumReturn | 832      |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015130016952753067
Validation loss = 0.014427261427044868
Validation loss = 0.014243368059396744
Validation loss = 0.01327545940876007
Validation loss = 0.014320467598736286
Validation loss = 0.013651788234710693
Validation loss = 0.013133063912391663
Validation loss = 0.01458379253745079
Validation loss = 0.012664242647588253
Validation loss = 0.013907993212342262
Validation loss = 0.013309309259057045
Validation loss = 0.014221609570086002
Validation loss = 0.012999281287193298
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0142896119505167
Validation loss = 0.013627924956381321
Validation loss = 0.013207793235778809
Validation loss = 0.014970356598496437
Validation loss = 0.013154994696378708
Validation loss = 0.014159648679196835
Validation loss = 0.014061765745282173
Validation loss = 0.01294825691729784
Validation loss = 0.013468298129737377
Validation loss = 0.01308791246265173
Validation loss = 0.01446350384503603
Validation loss = 0.014009012840688229
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016226045787334442
Validation loss = 0.013342680409550667
Validation loss = 0.013999534770846367
Validation loss = 0.013377859257161617
Validation loss = 0.01369589101523161
Validation loss = 0.013831720687448978
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015794625505805016
Validation loss = 0.0143097760155797
Validation loss = 0.013663025572896004
Validation loss = 0.015321018174290657
Validation loss = 0.014612669125199318
Validation loss = 0.013263239525258541
Validation loss = 0.01464676484465599
Validation loss = 0.013934598304331303
Validation loss = 0.013970891013741493
Validation loss = 0.01317864004522562
Validation loss = 0.013187694363296032
Validation loss = 0.013815709389746189
Validation loss = 0.013693821616470814
Validation loss = 0.012717862613499165
Validation loss = 0.013171595521271229
Validation loss = 0.01383646484464407
Validation loss = 0.012606058269739151
Validation loss = 0.012418776750564575
Validation loss = 0.012887290678918362
Validation loss = 0.01442759856581688
Validation loss = 0.012289308942854404
Validation loss = 0.013194816187024117
Validation loss = 0.013531418517231941
Validation loss = 0.01294186245650053
Validation loss = 0.013026652857661247
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015207636170089245
Validation loss = 0.01374500710517168
Validation loss = 0.013864113949239254
Validation loss = 0.014526765793561935
Validation loss = 0.013230533339083195
Validation loss = 0.01416406873613596
Validation loss = 0.012898922897875309
Validation loss = 0.01273307017982006
Validation loss = 0.014726611785590649
Validation loss = 0.01394685823470354
Validation loss = 0.015020948834717274
Validation loss = 0.013267374597489834
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.01e+03 |
| Iteration     | 14       |
| MaximumReturn | 1.08e+03 |
| MinimumReturn | 886      |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014171913266181946
Validation loss = 0.012312276288866997
Validation loss = 0.0129847452044487
Validation loss = 0.012342181988060474
Validation loss = 0.013753293082118034
Validation loss = 0.012643635272979736
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013938281685113907
Validation loss = 0.012851415202021599
Validation loss = 0.01412254385650158
Validation loss = 0.012685379013419151
Validation loss = 0.013014002703130245
Validation loss = 0.013474443927407265
Validation loss = 0.012275522574782372
Validation loss = 0.012411501258611679
Validation loss = 0.013084519654512405
Validation loss = 0.01266470830887556
Validation loss = 0.012943028472363949
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013933892361819744
Validation loss = 0.013448827899992466
Validation loss = 0.013280730694532394
Validation loss = 0.01440357230603695
Validation loss = 0.012915823608636856
Validation loss = 0.013362610712647438
Validation loss = 0.013481071218848228
Validation loss = 0.013041880913078785
Validation loss = 0.012640127912163734
Validation loss = 0.012486325576901436
Validation loss = 0.012992164120078087
Validation loss = 0.01274068746715784
Validation loss = 0.012447066605091095
Validation loss = 0.012486192397773266
Validation loss = 0.013379881158471107
Validation loss = 0.012076537124812603
Validation loss = 0.01233147643506527
Validation loss = 0.012566715478897095
Validation loss = 0.014293838292360306
Validation loss = 0.01289596688002348
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015538370236754417
Validation loss = 0.012489395216107368
Validation loss = 0.012876138091087341
Validation loss = 0.011989947408437729
Validation loss = 0.01235449779778719
Validation loss = 0.012129141017794609
Validation loss = 0.011585468426346779
Validation loss = 0.011608308181166649
Validation loss = 0.012307986617088318
Validation loss = 0.012213206849992275
Validation loss = 0.011859694495797157
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01432722806930542
Validation loss = 0.012548868544399738
Validation loss = 0.012923491187393665
Validation loss = 0.013812229037284851
Validation loss = 0.012094303965568542
Validation loss = 0.012301366776227951
Validation loss = 0.012524783611297607
Validation loss = 0.012511812150478363
Validation loss = 0.013129557482898235
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 978      |
| Iteration     | 15       |
| MaximumReturn | 1.05e+03 |
| MinimumReturn | 935      |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013611535541713238
Validation loss = 0.012719682417809963
Validation loss = 0.012083050794899464
Validation loss = 0.01206571888178587
Validation loss = 0.011973082087934017
Validation loss = 0.013877109624445438
Validation loss = 0.013068823143839836
Validation loss = 0.0122972521930933
Validation loss = 0.011862673796713352
Validation loss = 0.01260407455265522
Validation loss = 0.01339082419872284
Validation loss = 0.011605331674218178
Validation loss = 0.01221273373812437
Validation loss = 0.01258758082985878
Validation loss = 0.012062516063451767
Validation loss = 0.011779211461544037
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013974305242300034
Validation loss = 0.012138028629124165
Validation loss = 0.012818817980587482
Validation loss = 0.01197785884141922
Validation loss = 0.012241160497069359
Validation loss = 0.011950029991567135
Validation loss = 0.01221508253365755
Validation loss = 0.0118181724101305
Validation loss = 0.01189983356744051
Validation loss = 0.013054406270384789
Validation loss = 0.01241858396679163
Validation loss = 0.012070076540112495
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014250211417675018
Validation loss = 0.012036489322781563
Validation loss = 0.012678001075983047
Validation loss = 0.011549827642738819
Validation loss = 0.011938410811126232
Validation loss = 0.011980945244431496
Validation loss = 0.011488592252135277
Validation loss = 0.011808894574642181
Validation loss = 0.011717517860233784
Validation loss = 0.013287139125168324
Validation loss = 0.011776769533753395
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013145225122570992
Validation loss = 0.012445764616131783
Validation loss = 0.011838180013000965
Validation loss = 0.011313768103718758
Validation loss = 0.011613545939326286
Validation loss = 0.012223443016409874
Validation loss = 0.01218324713408947
Validation loss = 0.011849765665829182
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014385120011866093
Validation loss = 0.012514777481555939
Validation loss = 0.012357572093605995
Validation loss = 0.011967085301876068
Validation loss = 0.013188693672418594
Validation loss = 0.01224172580987215
Validation loss = 0.011975770816206932
Validation loss = 0.012063278816640377
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 925      |
| Iteration     | 16       |
| MaximumReturn | 1.08e+03 |
| MinimumReturn | 577      |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014348707161843777
Validation loss = 0.011744766496121883
Validation loss = 0.011253973469138145
Validation loss = 0.011553308926522732
Validation loss = 0.01224985346198082
Validation loss = 0.011550132185220718
Validation loss = 0.011350340209901333
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013789771124720573
Validation loss = 0.011406872421503067
Validation loss = 0.01237208116799593
Validation loss = 0.012274438515305519
Validation loss = 0.011549331247806549
Validation loss = 0.011313823983073235
Validation loss = 0.011379417963325977
Validation loss = 0.011250605806708336
Validation loss = 0.010963495820760727
Validation loss = 0.012575800530612469
Validation loss = 0.011825162917375565
Validation loss = 0.011331846006214619
Validation loss = 0.011291248723864555
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012695802375674248
Validation loss = 0.011483917012810707
Validation loss = 0.011783255264163017
Validation loss = 0.011435113847255707
Validation loss = 0.012191353365778923
Validation loss = 0.011589335277676582
Validation loss = 0.011659231036901474
Validation loss = 0.011317795142531395
Validation loss = 0.012409457936882973
Validation loss = 0.011143030598759651
Validation loss = 0.011389482766389847
Validation loss = 0.013499462977051735
Validation loss = 0.011167759075760841
Validation loss = 0.01098704431205988
Validation loss = 0.010542762465775013
Validation loss = 0.011771734803915024
Validation loss = 0.011172052472829819
Validation loss = 0.01090813148766756
Validation loss = 0.011580683290958405
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013439948670566082
Validation loss = 0.011299926787614822
Validation loss = 0.01161951757967472
Validation loss = 0.01172610279172659
Validation loss = 0.010945392772555351
Validation loss = 0.01081333588808775
Validation loss = 0.011127148754894733
Validation loss = 0.01122024655342102
Validation loss = 0.010612322948873043
Validation loss = 0.011162818409502506
Validation loss = 0.010782823897898197
Validation loss = 0.01079602912068367
Validation loss = 0.011942457407712936
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013490647077560425
Validation loss = 0.011305657215416431
Validation loss = 0.011536267586052418
Validation loss = 0.011666040867567062
Validation loss = 0.01148303598165512
Validation loss = 0.011165378615260124
Validation loss = 0.012516815215349197
Validation loss = 0.011629100888967514
Validation loss = 0.011116750538349152
Validation loss = 0.012190860696136951
Validation loss = 0.011978114023804665
Validation loss = 0.011497502215206623
Validation loss = 0.011651791632175446
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 985      |
| Iteration     | 17       |
| MaximumReturn | 1.06e+03 |
| MinimumReturn | 924      |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012517658062279224
Validation loss = 0.011279175989329815
Validation loss = 0.011312048882246017
Validation loss = 0.011216887272894382
Validation loss = 0.011198054999113083
Validation loss = 0.01200877409428358
Validation loss = 0.011345450766384602
Validation loss = 0.011974276974797249
Validation loss = 0.01152881421148777
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012463245540857315
Validation loss = 0.011626936495304108
Validation loss = 0.01122550293803215
Validation loss = 0.01114576030522585
Validation loss = 0.011270947754383087
Validation loss = 0.011310736648738384
Validation loss = 0.010983796790242195
Validation loss = 0.012336090207099915
Validation loss = 0.0109947444871068
Validation loss = 0.010466651991009712
Validation loss = 0.010646280832588673
Validation loss = 0.010902655310928822
Validation loss = 0.010778495110571384
Validation loss = 0.010325280949473381
Validation loss = 0.010915519669651985
Validation loss = 0.011298911646008492
Validation loss = 0.011679663322865963
Validation loss = 0.010338671505451202
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012164493091404438
Validation loss = 0.010524898767471313
Validation loss = 0.011281081475317478
Validation loss = 0.010587213560938835
Validation loss = 0.011190113611519337
Validation loss = 0.010719708167016506
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01165852416306734
Validation loss = 0.01077855285257101
Validation loss = 0.010087816044688225
Validation loss = 0.011186590418219566
Validation loss = 0.010828506201505661
Validation loss = 0.010438559576869011
Validation loss = 0.0102941719815135
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012510198168456554
Validation loss = 0.01136796548962593
Validation loss = 0.010858128778636456
Validation loss = 0.010787679813802242
Validation loss = 0.01152318436652422
Validation loss = 0.010705741122364998
Validation loss = 0.011479469016194344
Validation loss = 0.010782253928482533
Validation loss = 0.011565285734832287
Validation loss = 0.01039259135723114
Validation loss = 0.011603154242038727
Validation loss = 0.01184019073843956
Validation loss = 0.011284229345619678
Validation loss = 0.011056539602577686
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 824      |
| Iteration     | 18       |
| MaximumReturn | 1.07e+03 |
| MinimumReturn | -269     |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014256213791668415
Validation loss = 0.013316035270690918
Validation loss = 0.01281377486884594
Validation loss = 0.013538992032408714
Validation loss = 0.013094792142510414
Validation loss = 0.012530960142612457
Validation loss = 0.012567033991217613
Validation loss = 0.012936389073729515
Validation loss = 0.012062793597579002
Validation loss = 0.013147914782166481
Validation loss = 0.012713029980659485
Validation loss = 0.013299785554409027
Validation loss = 0.012332871556282043
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012853488326072693
Validation loss = 0.012346806935966015
Validation loss = 0.01245852280408144
Validation loss = 0.011719482019543648
Validation loss = 0.013178154826164246
Validation loss = 0.012842638418078423
Validation loss = 0.011902767233550549
Validation loss = 0.01169392466545105
Validation loss = 0.012095397338271141
Validation loss = 0.012211731635034084
Validation loss = 0.012397052720189095
Validation loss = 0.012823089957237244
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014159751124680042
Validation loss = 0.011851081624627113
Validation loss = 0.012684525921940804
Validation loss = 0.012611964717507362
Validation loss = 0.012076422572135925
Validation loss = 0.012497511692345142
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01225301343947649
Validation loss = 0.012679475359618664
Validation loss = 0.012629285454750061
Validation loss = 0.011819189414381981
Validation loss = 0.012638220563530922
Validation loss = 0.011799737811088562
Validation loss = 0.012668070383369923
Validation loss = 0.012243839912116528
Validation loss = 0.012284663505852222
Validation loss = 0.012518894858658314
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012325877323746681
Validation loss = 0.012506434693932533
Validation loss = 0.01187884621322155
Validation loss = 0.012476545758545399
Validation loss = 0.012421705760061741
Validation loss = 0.012276857160031796
Validation loss = 0.013025425374507904
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.05e+03 |
| Iteration     | 19       |
| MaximumReturn | 1.1e+03  |
| MinimumReturn | 976      |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014657879248261452
Validation loss = 0.011566395871341228
Validation loss = 0.012233535759150982
Validation loss = 0.012185066938400269
Validation loss = 0.012430702336132526
Validation loss = 0.011432088911533356
Validation loss = 0.011945031583309174
Validation loss = 0.011531364172697067
Validation loss = 0.0125668840482831
Validation loss = 0.011694060638546944
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013269882649183273
Validation loss = 0.011341183446347713
Validation loss = 0.011973665095865726
Validation loss = 0.011460201814770699
Validation loss = 0.012064392678439617
Validation loss = 0.011492203921079636
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012781964614987373
Validation loss = 0.011685410514473915
Validation loss = 0.01169512327760458
Validation loss = 0.012151412665843964
Validation loss = 0.011696115136146545
Validation loss = 0.012152260169386864
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01246481854468584
Validation loss = 0.011981867253780365
Validation loss = 0.011506281793117523
Validation loss = 0.011454985477030277
Validation loss = 0.011209269054234028
Validation loss = 0.011611930094659328
Validation loss = 0.010792623274028301
Validation loss = 0.011108030565083027
Validation loss = 0.011042450554668903
Validation loss = 0.010855409316718578
Validation loss = 0.011003205552697182
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014181077480316162
Validation loss = 0.0113640446215868
Validation loss = 0.012106560170650482
Validation loss = 0.011898654513061047
Validation loss = 0.011960631236433983
Validation loss = 0.011624403297901154
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.06e+03 |
| Iteration     | 20       |
| MaximumReturn | 1.16e+03 |
| MinimumReturn | 946      |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012258998118340969
Validation loss = 0.011365747079253197
Validation loss = 0.011543933302164078
Validation loss = 0.012668376788496971
Validation loss = 0.011305125430226326
Validation loss = 0.011064761318266392
Validation loss = 0.011067946441471577
Validation loss = 0.01105902437120676
Validation loss = 0.01141832210123539
Validation loss = 0.012304341420531273
Validation loss = 0.011378004215657711
Validation loss = 0.01101340726017952
Validation loss = 0.011209382675588131
Validation loss = 0.01098775677382946
Validation loss = 0.011223747394979
Validation loss = 0.01136353425681591
Validation loss = 0.011663887649774551
Validation loss = 0.01110063400119543
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012706593610346317
Validation loss = 0.011713754385709763
Validation loss = 0.011426604352891445
Validation loss = 0.011551886796951294
Validation loss = 0.010827401652932167
Validation loss = 0.011814957484602928
Validation loss = 0.010802214033901691
Validation loss = 0.010866258293390274
Validation loss = 0.01127457246184349
Validation loss = 0.011462807655334473
Validation loss = 0.011943453922867775
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011719074100255966
Validation loss = 0.01212275866419077
Validation loss = 0.011597498320043087
Validation loss = 0.011185172945261002
Validation loss = 0.01184624433517456
Validation loss = 0.010939056053757668
Validation loss = 0.011645559221506119
Validation loss = 0.011417478322982788
Validation loss = 0.011846336536109447
Validation loss = 0.011759129352867603
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011822176165878773
Validation loss = 0.011273222044110298
Validation loss = 0.01085087563842535
Validation loss = 0.011577337048947811
Validation loss = 0.011221067979931831
Validation loss = 0.011286484077572823
Validation loss = 0.011055312119424343
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012351024895906448
Validation loss = 0.01175418496131897
Validation loss = 0.010855787433683872
Validation loss = 0.011238462291657925
Validation loss = 0.010867249220609665
Validation loss = 0.011640415526926517
Validation loss = 0.012008532881736755
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.04e+03 |
| Iteration     | 21       |
| MaximumReturn | 1.12e+03 |
| MinimumReturn | 914      |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011479928158223629
Validation loss = 0.011799613013863564
Validation loss = 0.010691803880035877
Validation loss = 0.010381079278886318
Validation loss = 0.010870748199522495
Validation loss = 0.011229963041841984
Validation loss = 0.010683032684028149
Validation loss = 0.010822070762515068
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01095342542976141
Validation loss = 0.010489222593605518
Validation loss = 0.010643287561833858
Validation loss = 0.010559314861893654
Validation loss = 0.010720617137849331
Validation loss = 0.011546595022082329
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012541644275188446
Validation loss = 0.010999474674463272
Validation loss = 0.010759654454886913
Validation loss = 0.010673721320927143
Validation loss = 0.012637532316148281
Validation loss = 0.0116593511775136
Validation loss = 0.01113243866711855
Validation loss = 0.010054541751742363
Validation loss = 0.010650995187461376
Validation loss = 0.010898149572312832
Validation loss = 0.010477510280907154
Validation loss = 0.011131702922284603
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010729670524597168
Validation loss = 0.011128869839012623
Validation loss = 0.010732047259807587
Validation loss = 0.01061988528817892
Validation loss = 0.010497307404875755
Validation loss = 0.01131670642644167
Validation loss = 0.010486619547009468
Validation loss = 0.010524621233344078
Validation loss = 0.0107609573751688
Validation loss = 0.011028967797756195
Validation loss = 0.010285438969731331
Validation loss = 0.010754602961242199
Validation loss = 0.010658572427928448
Validation loss = 0.010490112006664276
Validation loss = 0.010128364898264408
Validation loss = 0.010663682594895363
Validation loss = 0.010155100375413895
Validation loss = 0.010617934167385101
Validation loss = 0.009876349940896034
Validation loss = 0.009902863763272762
Validation loss = 0.010245388373732567
Validation loss = 0.009847779758274555
Validation loss = 0.010781570337712765
Validation loss = 0.010072007775306702
Validation loss = 0.009847010485827923
Validation loss = 0.010532715357840061
Validation loss = 0.00973571464419365
Validation loss = 0.009492240846157074
Validation loss = 0.01061977818608284
Validation loss = 0.009679783135652542
Validation loss = 0.00987507589161396
Validation loss = 0.010225765407085419
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01164747029542923
Validation loss = 0.011099444702267647
Validation loss = 0.011454337276518345
Validation loss = 0.011369284242391586
Validation loss = 0.010946368798613548
Validation loss = 0.011218760162591934
Validation loss = 0.010977759025990963
Validation loss = 0.011439178138971329
Validation loss = 0.010150241665542126
Validation loss = 0.010786310769617558
Validation loss = 0.011955817230045795
Validation loss = 0.010700425133109093
Validation loss = 0.011248314753174782
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 915      |
| Iteration     | 22       |
| MaximumReturn | 1.16e+03 |
| MinimumReturn | 392      |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012291081249713898
Validation loss = 0.011394406668841839
Validation loss = 0.011174957267940044
Validation loss = 0.011283577419817448
Validation loss = 0.011126709170639515
Validation loss = 0.011573371477425098
Validation loss = 0.011233486235141754
Validation loss = 0.011889186687767506
Validation loss = 0.010744664818048477
Validation loss = 0.011382673867046833
Validation loss = 0.01114016491919756
Validation loss = 0.011569485999643803
Validation loss = 0.011372119188308716
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012303936295211315
Validation loss = 0.011600539088249207
Validation loss = 0.011362667195498943
Validation loss = 0.01135089248418808
Validation loss = 0.01159772276878357
Validation loss = 0.011085285805165768
Validation loss = 0.011048992164433002
Validation loss = 0.011140729300677776
Validation loss = 0.010961598716676235
Validation loss = 0.01164692360907793
Validation loss = 0.011592860333621502
Validation loss = 0.01101293321698904
Validation loss = 0.011374074034392834
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011470038443803787
Validation loss = 0.01106011588126421
Validation loss = 0.011542685329914093
Validation loss = 0.01164967566728592
Validation loss = 0.011037672869861126
Validation loss = 0.01130758598446846
Validation loss = 0.011270415969192982
Validation loss = 0.011474731378257275
Validation loss = 0.011386287398636341
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011261851526796818
Validation loss = 0.011359593831002712
Validation loss = 0.010947614908218384
Validation loss = 0.010413224808871746
Validation loss = 0.010673336684703827
Validation loss = 0.01091520581394434
Validation loss = 0.010983292944729328
Validation loss = 0.01070259977132082
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011407990008592606
Validation loss = 0.011281929910182953
Validation loss = 0.011834842152893543
Validation loss = 0.01132412999868393
Validation loss = 0.01111640501767397
Validation loss = 0.010828104801476002
Validation loss = 0.011627511121332645
Validation loss = 0.011453758925199509
Validation loss = 0.011404874734580517
Validation loss = 0.010567188262939453
Validation loss = 0.011209799908101559
Validation loss = 0.011166051030158997
Validation loss = 0.0111984359100461
Validation loss = 0.010754470713436604
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.01e+03 |
| Iteration     | 23       |
| MaximumReturn | 1.09e+03 |
| MinimumReturn | 906      |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011267989873886108
Validation loss = 0.010551431216299534
Validation loss = 0.010866093449294567
Validation loss = 0.011432042345404625
Validation loss = 0.010849516838788986
Validation loss = 0.010736491531133652
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012005303055047989
Validation loss = 0.011162957176566124
Validation loss = 0.011267009191215038
Validation loss = 0.010923421010375023
Validation loss = 0.011128322221338749
Validation loss = 0.011268939822912216
Validation loss = 0.010996234603226185
Validation loss = 0.010679737664759159
Validation loss = 0.010644947178661823
Validation loss = 0.010991067625582218
Validation loss = 0.010440487414598465
Validation loss = 0.010608494281768799
Validation loss = 0.011168953962624073
Validation loss = 0.010343501344323158
Validation loss = 0.010973190888762474
Validation loss = 0.010833234526216984
Validation loss = 0.010865245014429092
Validation loss = 0.010272516869008541
Validation loss = 0.010786979459226131
Validation loss = 0.011505947448313236
Validation loss = 0.010747352614998817
Validation loss = 0.010643189772963524
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011803455650806427
Validation loss = 0.010956463403999805
Validation loss = 0.010745075531303883
Validation loss = 0.010866942815482616
Validation loss = 0.010758326388895512
Validation loss = 0.010799795389175415
Validation loss = 0.011275639757514
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011463930830359459
Validation loss = 0.010592877864837646
Validation loss = 0.010631224140524864
Validation loss = 0.010419095866382122
Validation loss = 0.0101392837241292
Validation loss = 0.010728410445153713
Validation loss = 0.009999376721680164
Validation loss = 0.01072443462908268
Validation loss = 0.01126219891011715
Validation loss = 0.010075273923575878
Validation loss = 0.009861381724476814
Validation loss = 0.010357948951423168
Validation loss = 0.01024033036082983
Validation loss = 0.010068234987556934
Validation loss = 0.010034502483904362
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011280430480837822
Validation loss = 0.011149103753268719
Validation loss = 0.011227103881537914
Validation loss = 0.01210301648825407
Validation loss = 0.010613871738314629
Validation loss = 0.0110119404271245
Validation loss = 0.010637578554451466
Validation loss = 0.010955968871712685
Validation loss = 0.010491009801626205
Validation loss = 0.010556217283010483
Validation loss = 0.011215627193450928
Validation loss = 0.010287982411682606
Validation loss = 0.010435383766889572
Validation loss = 0.010780801996588707
Validation loss = 0.010792933404445648
Validation loss = 0.010602062568068504
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.05e+03 |
| Iteration     | 24       |
| MaximumReturn | 1.18e+03 |
| MinimumReturn | 995      |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010497376322746277
Validation loss = 0.011173401027917862
Validation loss = 0.010439619421958923
Validation loss = 0.010733755305409431
Validation loss = 0.01112016849219799
Validation loss = 0.01044695544987917
Validation loss = 0.010793492197990417
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010925022885203362
Validation loss = 0.010258867405354977
Validation loss = 0.010150829330086708
Validation loss = 0.010096164420247078
Validation loss = 0.010063376277685165
Validation loss = 0.010501974262297153
Validation loss = 0.010288078337907791
Validation loss = 0.010251224040985107
Validation loss = 0.010357977822422981
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010874100960791111
Validation loss = 0.010449957102537155
Validation loss = 0.011004108935594559
Validation loss = 0.011013121344149113
Validation loss = 0.01062236912548542
Validation loss = 0.010247261263430119
Validation loss = 0.010763296857476234
Validation loss = 0.010479395277798176
Validation loss = 0.010399124585092068
Validation loss = 0.010717122815549374
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01037575677037239
Validation loss = 0.009667948819696903
Validation loss = 0.010716528631746769
Validation loss = 0.010082651861011982
Validation loss = 0.010702782310545444
Validation loss = 0.009991026483476162
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010628834366798401
Validation loss = 0.010408886708319187
Validation loss = 0.010751268826425076
Validation loss = 0.010162537917494774
Validation loss = 0.010533741675317287
Validation loss = 0.010082666762173176
Validation loss = 0.009811138734221458
Validation loss = 0.009944628924131393
Validation loss = 0.010252749547362328
Validation loss = 0.010221417993307114
Validation loss = 0.010315867140889168
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.08e+03 |
| Iteration     | 25       |
| MaximumReturn | 1.17e+03 |
| MinimumReturn | 938      |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010601164773106575
Validation loss = 0.011081243865191936
Validation loss = 0.010169911198318005
Validation loss = 0.011144101619720459
Validation loss = 0.010598473250865936
Validation loss = 0.010567690245807171
Validation loss = 0.010604028590023518
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01048290729522705
Validation loss = 0.010262428782880306
Validation loss = 0.010041527450084686
Validation loss = 0.010342084802687168
Validation loss = 0.010200445540249348
Validation loss = 0.010243123397231102
Validation loss = 0.009805064648389816
Validation loss = 0.009898560121655464
Validation loss = 0.009600869379937649
Validation loss = 0.009873735718429089
Validation loss = 0.010161979123950005
Validation loss = 0.010489067994058132
Validation loss = 0.009739197790622711
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01062395516782999
Validation loss = 0.010040801018476486
Validation loss = 0.010809345170855522
Validation loss = 0.010556619614362717
Validation loss = 0.01030160766094923
Validation loss = 0.010060683824121952
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01113987062126398
Validation loss = 0.010359909385442734
Validation loss = 0.009835041128098965
Validation loss = 0.010227126069366932
Validation loss = 0.010400424711406231
Validation loss = 0.010216440074145794
Validation loss = 0.00994646642357111
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010483410209417343
Validation loss = 0.01006725337356329
Validation loss = 0.009788000024855137
Validation loss = 0.010409022681415081
Validation loss = 0.00996478833258152
Validation loss = 0.009424066171050072
Validation loss = 0.009826671332120895
Validation loss = 0.009794224053621292
Validation loss = 0.01015139278024435
Validation loss = 0.009923321194946766
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.05e+03 |
| Iteration     | 26       |
| MaximumReturn | 1.12e+03 |
| MinimumReturn | 899      |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010896682739257812
Validation loss = 0.010383086279034615
Validation loss = 0.009903840720653534
Validation loss = 0.010095498524606228
Validation loss = 0.00983812753111124
Validation loss = 0.009881417267024517
Validation loss = 0.010013720951974392
Validation loss = 0.009832242503762245
Validation loss = 0.010115072131156921
Validation loss = 0.009997968561947346
Validation loss = 0.009638102725148201
Validation loss = 0.010165772400796413
Validation loss = 0.009644380770623684
Validation loss = 0.009778299368917942
Validation loss = 0.009789420291781425
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010433177463710308
Validation loss = 0.009999774396419525
Validation loss = 0.009619250893592834
Validation loss = 0.00939701497554779
Validation loss = 0.009985322132706642
Validation loss = 0.009434153325855732
Validation loss = 0.009364945814013481
Validation loss = 0.009205617941915989
Validation loss = 0.009885122068226337
Validation loss = 0.010059831663966179
Validation loss = 0.009765475988388062
Validation loss = 0.009237040765583515
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010261178016662598
Validation loss = 0.010357162915170193
Validation loss = 0.0096531817689538
Validation loss = 0.009938769973814487
Validation loss = 0.009876714088022709
Validation loss = 0.010061183013021946
Validation loss = 0.009604385122656822
Validation loss = 0.010066098533570766
Validation loss = 0.009931919164955616
Validation loss = 0.009892602451145649
Validation loss = 0.009940293617546558
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010251142084598541
Validation loss = 0.009401092305779457
Validation loss = 0.009965318255126476
Validation loss = 0.009390654973685741
Validation loss = 0.009606131352484226
Validation loss = 0.009381049312651157
Validation loss = 0.009601433761417866
Validation loss = 0.009494724683463573
Validation loss = 0.009850923903286457
Validation loss = 0.009571158327162266
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010310826823115349
Validation loss = 0.009796581231057644
Validation loss = 0.009624104015529156
Validation loss = 0.009440732188522816
Validation loss = 0.009339977987110615
Validation loss = 0.009624618105590343
Validation loss = 0.009628982283174992
Validation loss = 0.00980181060731411
Validation loss = 0.009799516759812832
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.02e+03 |
| Iteration     | 27       |
| MaximumReturn | 1.18e+03 |
| MinimumReturn | 736      |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01040184497833252
Validation loss = 0.009509432129561901
Validation loss = 0.009601726196706295
Validation loss = 0.01054926123470068
Validation loss = 0.009757565334439278
Validation loss = 0.009802400134503841
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009903199970722198
Validation loss = 0.009091133251786232
Validation loss = 0.009786568582057953
Validation loss = 0.009383264929056168
Validation loss = 0.009685983881354332
Validation loss = 0.010048319585621357
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010706049390137196
Validation loss = 0.009604469873011112
Validation loss = 0.009884452447295189
Validation loss = 0.009307033382356167
Validation loss = 0.009420233778655529
Validation loss = 0.009877240285277367
Validation loss = 0.00972237903624773
Validation loss = 0.009745479561388493
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009605063125491142
Validation loss = 0.009260203689336777
Validation loss = 0.008703475818037987
Validation loss = 0.009774417616426945
Validation loss = 0.009771446697413921
Validation loss = 0.008945964276790619
Validation loss = 0.009321452118456364
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010395676828920841
Validation loss = 0.009464239701628685
Validation loss = 0.00975735578685999
Validation loss = 0.00967742595821619
Validation loss = 0.008960595354437828
Validation loss = 0.009402232244610786
Validation loss = 0.009515662677586079
Validation loss = 0.009491541422903538
Validation loss = 0.009110532701015472
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.07e+03 |
| Iteration     | 28       |
| MaximumReturn | 1.14e+03 |
| MinimumReturn | 946      |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009534760378301144
Validation loss = 0.00949469767510891
Validation loss = 0.009086443111300468
Validation loss = 0.009126934222877026
Validation loss = 0.00948802474886179
Validation loss = 0.009511278010904789
Validation loss = 0.009198945015668869
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009872907772660255
Validation loss = 0.010237523354589939
Validation loss = 0.00928717665374279
Validation loss = 0.009453684091567993
Validation loss = 0.00949013326317072
Validation loss = 0.009227843955159187
Validation loss = 0.008982974104583263
Validation loss = 0.009253988973796368
Validation loss = 0.009182190522551537
Validation loss = 0.008889892138540745
Validation loss = 0.008918656036257744
Validation loss = 0.00914288405328989
Validation loss = 0.008547252975404263
Validation loss = 0.009150651283562183
Validation loss = 0.009111372753977776
Validation loss = 0.008728752844035625
Validation loss = 0.008809028193354607
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00997433252632618
Validation loss = 0.009597955271601677
Validation loss = 0.009687650948762894
Validation loss = 0.008963470347225666
Validation loss = 0.009220559149980545
Validation loss = 0.00865399744361639
Validation loss = 0.009198964573442936
Validation loss = 0.009396789595484734
Validation loss = 0.00901081319898367
Validation loss = 0.009413710795342922
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009841104038059711
Validation loss = 0.008799226954579353
Validation loss = 0.009180586785078049
Validation loss = 0.009537250734865665
Validation loss = 0.009329364635050297
Validation loss = 0.00893851462751627
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010025018826127052
Validation loss = 0.009406933560967445
Validation loss = 0.008941116742789745
Validation loss = 0.00909226294606924
Validation loss = 0.00907118059694767
Validation loss = 0.009354783222079277
Validation loss = 0.009861985221505165
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.03e+03 |
| Iteration     | 29       |
| MaximumReturn | 1.1e+03  |
| MinimumReturn | 909      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009089541621506214
Validation loss = 0.009316538460552692
Validation loss = 0.009542381390929222
Validation loss = 0.009551250375807285
Validation loss = 0.009540120139718056
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008951147086918354
Validation loss = 0.00889377947896719
Validation loss = 0.008650656789541245
Validation loss = 0.008976798504590988
Validation loss = 0.008936041966080666
Validation loss = 0.008861510083079338
Validation loss = 0.009119587019085884
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009555995464324951
Validation loss = 0.00894096028059721
Validation loss = 0.00933225080370903
Validation loss = 0.008661722764372826
Validation loss = 0.009118860587477684
Validation loss = 0.009313983842730522
Validation loss = 0.00904096383601427
Validation loss = 0.00891261175274849
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009216596372425556
Validation loss = 0.009455638937652111
Validation loss = 0.009225155226886272
Validation loss = 0.009122501127421856
Validation loss = 0.009613457135856152
Validation loss = 0.009446598589420319
Validation loss = 0.008619047701358795
Validation loss = 0.008702089078724384
Validation loss = 0.009003960527479649
Validation loss = 0.00869828462600708
Validation loss = 0.008767927065491676
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009304869920015335
Validation loss = 0.009043683297932148
Validation loss = 0.009050410240888596
Validation loss = 0.009646512567996979
Validation loss = 0.009176363237202168
Validation loss = 0.008912893012166023
Validation loss = 0.008996143005788326
Validation loss = 0.008792883716523647
Validation loss = 0.008910576812922955
Validation loss = 0.008788546547293663
Validation loss = 0.00843802373856306
Validation loss = 0.008748097345232964
Validation loss = 0.010250521823763847
Validation loss = 0.008546982891857624
Validation loss = 0.008486401289701462
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 836      |
| Iteration     | 30       |
| MaximumReturn | 1.15e+03 |
| MinimumReturn | -274     |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009762421250343323
Validation loss = 0.01032241154462099
Validation loss = 0.010102685540914536
Validation loss = 0.009644925594329834
Validation loss = 0.01139291562139988
Validation loss = 0.009881502017378807
Validation loss = 0.010695874691009521
Validation loss = 0.010024973191320896
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01029093749821186
Validation loss = 0.010045532137155533
Validation loss = 0.010150463320314884
Validation loss = 0.009731698781251907
Validation loss = 0.009770037606358528
Validation loss = 0.010103005915880203
Validation loss = 0.009964896366000175
Validation loss = 0.009053150191903114
Validation loss = 0.009141920134425163
Validation loss = 0.009880798868834972
Validation loss = 0.009772158227860928
Validation loss = 0.009656705893576145
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009990421123802662
Validation loss = 0.010195057839155197
Validation loss = 0.009977457113564014
Validation loss = 0.009790701791644096
Validation loss = 0.010145347565412521
Validation loss = 0.010014730505645275
Validation loss = 0.010573290288448334
Validation loss = 0.010021284222602844
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009994067251682281
Validation loss = 0.009457881562411785
Validation loss = 0.010278644040226936
Validation loss = 0.009540914557874203
Validation loss = 0.009604713879525661
Validation loss = 0.009678471833467484
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010612692683935165
Validation loss = 0.009694885462522507
Validation loss = 0.009342427365481853
Validation loss = 0.00957128033041954
Validation loss = 0.009435327723622322
Validation loss = 0.009869697503745556
Validation loss = 0.009882737882435322
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 715      |
| Iteration     | 31       |
| MaximumReturn | 1.22e+03 |
| MinimumReturn | -333     |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0114109106361866
Validation loss = 0.010417168959975243
Validation loss = 0.010301497764885426
Validation loss = 0.010293815284967422
Validation loss = 0.010433461517095566
Validation loss = 0.010527360253036022
Validation loss = 0.010966896079480648
Validation loss = 0.010504945181310177
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01088214572519064
Validation loss = 0.010425346903502941
Validation loss = 0.010036066174507141
Validation loss = 0.009683314710855484
Validation loss = 0.009767417795956135
Validation loss = 0.010090967640280724
Validation loss = 0.010142737999558449
Validation loss = 0.009877962060272694
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010785074904561043
Validation loss = 0.009623206220567226
Validation loss = 0.010608549229800701
Validation loss = 0.01001013908535242
Validation loss = 0.009856854565441608
Validation loss = 0.01015452016144991
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010896192863583565
Validation loss = 0.009551472030580044
Validation loss = 0.010594064369797707
Validation loss = 0.009799951687455177
Validation loss = 0.009351587854325771
Validation loss = 0.009707740508019924
Validation loss = 0.009832857176661491
Validation loss = 0.009840468876063824
Validation loss = 0.00966230221092701
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01138902734965086
Validation loss = 0.009824740700423717
Validation loss = 0.0100938118994236
Validation loss = 0.009988361969590187
Validation loss = 0.01032954826951027
Validation loss = 0.010415531694889069
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.09e+03 |
| Iteration     | 32       |
| MaximumReturn | 1.21e+03 |
| MinimumReturn | 722      |
| TotalSamples  | 136000   |
----------------------------
