Logging to experiments/gym_cheetahA01/gym_cheetahA01/Fri-28-Oct-2022-03-06-00-PM-CDT_gym_cheetahA01_trpo_iteration_20_seed2314
Print configuration .....
{'env_name': 'gym_cheetahA01', 'random_seeds': [4321, 2314, 2341, 3421], 'save_variables': False, 'model_save_dir': '/tmp/gym_cheetahA01_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6803014278411865
Validation loss = 0.1468048393726349
Validation loss = 0.0998963862657547
Validation loss = 0.08236100524663925
Validation loss = 0.07169710099697113
Validation loss = 0.06632834672927856
Validation loss = 0.0638047382235527
Validation loss = 0.062159061431884766
Validation loss = 0.06094106286764145
Validation loss = 0.06169365718960762
Validation loss = 0.056460924446582794
Validation loss = 0.05934552848339081
Validation loss = 0.05876247584819794
Validation loss = 0.06899291276931763
Validation loss = 0.06036358326673508
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.34257206320762634
Validation loss = 0.1479283720254898
Validation loss = 0.10118471086025238
Validation loss = 0.08258499205112457
Validation loss = 0.07261022180318832
Validation loss = 0.06837430596351624
Validation loss = 0.06697365641593933
Validation loss = 0.06372742354869843
Validation loss = 0.06530158966779709
Validation loss = 0.0660916194319725
Validation loss = 0.05688650533556938
Validation loss = 0.0572543740272522
Validation loss = 0.0581815168261528
Validation loss = 0.05884689465165138
Validation loss = 0.05772829055786133
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6774889230728149
Validation loss = 0.13789281249046326
Validation loss = 0.09196928143501282
Validation loss = 0.07931223511695862
Validation loss = 0.07083560526371002
Validation loss = 0.06816654652357101
Validation loss = 0.06672805547714233
Validation loss = 0.06666939705610275
Validation loss = 0.06108459457755089
Validation loss = 0.06529025733470917
Validation loss = 0.06086695194244385
Validation loss = 0.059183333069086075
Validation loss = 0.07762135565280914
Validation loss = 0.0577399879693985
Validation loss = 0.056588053703308105
Validation loss = 0.057424940168857574
Validation loss = 0.05539911612868309
Validation loss = 0.05975330248475075
Validation loss = 0.05506438761949539
Validation loss = 0.05456686019897461
Validation loss = 0.0553259402513504
Validation loss = 0.054651856422424316
Validation loss = 0.06568725407123566
Validation loss = 0.05528903752565384
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.38064733147621155
Validation loss = 0.15929687023162842
Validation loss = 0.10586322844028473
Validation loss = 0.084490105509758
Validation loss = 0.0756816565990448
Validation loss = 0.07016024738550186
Validation loss = 0.06532078981399536
Validation loss = 0.06377261877059937
Validation loss = 0.06658582389354706
Validation loss = 0.07722288370132446
Validation loss = 0.057886697351932526
Validation loss = 0.05891519784927368
Validation loss = 0.06306355446577072
Validation loss = 0.05507410317659378
Validation loss = 0.05636744573712349
Validation loss = 0.055234842002391815
Validation loss = 0.05307082086801529
Validation loss = 0.054657742381095886
Validation loss = 0.05365645885467529
Validation loss = 0.05912298336625099
Validation loss = 0.05484067648649216
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4366210401058197
Validation loss = 0.14219211041927338
Validation loss = 0.10123244673013687
Validation loss = 0.08236691355705261
Validation loss = 0.07346107065677643
Validation loss = 0.06767860054969788
Validation loss = 0.07405373454093933
Validation loss = 0.06331171095371246
Validation loss = 0.060255542397499084
Validation loss = 0.06458698213100433
Validation loss = 0.05824777111411095
Validation loss = 0.05753466486930847
Validation loss = 0.05629141628742218
Validation loss = 0.05682743340730667
Validation loss = 0.05611713230609894
Validation loss = 0.06143246218562126
Validation loss = 0.06044148653745651
Validation loss = 0.056751951575279236
Validation loss = 0.06806736439466476
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -307     |
| Iteration     | 0        |
| MaximumReturn | -270     |
| MinimumReturn | -332     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09515251964330673
Validation loss = 0.06795164942741394
Validation loss = 0.05998827517032623
Validation loss = 0.05734071135520935
Validation loss = 0.05100239813327789
Validation loss = 0.05051750689744949
Validation loss = 0.051134999841451645
Validation loss = 0.053817931562662125
Validation loss = 0.05064849555492401
Validation loss = 0.045961689203977585
Validation loss = 0.052857931703329086
Validation loss = 0.04601198434829712
Validation loss = 0.04895384982228279
Validation loss = 0.04423265904188156
Validation loss = 0.05116467550396919
Validation loss = 0.04423350840806961
Validation loss = 0.04406064376235008
Validation loss = 0.047505933791399
Validation loss = 0.050877414643764496
Validation loss = 0.04433700814843178
Validation loss = 0.042352937161922455
Validation loss = 0.041476599872112274
Validation loss = 0.04353226721286774
Validation loss = 0.04869478940963745
Validation loss = 0.04541812092065811
Validation loss = 0.04315562546253204
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10444527119398117
Validation loss = 0.06535918265581131
Validation loss = 0.05983884260058403
Validation loss = 0.05953091382980347
Validation loss = 0.05503043159842491
Validation loss = 0.059962719678878784
Validation loss = 0.051574576646089554
Validation loss = 0.05551940202713013
Validation loss = 0.048075344413518906
Validation loss = 0.047827452421188354
Validation loss = 0.046202197670936584
Validation loss = 0.046953268349170685
Validation loss = 0.04481040686368942
Validation loss = 0.05242234468460083
Validation loss = 0.04652415215969086
Validation loss = 0.059212084859609604
Validation loss = 0.04878950119018555
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10031282901763916
Validation loss = 0.06469756364822388
Validation loss = 0.05822782218456268
Validation loss = 0.05398138239979744
Validation loss = 0.0614185705780983
Validation loss = 0.048706166446208954
Validation loss = 0.0480467863380909
Validation loss = 0.04836900159716606
Validation loss = 0.04854496195912361
Validation loss = 0.04768849164247513
Validation loss = 0.04789486154913902
Validation loss = 0.05246423929929733
Validation loss = 0.04616779088973999
Validation loss = 0.0485379733145237
Validation loss = 0.053158439695835114
Validation loss = 0.04425516724586487
Validation loss = 0.0446925051510334
Validation loss = 0.04355177283287048
Validation loss = 0.045847088098526
Validation loss = 0.0432303249835968
Validation loss = 0.044777024537324905
Validation loss = 0.04350832849740982
Validation loss = 0.046305786818265915
Validation loss = 0.04898574948310852
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1046234667301178
Validation loss = 0.06287530809640884
Validation loss = 0.060927532613277435
Validation loss = 0.06236425042152405
Validation loss = 0.05174273997545242
Validation loss = 0.050406694412231445
Validation loss = 0.04861770570278168
Validation loss = 0.048919860273599625
Validation loss = 0.049584273248910904
Validation loss = 0.04827476665377617
Validation loss = 0.04887605831027031
Validation loss = 0.046957291662693024
Validation loss = 0.04544340446591377
Validation loss = 0.051162295043468475
Validation loss = 0.046417415142059326
Validation loss = 0.049421295523643494
Validation loss = 0.04704543203115463
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10303079336881638
Validation loss = 0.06518949568271637
Validation loss = 0.060558564960956573
Validation loss = 0.05674433708190918
Validation loss = 0.05585165694355965
Validation loss = 0.05263740196824074
Validation loss = 0.05099041014909744
Validation loss = 0.04891718924045563
Validation loss = 0.052989572286605835
Validation loss = 0.04861640930175781
Validation loss = 0.04798116534948349
Validation loss = 0.04870855063199997
Validation loss = 0.04751509428024292
Validation loss = 0.04932650178670883
Validation loss = 0.051604967564344406
Validation loss = 0.04541538655757904
Validation loss = 0.04726968705654144
Validation loss = 0.046575747430324554
Validation loss = 0.0443992018699646
Validation loss = 0.04958341270685196
Validation loss = 0.045598194003105164
Validation loss = 0.04881536588072777
Validation loss = 0.045186661183834076
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -332     |
| Iteration     | 1        |
| MaximumReturn | -281     |
| MinimumReturn | -385     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11703219264745712
Validation loss = 0.06301359832286835
Validation loss = 0.058852195739746094
Validation loss = 0.060667190700769424
Validation loss = 0.05566208437085152
Validation loss = 0.054945457726716995
Validation loss = 0.05710892379283905
Validation loss = 0.057525768876075745
Validation loss = 0.05696980655193329
Validation loss = 0.05375520512461662
Validation loss = 0.05352654680609703
Validation loss = 0.05417915806174278
Validation loss = 0.058823540806770325
Validation loss = 0.053947705775499344
Validation loss = 0.05157456174492836
Validation loss = 0.053937215358018875
Validation loss = 0.052711743861436844
Validation loss = 0.050104886293411255
Validation loss = 0.05509697273373604
Validation loss = 0.05242536589503288
Validation loss = 0.0511498749256134
Validation loss = 0.054512638598680496
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09852162003517151
Validation loss = 0.061016906052827835
Validation loss = 0.06296209245920181
Validation loss = 0.061123982071876526
Validation loss = 0.056384626775979996
Validation loss = 0.05697370693087578
Validation loss = 0.05580149590969086
Validation loss = 0.056064825505018234
Validation loss = 0.055882036685943604
Validation loss = 0.05665254592895508
Validation loss = 0.056275784969329834
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10686080902814865
Validation loss = 0.059526488184928894
Validation loss = 0.05919361114501953
Validation loss = 0.05748291686177254
Validation loss = 0.05736151710152626
Validation loss = 0.057013850659132004
Validation loss = 0.05359223484992981
Validation loss = 0.05446888133883476
Validation loss = 0.06012631952762604
Validation loss = 0.05266823247075081
Validation loss = 0.052743613719940186
Validation loss = 0.051830947399139404
Validation loss = 0.05210359767079353
Validation loss = 0.053458232432603836
Validation loss = 0.051262784749269485
Validation loss = 0.056905347853899
Validation loss = 0.051376789808273315
Validation loss = 0.052962642163038254
Validation loss = 0.05410609766840935
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09571152925491333
Validation loss = 0.06135650351643562
Validation loss = 0.05850709602236748
Validation loss = 0.05766696110367775
Validation loss = 0.05832682177424431
Validation loss = 0.05733998492360115
Validation loss = 0.05681271851062775
Validation loss = 0.05870167538523674
Validation loss = 0.056097012013196945
Validation loss = 0.0563894547522068
Validation loss = 0.056703079491853714
Validation loss = 0.05557822063565254
Validation loss = 0.057710420340299606
Validation loss = 0.05541510879993439
Validation loss = 0.06001180782914162
Validation loss = 0.060558248311281204
Validation loss = 0.05334897339344025
Validation loss = 0.05425117537379265
Validation loss = 0.05754917860031128
Validation loss = 0.0535145103931427
Validation loss = 0.054944802075624466
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10178767889738083
Validation loss = 0.06089780852198601
Validation loss = 0.05699142441153526
Validation loss = 0.056727334856987
Validation loss = 0.057110387831926346
Validation loss = 0.05526856705546379
Validation loss = 0.05482715368270874
Validation loss = 0.05471178516745567
Validation loss = 0.056261032819747925
Validation loss = 0.05315105617046356
Validation loss = 0.05500779673457146
Validation loss = 0.05867670476436615
Validation loss = 0.052307020872831345
Validation loss = 0.054341357201337814
Validation loss = 0.052621643990278244
Validation loss = 0.0538521371781826
Validation loss = 0.05324052646756172
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 243      |
| Iteration     | 2        |
| MaximumReturn | 352      |
| MinimumReturn | 171      |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.055102959275245667
Validation loss = 0.0463898703455925
Validation loss = 0.045491281896829605
Validation loss = 0.046958036720752716
Validation loss = 0.045945145189762115
Validation loss = 0.04622325301170349
Validation loss = 0.04331177473068237
Validation loss = 0.04439275339245796
Validation loss = 0.04550546407699585
Validation loss = 0.04491083323955536
Validation loss = 0.04198870807886124
Validation loss = 0.055942364037036896
Validation loss = 0.04388958588242531
Validation loss = 0.04273403435945511
Validation loss = 0.04417094215750694
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0559479221701622
Validation loss = 0.049365557730197906
Validation loss = 0.05133339762687683
Validation loss = 0.04762599989771843
Validation loss = 0.051914989948272705
Validation loss = 0.049183763563632965
Validation loss = 0.04901265725493431
Validation loss = 0.04730290174484253
Validation loss = 0.04492408037185669
Validation loss = 0.04448157548904419
Validation loss = 0.04515829682350159
Validation loss = 0.04509358108043671
Validation loss = 0.046382106840610504
Validation loss = 0.04480225592851639
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05671905726194382
Validation loss = 0.045544955879449844
Validation loss = 0.04680013656616211
Validation loss = 0.04490462318062782
Validation loss = 0.04363969713449478
Validation loss = 0.0455753467977047
Validation loss = 0.044086240231990814
Validation loss = 0.045991867780685425
Validation loss = 0.04352884367108345
Validation loss = 0.043859194964170456
Validation loss = 0.04405702278017998
Validation loss = 0.04415290430188179
Validation loss = 0.04379144683480263
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05586818605661392
Validation loss = 0.05070430785417557
Validation loss = 0.045533161610364914
Validation loss = 0.046532534062862396
Validation loss = 0.04777603596448898
Validation loss = 0.04491746798157692
Validation loss = 0.04636247456073761
Validation loss = 0.04568888992071152
Validation loss = 0.045505404472351074
Validation loss = 0.06480458378791809
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06211800128221512
Validation loss = 0.047774672508239746
Validation loss = 0.0481397807598114
Validation loss = 0.04611308127641678
Validation loss = 0.047426849603652954
Validation loss = 0.04737904667854309
Validation loss = 0.050434209406375885
Validation loss = 0.045792482793331146
Validation loss = 0.0440315306186676
Validation loss = 0.04360804706811905
Validation loss = 0.045656345784664154
Validation loss = 0.04707438871264458
Validation loss = 0.045460231602191925
Validation loss = 0.044562239199876785
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 41.8     |
| Iteration     | 3        |
| MaximumReturn | 513      |
| MinimumReturn | -373     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09574445337057114
Validation loss = 0.054153114557266235
Validation loss = 0.047239236533641815
Validation loss = 0.048056669533252716
Validation loss = 0.0476878397166729
Validation loss = 0.04390571266412735
Validation loss = 0.04362819716334343
Validation loss = 0.04369501769542694
Validation loss = 0.04172199219465256
Validation loss = 0.04380752518773079
Validation loss = 0.043553732335567474
Validation loss = 0.043187301605939865
Validation loss = 0.04211437702178955
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10265179723501205
Validation loss = 0.05695632845163345
Validation loss = 0.053999047726392746
Validation loss = 0.052854038774967194
Validation loss = 0.0476263128221035
Validation loss = 0.0475868359208107
Validation loss = 0.04794054478406906
Validation loss = 0.0453566238284111
Validation loss = 0.04636401683092117
Validation loss = 0.04520728439092636
Validation loss = 0.046163491904735565
Validation loss = 0.04633490741252899
Validation loss = 0.04890967532992363
Validation loss = 0.04467438906431198
Validation loss = 0.045330487191677094
Validation loss = 0.05159733444452286
Validation loss = 0.04562406241893768
Validation loss = 0.045501336455345154
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1133914440870285
Validation loss = 0.05336235091090202
Validation loss = 0.04915861412882805
Validation loss = 0.04773253947496414
Validation loss = 0.04880877211689949
Validation loss = 0.04532523453235626
Validation loss = 0.045479916036129
Validation loss = 0.04542303830385208
Validation loss = 0.04705509915947914
Validation loss = 0.044366031885147095
Validation loss = 0.04507873207330704
Validation loss = 0.04666539654135704
Validation loss = 0.04542458802461624
Validation loss = 0.044020142406225204
Validation loss = 0.04583483189344406
Validation loss = 0.04320920258760452
Validation loss = 0.04273112118244171
Validation loss = 0.043109994381666183
Validation loss = 0.04280934855341911
Validation loss = 0.04323801025748253
Validation loss = 0.043301742523908615
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0988006666302681
Validation loss = 0.05522334575653076
Validation loss = 0.05104900524020195
Validation loss = 0.04715530201792717
Validation loss = 0.04771041497588158
Validation loss = 0.04594355821609497
Validation loss = 0.04545194283127785
Validation loss = 0.04439511522650719
Validation loss = 0.04662670940160751
Validation loss = 0.04532836377620697
Validation loss = 0.044456303119659424
Validation loss = 0.04443467780947685
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.096649169921875
Validation loss = 0.05041065812110901
Validation loss = 0.04816282168030739
Validation loss = 0.04540221393108368
Validation loss = 0.046207133680582047
Validation loss = 0.045007314532995224
Validation loss = 0.04530809074640274
Validation loss = 0.04589629918336868
Validation loss = 0.0441095344722271
Validation loss = 0.04325404018163681
Validation loss = 0.043537743389606476
Validation loss = 0.04377719759941101
Validation loss = 0.04233323037624359
Validation loss = 0.04384685680270195
Validation loss = 0.042393118143081665
Validation loss = 0.04486267641186714
Validation loss = 0.04319554567337036
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -319     |
| Iteration     | 4        |
| MaximumReturn | -158     |
| MinimumReturn | -430     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04912113770842552
Validation loss = 0.043901097029447556
Validation loss = 0.041285570710897446
Validation loss = 0.042351771146059036
Validation loss = 0.04078326001763344
Validation loss = 0.04052925482392311
Validation loss = 0.045363616198301315
Validation loss = 0.04195811226963997
Validation loss = 0.040428537875413895
Validation loss = 0.04114702716469765
Validation loss = 0.038939218968153
Validation loss = 0.042909085750579834
Validation loss = 0.04201645031571388
Validation loss = 0.03925918787717819
Validation loss = 0.040577877312898636
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05189173296093941
Validation loss = 0.044001832604408264
Validation loss = 0.044419195502996445
Validation loss = 0.045749034732580185
Validation loss = 0.0428348183631897
Validation loss = 0.043045248836278915
Validation loss = 0.043702781200408936
Validation loss = 0.044246334582567215
Validation loss = 0.044876810163259506
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05480160191655159
Validation loss = 0.04360276833176613
Validation loss = 0.04523135721683502
Validation loss = 0.042968448251485825
Validation loss = 0.046576518565416336
Validation loss = 0.042861711233854294
Validation loss = 0.0405423641204834
Validation loss = 0.04078085348010063
Validation loss = 0.041949689388275146
Validation loss = 0.041255149990320206
Validation loss = 0.04043669253587723
Validation loss = 0.04181893169879913
Validation loss = 0.040566325187683105
Validation loss = 0.04205634817481041
Validation loss = 0.04025890678167343
Validation loss = 0.03899410739541054
Validation loss = 0.04013874754309654
Validation loss = 0.039317794144153595
Validation loss = 0.04127020016312599
Validation loss = 0.03946571424603462
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05251730605959892
Validation loss = 0.043796226382255554
Validation loss = 0.04509492218494415
Validation loss = 0.043193161487579346
Validation loss = 0.04336954280734062
Validation loss = 0.04474973678588867
Validation loss = 0.042066946625709534
Validation loss = 0.04128917306661606
Validation loss = 0.04579591378569603
Validation loss = 0.04327816143631935
Validation loss = 0.04242761805653572
Validation loss = 0.04152363911271095
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05470481514930725
Validation loss = 0.04274798557162285
Validation loss = 0.04200037196278572
Validation loss = 0.045434992760419846
Validation loss = 0.04108468070626259
Validation loss = 0.04003917798399925
Validation loss = 0.04116368293762207
Validation loss = 0.039967477321624756
Validation loss = 0.04098402336239815
Validation loss = 0.04084436222910881
Validation loss = 0.0388481579720974
Validation loss = 0.04333971068263054
Validation loss = 0.03987005725502968
Validation loss = 0.04144226387143135
Validation loss = 0.03999973088502884
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 45.9     |
| Iteration     | 5        |
| MaximumReturn | 364      |
| MinimumReturn | -242     |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0473564974963665
Validation loss = 0.038893405348062515
Validation loss = 0.03923913463950157
Validation loss = 0.04001009836792946
Validation loss = 0.039887893944978714
Validation loss = 0.039540890604257584
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04863152652978897
Validation loss = 0.0425625741481781
Validation loss = 0.04035554081201553
Validation loss = 0.03930595889687538
Validation loss = 0.040206704288721085
Validation loss = 0.03952465206384659
Validation loss = 0.0404895544052124
Validation loss = 0.04041758179664612
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05313103646039963
Validation loss = 0.039437975734472275
Validation loss = 0.03855949640274048
Validation loss = 0.03779562562704086
Validation loss = 0.03931302949786186
Validation loss = 0.03814956918358803
Validation loss = 0.03875543922185898
Validation loss = 0.03832654282450676
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05164561793208122
Validation loss = 0.04068910703063011
Validation loss = 0.04106602072715759
Validation loss = 0.04006441310048103
Validation loss = 0.04047065228223801
Validation loss = 0.03884565457701683
Validation loss = 0.04019933193922043
Validation loss = 0.04210760444402695
Validation loss = 0.04309067875146866
Validation loss = 0.040898289531469345
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.049361251294612885
Validation loss = 0.040318526327610016
Validation loss = 0.038490135222673416
Validation loss = 0.03998679295182228
Validation loss = 0.03859464079141617
Validation loss = 0.03788871690630913
Validation loss = 0.038274139165878296
Validation loss = 0.038100481033325195
Validation loss = 0.039067722856998444
Validation loss = 0.03854595869779587
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 389      |
| Iteration     | 6        |
| MaximumReturn | 1.04e+03 |
| MinimumReturn | -256     |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03943629562854767
Validation loss = 0.03732167184352875
Validation loss = 0.03533092886209488
Validation loss = 0.036554254591464996
Validation loss = 0.036913640797138214
Validation loss = 0.03558117896318436
Validation loss = 0.0356810986995697
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04408695548772812
Validation loss = 0.038061901926994324
Validation loss = 0.04042038321495056
Validation loss = 0.036344364285469055
Validation loss = 0.03721597418189049
Validation loss = 0.03538672626018524
Validation loss = 0.037954844534397125
Validation loss = 0.0390496551990509
Validation loss = 0.03630172088742256
Validation loss = 0.03634987771511078
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04190494865179062
Validation loss = 0.03618839755654335
Validation loss = 0.03698236867785454
Validation loss = 0.035195983946323395
Validation loss = 0.03570827096700668
Validation loss = 0.03595343977212906
Validation loss = 0.035309940576553345
Validation loss = 0.03461601585149765
Validation loss = 0.03463803231716156
Validation loss = 0.03728789836168289
Validation loss = 0.036300987005233765
Validation loss = 0.03484802693128586
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04057697206735611
Validation loss = 0.03876655176281929
Validation loss = 0.036847781389951706
Validation loss = 0.03676822781562805
Validation loss = 0.03583653271198273
Validation loss = 0.036010321229696274
Validation loss = 0.035986416041851044
Validation loss = 0.03745211660861969
Validation loss = 0.036181576550006866
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.044664785265922546
Validation loss = 0.037618912756443024
Validation loss = 0.03615495562553406
Validation loss = 0.034628041088581085
Validation loss = 0.03669186681509018
Validation loss = 0.03807000443339348
Validation loss = 0.035833291709423065
Validation loss = 0.03435572236776352
Validation loss = 0.03552016615867615
Validation loss = 0.04100609943270683
Validation loss = 0.035110294818878174
Validation loss = 0.03324296325445175
Validation loss = 0.03606195002794266
Validation loss = 0.03625829517841339
Validation loss = 0.03414071723818779
Validation loss = 0.033593371510505676
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 385      |
| Iteration     | 7        |
| MaximumReturn | 1.48e+03 |
| MinimumReturn | -338     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03902728855609894
Validation loss = 0.03578657656908035
Validation loss = 0.03335767239332199
Validation loss = 0.03394757956266403
Validation loss = 0.03295998275279999
Validation loss = 0.032781027257442474
Validation loss = 0.03307407349348068
Validation loss = 0.03303861245512962
Validation loss = 0.033820170909166336
Validation loss = 0.034992530941963196
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04151993244886398
Validation loss = 0.03543290123343468
Validation loss = 0.036810651421546936
Validation loss = 0.03507418558001518
Validation loss = 0.0349414236843586
Validation loss = 0.036534056067466736
Validation loss = 0.03351139277219772
Validation loss = 0.03480261191725731
Validation loss = 0.03357141092419624
Validation loss = 0.03521198406815529
Validation loss = 0.03556935116648674
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.042636170983314514
Validation loss = 0.035163722932338715
Validation loss = 0.034683290868997574
Validation loss = 0.03580671548843384
Validation loss = 0.033185943961143494
Validation loss = 0.03322945535182953
Validation loss = 0.033550795167684555
Validation loss = 0.03313831984996796
Validation loss = 0.0330175906419754
Validation loss = 0.032834604382514954
Validation loss = 0.03441980108618736
Validation loss = 0.03273402154445648
Validation loss = 0.0342521034181118
Validation loss = 0.03310354799032211
Validation loss = 0.03252682462334633
Validation loss = 0.0330764502286911
Validation loss = 0.03229295462369919
Validation loss = 0.03364275023341179
Validation loss = 0.03234252333641052
Validation loss = 0.03196980059146881
Validation loss = 0.03310392424464226
Validation loss = 0.03259846568107605
Validation loss = 0.031245453283190727
Validation loss = 0.03229164332151413
Validation loss = 0.03293636068701744
Validation loss = 0.03170256316661835
Validation loss = 0.03249526768922806
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04390356317162514
Validation loss = 0.034277740865945816
Validation loss = 0.036621127277612686
Validation loss = 0.035337287932634354
Validation loss = 0.03595595061779022
Validation loss = 0.03460704907774925
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04083390161395073
Validation loss = 0.033924516290426254
Validation loss = 0.033833425492048264
Validation loss = 0.03305387869477272
Validation loss = 0.03724822774529457
Validation loss = 0.032616302371025085
Validation loss = 0.03240630775690079
Validation loss = 0.03217817842960358
Validation loss = 0.032946694642305374
Validation loss = 0.032187167555093765
Validation loss = 0.032507289201021194
Validation loss = 0.031285420060157776
Validation loss = 0.032394733279943466
Validation loss = 0.03089965134859085
Validation loss = 0.030952222645282745
Validation loss = 0.03212112933397293
Validation loss = 0.031931065022945404
Validation loss = 0.031900111585855484
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.15e+03 |
| Iteration     | 8        |
| MaximumReturn | 1.59e+03 |
| MinimumReturn | -1.93    |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03697943314909935
Validation loss = 0.03224291279911995
Validation loss = 0.030223000794649124
Validation loss = 0.030236229300498962
Validation loss = 0.02926192246377468
Validation loss = 0.029825910925865173
Validation loss = 0.02959347888827324
Validation loss = 0.031138021498918533
Validation loss = 0.03147037327289581
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03573913127183914
Validation loss = 0.0298757404088974
Validation loss = 0.03061440959572792
Validation loss = 0.0325513631105423
Validation loss = 0.03032802604138851
Validation loss = 0.03112410567700863
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03371121734380722
Validation loss = 0.029012080281972885
Validation loss = 0.02801007404923439
Validation loss = 0.029421186074614525
Validation loss = 0.030001169070601463
Validation loss = 0.028171012178063393
Validation loss = 0.028471756726503372
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.035847410559654236
Validation loss = 0.03204520791769028
Validation loss = 0.03130711615085602
Validation loss = 0.03180404379963875
Validation loss = 0.03098076954483986
Validation loss = 0.030873706564307213
Validation loss = 0.030945878475904465
Validation loss = 0.029826918616890907
Validation loss = 0.03125062957406044
Validation loss = 0.030527910217642784
Validation loss = 0.030755776911973953
Validation loss = 0.02916448749601841
Validation loss = 0.03023546002805233
Validation loss = 0.029570290818810463
Validation loss = 0.02904205024242401
Validation loss = 0.029236817732453346
Validation loss = 0.029596921056509018
Validation loss = 0.029095927253365517
Validation loss = 0.02867794595658779
Validation loss = 0.030736008659005165
Validation loss = 0.029185209423303604
Validation loss = 0.029750194400548935
Validation loss = 0.027820849791169167
Validation loss = 0.028541674837470055
Validation loss = 0.02741922065615654
Validation loss = 0.029379820451140404
Validation loss = 0.027441496029496193
Validation loss = 0.02691154181957245
Validation loss = 0.028124723583459854
Validation loss = 0.027960294857621193
Validation loss = 0.026736021041870117
Validation loss = 0.028495078906416893
Validation loss = 0.027367278933525085
Validation loss = 0.026950929313898087
Validation loss = 0.027241786941885948
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0353747121989727
Validation loss = 0.028607910498976707
Validation loss = 0.02902039885520935
Validation loss = 0.02918427623808384
Validation loss = 0.029567217454314232
Validation loss = 0.028950195759534836
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.34e+03 |
| Iteration     | 9        |
| MaximumReturn | 1.95e+03 |
| MinimumReturn | 370      |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03249029815196991
Validation loss = 0.027017058804631233
Validation loss = 0.028757549822330475
Validation loss = 0.027686547487974167
Validation loss = 0.027107514441013336
Validation loss = 0.028307156637310982
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03289986029267311
Validation loss = 0.03032544255256653
Validation loss = 0.028496449813246727
Validation loss = 0.028366588056087494
Validation loss = 0.027088386937975883
Validation loss = 0.0280449315905571
Validation loss = 0.027582287788391113
Validation loss = 0.02715734951198101
Validation loss = 0.026658209040760994
Validation loss = 0.029156379401683807
Validation loss = 0.02761702984571457
Validation loss = 0.028643488883972168
Validation loss = 0.026383979246020317
Validation loss = 0.02754610776901245
Validation loss = 0.027795149013400078
Validation loss = 0.026294676586985588
Validation loss = 0.027767198160290718
Validation loss = 0.029009949415922165
Validation loss = 0.026856420561671257
Validation loss = 0.027876710519194603
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03319564834237099
Validation loss = 0.027384711429476738
Validation loss = 0.027332831174135208
Validation loss = 0.028206754475831985
Validation loss = 0.027742773294448853
Validation loss = 0.027340160682797432
Validation loss = 0.02629782073199749
Validation loss = 0.02597736567258835
Validation loss = 0.025847053155303
Validation loss = 0.02681840769946575
Validation loss = 0.024665357545018196
Validation loss = 0.025113290175795555
Validation loss = 0.025798702612519264
Validation loss = 0.026181451976299286
Validation loss = 0.02584153600037098
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03144824877381325
Validation loss = 0.027072830125689507
Validation loss = 0.02496051974594593
Validation loss = 0.025777263566851616
Validation loss = 0.0256803035736084
Validation loss = 0.02433984912931919
Validation loss = 0.024678431451320648
Validation loss = 0.02441427670419216
Validation loss = 0.024252556264400482
Validation loss = 0.024134570732712746
Validation loss = 0.024432070553302765
Validation loss = 0.02596365287899971
Validation loss = 0.024850143119692802
Validation loss = 0.023904627189040184
Validation loss = 0.023453181609511375
Validation loss = 0.0236570555716753
Validation loss = 0.0241320189088583
Validation loss = 0.023863770067691803
Validation loss = 0.024078339338302612
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03474488854408264
Validation loss = 0.026573097333312035
Validation loss = 0.02680867351591587
Validation loss = 0.02631724439561367
Validation loss = 0.02668788470327854
Validation loss = 0.02638385258615017
Validation loss = 0.025840383023023605
Validation loss = 0.026531023904681206
Validation loss = 0.02577543444931507
Validation loss = 0.025808708742260933
Validation loss = 0.027698062360286713
Validation loss = 0.02537074126303196
Validation loss = 0.025813549757003784
Validation loss = 0.02627376839518547
Validation loss = 0.024706320837140083
Validation loss = 0.02493521384894848
Validation loss = 0.02422153949737549
Validation loss = 0.0260655228048563
Validation loss = 0.024385176599025726
Validation loss = 0.025399358943104744
Validation loss = 0.02601771242916584
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.04e+03 |
| Iteration     | 10       |
| MaximumReturn | 1.75e+03 |
| MinimumReturn | -126     |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.030686020851135254
Validation loss = 0.02673926018178463
Validation loss = 0.02618318237364292
Validation loss = 0.025077536702156067
Validation loss = 0.02580384910106659
Validation loss = 0.02542383037507534
Validation loss = 0.02638411708176136
Validation loss = 0.025296291336417198
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.030960774049162865
Validation loss = 0.026820799335837364
Validation loss = 0.02507711760699749
Validation loss = 0.025320002809166908
Validation loss = 0.024101361632347107
Validation loss = 0.025117019191384315
Validation loss = 0.02441534399986267
Validation loss = 0.02604760229587555
Validation loss = 0.023598387837409973
Validation loss = 0.02499188669025898
Validation loss = 0.024085721001029015
Validation loss = 0.025294942781329155
Validation loss = 0.023135272786021233
Validation loss = 0.026574470102787018
Validation loss = 0.022558584809303284
Validation loss = 0.02287227474153042
Validation loss = 0.023190060630440712
Validation loss = 0.023090535774827003
Validation loss = 0.02519984543323517
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03164759650826454
Validation loss = 0.026283802464604378
Validation loss = 0.02341534197330475
Validation loss = 0.025284303352236748
Validation loss = 0.02424982748925686
Validation loss = 0.023580431938171387
Validation loss = 0.0248794574290514
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03014003485441208
Validation loss = 0.02268317900598049
Validation loss = 0.022453607991337776
Validation loss = 0.022048329934477806
Validation loss = 0.022594863548874855
Validation loss = 0.022710293531417847
Validation loss = 0.023990752175450325
Validation loss = 0.02153860032558441
Validation loss = 0.02197439968585968
Validation loss = 0.02100071869790554
Validation loss = 0.022397631779313087
Validation loss = 0.021099410951137543
Validation loss = 0.021281639114022255
Validation loss = 0.0237711314111948
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03108866512775421
Validation loss = 0.025000400841236115
Validation loss = 0.02470560185611248
Validation loss = 0.02408801019191742
Validation loss = 0.023511042818427086
Validation loss = 0.02307719551026821
Validation loss = 0.023561246693134308
Validation loss = 0.022633560001850128
Validation loss = 0.024217786267399788
Validation loss = 0.022489579394459724
Validation loss = 0.02272067964076996
Validation loss = 0.023865321651101112
Validation loss = 0.021983200684189796
Validation loss = 0.022194797173142433
Validation loss = 0.021893376484513283
Validation loss = 0.0228715892881155
Validation loss = 0.02252938412129879
Validation loss = 0.022189686074852943
Validation loss = 0.021788761019706726
Validation loss = 0.021214663982391357
Validation loss = 0.02158312499523163
Validation loss = 0.021334579214453697
Validation loss = 0.02067532390356064
Validation loss = 0.021838849410414696
Validation loss = 0.021357176825404167
Validation loss = 0.02078176662325859
Validation loss = 0.022159134969115257
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 782      |
| Iteration     | 11       |
| MaximumReturn | 1.86e+03 |
| MinimumReturn | -224     |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029134951531887054
Validation loss = 0.0240433719009161
Validation loss = 0.0239450354129076
Validation loss = 0.024775715544819832
Validation loss = 0.02391732484102249
Validation loss = 0.023652823641896248
Validation loss = 0.02411082573235035
Validation loss = 0.024073081091046333
Validation loss = 0.023096634075045586
Validation loss = 0.023211650550365448
Validation loss = 0.026631001383066177
Validation loss = 0.023653920739889145
Validation loss = 0.022682564333081245
Validation loss = 0.02347853034734726
Validation loss = 0.023310629650950432
Validation loss = 0.0223796758800745
Validation loss = 0.02204805798828602
Validation loss = 0.02227000519633293
Validation loss = 0.024234116077423096
Validation loss = 0.02237708307802677
Validation loss = 0.024882223457098007
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.028063591569662094
Validation loss = 0.022540297359228134
Validation loss = 0.022032249718904495
Validation loss = 0.022124242037534714
Validation loss = 0.023037953302264214
Validation loss = 0.021665390580892563
Validation loss = 0.02231184020638466
Validation loss = 0.021904777735471725
Validation loss = 0.02104809135198593
Validation loss = 0.02245938591659069
Validation loss = 0.020685765892267227
Validation loss = 0.0216357484459877
Validation loss = 0.022059477865695953
Validation loss = 0.02124367281794548
Validation loss = 0.021433386951684952
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.032423265278339386
Validation loss = 0.024469394236803055
Validation loss = 0.023851560428738594
Validation loss = 0.02326977625489235
Validation loss = 0.022514935582876205
Validation loss = 0.02329130843281746
Validation loss = 0.025065729394555092
Validation loss = 0.021511267870664597
Validation loss = 0.024921007454395294
Validation loss = 0.022067228332161903
Validation loss = 0.022670229896903038
Validation loss = 0.02189391665160656
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02647084929049015
Validation loss = 0.021277762949466705
Validation loss = 0.021327348425984383
Validation loss = 0.020080795511603355
Validation loss = 0.02175329253077507
Validation loss = 0.02129681222140789
Validation loss = 0.020140888169407845
Validation loss = 0.02096029371023178
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025945592671632767
Validation loss = 0.020261406898498535
Validation loss = 0.02034265175461769
Validation loss = 0.020870914682745934
Validation loss = 0.020022807642817497
Validation loss = 0.019966423511505127
Validation loss = 0.019172918051481247
Validation loss = 0.020278332754969597
Validation loss = 0.019540807232260704
Validation loss = 0.021396277472376823
Validation loss = 0.01898576319217682
Validation loss = 0.019365200772881508
Validation loss = 0.01935509592294693
Validation loss = 0.01875486597418785
Validation loss = 0.019434729591012
Validation loss = 0.01837565191090107
Validation loss = 0.020514728501439095
Validation loss = 0.01809803768992424
Validation loss = 0.01869954541325569
Validation loss = 0.01880144141614437
Validation loss = 0.01827244833111763
Validation loss = 0.018349766731262207
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 805      |
| Iteration     | 12       |
| MaximumReturn | 2.03e+03 |
| MinimumReturn | -117     |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03547412529587746
Validation loss = 0.02693307399749756
Validation loss = 0.026504844427108765
Validation loss = 0.027125628665089607
Validation loss = 0.02676844783127308
Validation loss = 0.027544599026441574
Validation loss = 0.02763051725924015
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03587542101740837
Validation loss = 0.027930567041039467
Validation loss = 0.02690199203789234
Validation loss = 0.027027666568756104
Validation loss = 0.026762543246150017
Validation loss = 0.026477811858057976
Validation loss = 0.025125553831458092
Validation loss = 0.02662251703441143
Validation loss = 0.02658635936677456
Validation loss = 0.025439292192459106
Validation loss = 0.027700666338205338
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03570501133799553
Validation loss = 0.026857653632760048
Validation loss = 0.026974255219101906
Validation loss = 0.027711069211363792
Validation loss = 0.026360778138041496
Validation loss = 0.0299146119505167
Validation loss = 0.02608257345855236
Validation loss = 0.02657567523419857
Validation loss = 0.027672255411744118
Validation loss = 0.02624266967177391
Validation loss = 0.02984674833714962
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03178000450134277
Validation loss = 0.025583770126104355
Validation loss = 0.02567215822637081
Validation loss = 0.02588823437690735
Validation loss = 0.0263731200248003
Validation loss = 0.024122940376400948
Validation loss = 0.026486216112971306
Validation loss = 0.025148581713438034
Validation loss = 0.02414553426206112
Validation loss = 0.024255899712443352
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.034101855009794235
Validation loss = 0.026089001446962357
Validation loss = 0.025776183232665062
Validation loss = 0.024733854457736015
Validation loss = 0.0243796668946743
Validation loss = 0.024331599473953247
Validation loss = 0.02335391379892826
Validation loss = 0.031030958518385887
Validation loss = 0.02273031510412693
Validation loss = 0.024501627311110497
Validation loss = 0.02227720431983471
Validation loss = 0.024472426623106003
Validation loss = 0.0232294499874115
Validation loss = 0.0222012996673584
Validation loss = 0.026006648316979408
Validation loss = 0.021938985213637352
Validation loss = 0.023039182648062706
Validation loss = 0.022753408178687096
Validation loss = 0.02311338484287262
Validation loss = 0.021787768229842186
Validation loss = 0.02251528576016426
Validation loss = 0.023106159642338753
Validation loss = 0.02189655974507332
Validation loss = 0.02256409265100956
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.07e+03 |
| Iteration     | 13       |
| MaximumReturn | 1.95e+03 |
| MinimumReturn | 215      |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026580162346363068
Validation loss = 0.020722925662994385
Validation loss = 0.020738331601023674
Validation loss = 0.02188059873878956
Validation loss = 0.02104782871901989
Validation loss = 0.020363591611385345
Validation loss = 0.020428629592061043
Validation loss = 0.01979927532374859
Validation loss = 0.02063176780939102
Validation loss = 0.02054543048143387
Validation loss = 0.01924162730574608
Validation loss = 0.021099980920553207
Validation loss = 0.01939014159142971
Validation loss = 0.020626580342650414
Validation loss = 0.019982166588306427
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026101481169462204
Validation loss = 0.020708097144961357
Validation loss = 0.020411115139722824
Validation loss = 0.019713541492819786
Validation loss = 0.021714219823479652
Validation loss = 0.019334089010953903
Validation loss = 0.019775398075580597
Validation loss = 0.01865016110241413
Validation loss = 0.01918926276266575
Validation loss = 0.01964467763900757
Validation loss = 0.018970083445310593
Validation loss = 0.022829752415418625
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02895621582865715
Validation loss = 0.02170439623296261
Validation loss = 0.021809080615639687
Validation loss = 0.020017238333821297
Validation loss = 0.020906012505292892
Validation loss = 0.019844574853777885
Validation loss = 0.019656555727124214
Validation loss = 0.0205336082726717
Validation loss = 0.01946580968797207
Validation loss = 0.021392883732914925
Validation loss = 0.01950731873512268
Validation loss = 0.0206373892724514
Validation loss = 0.020261792466044426
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023713918402791023
Validation loss = 0.01951431855559349
Validation loss = 0.019848791882395744
Validation loss = 0.020137865096330643
Validation loss = 0.018673811107873917
Validation loss = 0.019765837118029594
Validation loss = 0.019494913518428802
Validation loss = 0.02128824032843113
Validation loss = 0.017948025837540627
Validation loss = 0.018625877797603607
Validation loss = 0.01947247050702572
Validation loss = 0.0194232277572155
Validation loss = 0.01783093623816967
Validation loss = 0.020372193306684494
Validation loss = 0.017395980656147003
Validation loss = 0.01820952631533146
Validation loss = 0.01761799305677414
Validation loss = 0.017883004620671272
Validation loss = 0.017490558326244354
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025825805962085724
Validation loss = 0.018167423084378242
Validation loss = 0.01749296672642231
Validation loss = 0.017538240179419518
Validation loss = 0.018604259938001633
Validation loss = 0.01731409691274166
Validation loss = 0.01639511249959469
Validation loss = 0.016893668100237846
Validation loss = 0.01749139092862606
Validation loss = 0.018066206946969032
Validation loss = 0.015889249742031097
Validation loss = 0.017461908981204033
Validation loss = 0.016132429242134094
Validation loss = 0.01622614823281765
Validation loss = 0.0170957762748003
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.19e+03 |
| Iteration     | 14       |
| MaximumReturn | 1.98e+03 |
| MinimumReturn | -145     |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023928511887788773
Validation loss = 0.018101293593645096
Validation loss = 0.018010735511779785
Validation loss = 0.01924620196223259
Validation loss = 0.01931077614426613
Validation loss = 0.01930342987179756
Validation loss = 0.017848854884505272
Validation loss = 0.017639243975281715
Validation loss = 0.02159728854894638
Validation loss = 0.01805111952126026
Validation loss = 0.01866954006254673
Validation loss = 0.017086416482925415
Validation loss = 0.01759643852710724
Validation loss = 0.017845164984464645
Validation loss = 0.017208553850650787
Validation loss = 0.020152166485786438
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022737856954336166
Validation loss = 0.018332015722990036
Validation loss = 0.018256962299346924
Validation loss = 0.019186455756425858
Validation loss = 0.017475446686148643
Validation loss = 0.01739462837576866
Validation loss = 0.017950017005205154
Validation loss = 0.01729336753487587
Validation loss = 0.016634276136755943
Validation loss = 0.020216956734657288
Validation loss = 0.016466576606035233
Validation loss = 0.018516454845666885
Validation loss = 0.016644906252622604
Validation loss = 0.016976481303572655
Validation loss = 0.017929721623659134
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022612012922763824
Validation loss = 0.018671806901693344
Validation loss = 0.018260981887578964
Validation loss = 0.01848629117012024
Validation loss = 0.01924421265721321
Validation loss = 0.018503084778785706
Validation loss = 0.01788795366883278
Validation loss = 0.017765386030077934
Validation loss = 0.01801314949989319
Validation loss = 0.019657563418149948
Validation loss = 0.018349919468164444
Validation loss = 0.01858065277338028
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01942722499370575
Validation loss = 0.01710142195224762
Validation loss = 0.01659499853849411
Validation loss = 0.01692294329404831
Validation loss = 0.01624809205532074
Validation loss = 0.01705170050263405
Validation loss = 0.017217956483364105
Validation loss = 0.016699206084012985
Validation loss = 0.015860965475440025
Validation loss = 0.017335522919893265
Validation loss = 0.015981368720531464
Validation loss = 0.015756577253341675
Validation loss = 0.01842942088842392
Validation loss = 0.015693843364715576
Validation loss = 0.016620147973299026
Validation loss = 0.01581326499581337
Validation loss = 0.01580844260752201
Validation loss = 0.016214117407798767
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01952856034040451
Validation loss = 0.015661144629120827
Validation loss = 0.01573016494512558
Validation loss = 0.01566614769399166
Validation loss = 0.015136746689677238
Validation loss = 0.01612173020839691
Validation loss = 0.015286028385162354
Validation loss = 0.015476599335670471
Validation loss = 0.015245028771460056
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.94e+03 |
| Iteration     | 15       |
| MaximumReturn | 2.26e+03 |
| MinimumReturn | 1.04e+03 |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02052871510386467
Validation loss = 0.0159517340362072
Validation loss = 0.016353758051991463
Validation loss = 0.016010306775569916
Validation loss = 0.015446708537638187
Validation loss = 0.018283739686012268
Validation loss = 0.015628796070814133
Validation loss = 0.01552038174122572
Validation loss = 0.01672508381307125
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018546070903539658
Validation loss = 0.015616842545568943
Validation loss = 0.01582680642604828
Validation loss = 0.017723657190799713
Validation loss = 0.01568526029586792
Validation loss = 0.017049327492713928
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020012451335787773
Validation loss = 0.0167899951338768
Validation loss = 0.01649472303688526
Validation loss = 0.017548084259033203
Validation loss = 0.01632300391793251
Validation loss = 0.016068659722805023
Validation loss = 0.01649506762623787
Validation loss = 0.015568945556879044
Validation loss = 0.01819964125752449
Validation loss = 0.015316691249608994
Validation loss = 0.016292305663228035
Validation loss = 0.016097452491521835
Validation loss = 0.015556544065475464
Validation loss = 0.01636114902794361
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017618387937545776
Validation loss = 0.015170730650424957
Validation loss = 0.014973821118474007
Validation loss = 0.015301913022994995
Validation loss = 0.014497369527816772
Validation loss = 0.014804563485085964
Validation loss = 0.01675971783697605
Validation loss = 0.014206103049218655
Validation loss = 0.01418947521597147
Validation loss = 0.01639927737414837
Validation loss = 0.01468665897846222
Validation loss = 0.014642536640167236
Validation loss = 0.014924989081919193
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01826615259051323
Validation loss = 0.014444134198129177
Validation loss = 0.015462016686797142
Validation loss = 0.013924567960202694
Validation loss = 0.015232797712087631
Validation loss = 0.014047380536794662
Validation loss = 0.01359585952013731
Validation loss = 0.015347735024988651
Validation loss = 0.013171927072107792
Validation loss = 0.013894457370042801
Validation loss = 0.01416966412216425
Validation loss = 0.013201363384723663
Validation loss = 0.013582910411059856
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.43e+03 |
| Iteration     | 16       |
| MaximumReturn | 2.2e+03  |
| MinimumReturn | -16.4    |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018977122381329536
Validation loss = 0.014866067096590996
Validation loss = 0.014972378499805927
Validation loss = 0.015211707912385464
Validation loss = 0.014654634520411491
Validation loss = 0.01483498141169548
Validation loss = 0.015370958484709263
Validation loss = 0.01508080493658781
Validation loss = 0.01392783410847187
Validation loss = 0.015657536685466766
Validation loss = 0.014180325902998447
Validation loss = 0.01454627700150013
Validation loss = 0.015133966691792011
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017598643898963928
Validation loss = 0.016427038237452507
Validation loss = 0.016598708927631378
Validation loss = 0.014874045737087727
Validation loss = 0.015123252756893635
Validation loss = 0.01490617636591196
Validation loss = 0.01633932627737522
Validation loss = 0.014687336049973965
Validation loss = 0.015111668035387993
Validation loss = 0.015341747552156448
Validation loss = 0.01397639699280262
Validation loss = 0.014676996506750584
Validation loss = 0.013686907477676868
Validation loss = 0.014324035495519638
Validation loss = 0.015116921626031399
Validation loss = 0.014170565642416477
Validation loss = 0.013495132327079773
Validation loss = 0.015614370815455914
Validation loss = 0.013756252825260162
Validation loss = 0.013997089117765427
Validation loss = 0.0129234055057168
Validation loss = 0.014891018159687519
Validation loss = 0.013969914987683296
Validation loss = 0.013142405077815056
Validation loss = 0.014849240891635418
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019151147454977036
Validation loss = 0.016129685565829277
Validation loss = 0.014380461536347866
Validation loss = 0.015117956325411797
Validation loss = 0.014372674748301506
Validation loss = 0.01769375428557396
Validation loss = 0.013476576656103134
Validation loss = 0.013887974433600903
Validation loss = 0.015566673129796982
Validation loss = 0.014275663532316685
Validation loss = 0.014485958963632584
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01616290956735611
Validation loss = 0.013898900710046291
Validation loss = 0.014472686685621738
Validation loss = 0.01360059343278408
Validation loss = 0.013611395843327045
Validation loss = 0.013709062710404396
Validation loss = 0.013797126710414886
Validation loss = 0.014346333220601082
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016297850757837296
Validation loss = 0.01271999254822731
Validation loss = 0.014051662757992744
Validation loss = 0.0139558594673872
Validation loss = 0.013489940203726292
Validation loss = 0.012694589793682098
Validation loss = 0.013312561437487602
Validation loss = 0.01267924066632986
Validation loss = 0.01355450414121151
Validation loss = 0.013263087719678879
Validation loss = 0.01294210646301508
Validation loss = 0.012789154425263405
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 937      |
| Iteration     | 17       |
| MaximumReturn | 2.2e+03  |
| MinimumReturn | -281     |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01702021062374115
Validation loss = 0.01440334040671587
Validation loss = 0.014585604891180992
Validation loss = 0.013619459234178066
Validation loss = 0.014798730611801147
Validation loss = 0.014022075571119785
Validation loss = 0.015077603049576283
Validation loss = 0.012986320070922375
Validation loss = 0.014156264252960682
Validation loss = 0.012905162759125233
Validation loss = 0.015229853801429272
Validation loss = 0.01317431777715683
Validation loss = 0.014412770047783852
Validation loss = 0.012845827266573906
Validation loss = 0.014715274795889854
Validation loss = 0.013111360371112823
Validation loss = 0.013794144615530968
Validation loss = 0.013823021203279495
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015598874539136887
Validation loss = 0.01315742451697588
Validation loss = 0.0137320039793849
Validation loss = 0.01413656398653984
Validation loss = 0.011996057815849781
Validation loss = 0.0130640072748065
Validation loss = 0.01250756997615099
Validation loss = 0.013597146607935429
Validation loss = 0.012766790576279163
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01623430661857128
Validation loss = 0.014971042983233929
Validation loss = 0.013580518774688244
Validation loss = 0.014360368251800537
Validation loss = 0.013104881159961224
Validation loss = 0.014830715022981167
Validation loss = 0.012920419685542583
Validation loss = 0.013962773606181145
Validation loss = 0.013066040351986885
Validation loss = 0.01323299016803503
Validation loss = 0.012968038208782673
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01543404906988144
Validation loss = 0.013092824257910252
Validation loss = 0.012904656119644642
Validation loss = 0.013608254492282867
Validation loss = 0.012591858394443989
Validation loss = 0.01431537140160799
Validation loss = 0.012770608998835087
Validation loss = 0.013266067951917648
Validation loss = 0.012828079052269459
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015685908496379852
Validation loss = 0.012234758585691452
Validation loss = 0.012412925250828266
Validation loss = 0.012352622114121914
Validation loss = 0.011933828704059124
Validation loss = 0.013122538104653358
Validation loss = 0.012469671666622162
Validation loss = 0.012184711173176765
Validation loss = 0.012597255408763885
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.61e+03 |
| Iteration     | 18       |
| MaximumReturn | 2.45e+03 |
| MinimumReturn | -26.6    |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014663162641227245
Validation loss = 0.011969504877924919
Validation loss = 0.012755528092384338
Validation loss = 0.012124432250857353
Validation loss = 0.012641984038054943
Validation loss = 0.012256038375198841
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014026026241481304
Validation loss = 0.012418096885085106
Validation loss = 0.01328733004629612
Validation loss = 0.012020422145724297
Validation loss = 0.013031572103500366
Validation loss = 0.011805696412920952
Validation loss = 0.011662086471915245
Validation loss = 0.013270040974020958
Validation loss = 0.012065931223332882
Validation loss = 0.012203266844153404
Validation loss = 0.011049512773752213
Validation loss = 0.01114489696919918
Validation loss = 0.01229854952543974
Validation loss = 0.010874778963625431
Validation loss = 0.01180986873805523
Validation loss = 0.011448448523879051
Validation loss = 0.011416693218052387
Validation loss = 0.011480839923024178
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015339871868491173
Validation loss = 0.012435602955520153
Validation loss = 0.013243812136352062
Validation loss = 0.012101319618523121
Validation loss = 0.013336164876818657
Validation loss = 0.012148183770477772
Validation loss = 0.013510966673493385
Validation loss = 0.012373353354632854
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014689642004668713
Validation loss = 0.012119625695049763
Validation loss = 0.01216648705303669
Validation loss = 0.014418172650039196
Validation loss = 0.011747390031814575
Validation loss = 0.012520315125584602
Validation loss = 0.012110558338463306
Validation loss = 0.01186075434088707
Validation loss = 0.011997884139418602
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014390021562576294
Validation loss = 0.01183002907782793
Validation loss = 0.011958524584770203
Validation loss = 0.011894137598574162
Validation loss = 0.01120837777853012
Validation loss = 0.011565066874027252
Validation loss = 0.011594749055802822
Validation loss = 0.011588280089199543
Validation loss = 0.011493630707263947
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.8e+03  |
| Iteration     | 19       |
| MaximumReturn | 2.31e+03 |
| MinimumReturn | -51.9    |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016814062371850014
Validation loss = 0.011659134179353714
Validation loss = 0.012194261886179447
Validation loss = 0.01271798089146614
Validation loss = 0.01164705865085125
Validation loss = 0.011483314447104931
Validation loss = 0.011904488317668438
Validation loss = 0.011820606887340546
Validation loss = 0.011244131252169609
Validation loss = 0.011416471563279629
Validation loss = 0.01172570139169693
Validation loss = 0.011315159499645233
Validation loss = 0.012070225551724434
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013110036961734295
Validation loss = 0.011483644135296345
Validation loss = 0.011091860011219978
Validation loss = 0.011243927292525768
Validation loss = 0.011681427247822285
Validation loss = 0.011172541417181492
Validation loss = 0.012746796943247318
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014024321921169758
Validation loss = 0.01169542782008648
Validation loss = 0.012200918979942799
Validation loss = 0.01165934931486845
Validation loss = 0.011828027665615082
Validation loss = 0.011584619991481304
Validation loss = 0.011214322410523891
Validation loss = 0.011464275419712067
Validation loss = 0.012474230490624905
Validation loss = 0.012022841721773148
Validation loss = 0.012742586433887482
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013508206233382225
Validation loss = 0.011609626933932304
Validation loss = 0.013388956896960735
Validation loss = 0.011098973453044891
Validation loss = 0.011719818226993084
Validation loss = 0.011433150619268417
Validation loss = 0.011189057491719723
Validation loss = 0.012040110304951668
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013106892816722393
Validation loss = 0.010618217289447784
Validation loss = 0.011527772061526775
Validation loss = 0.011925808154046535
Validation loss = 0.010395116172730923
Validation loss = 0.010715825483202934
Validation loss = 0.010617183521389961
Validation loss = 0.01043002214282751
Validation loss = 0.010452862828969955
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.21e+03 |
| Iteration     | 20       |
| MaximumReturn | 2.16e+03 |
| MinimumReturn | 47.6     |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01322101429104805
Validation loss = 0.01131389755755663
Validation loss = 0.01116096880286932
Validation loss = 0.010961727239191532
Validation loss = 0.013041765429079533
Validation loss = 0.011952617205679417
Validation loss = 0.010900052264332771
Validation loss = 0.012168264016509056
Validation loss = 0.010731003247201443
Validation loss = 0.010345609858632088
Validation loss = 0.010556366294622421
Validation loss = 0.011028626002371311
Validation loss = 0.010346553288400173
Validation loss = 0.010959886945784092
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012454512529075146
Validation loss = 0.010613132268190384
Validation loss = 0.010414616204798222
Validation loss = 0.011162655428051949
Validation loss = 0.010842195712029934
Validation loss = 0.010712013579905033
Validation loss = 0.011083021759986877
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01274485606700182
Validation loss = 0.011462376452982426
Validation loss = 0.011104260571300983
Validation loss = 0.0113518126308918
Validation loss = 0.01126288715749979
Validation loss = 0.012135008350014687
Validation loss = 0.012403014115989208
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013173389248549938
Validation loss = 0.011431685648858547
Validation loss = 0.010983697138726711
Validation loss = 0.010934022255241871
Validation loss = 0.010444100014865398
Validation loss = 0.011434792540967464
Validation loss = 0.010710484348237514
Validation loss = 0.01193785946816206
Validation loss = 0.01115392055362463
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011967506259679794
Validation loss = 0.01031208410859108
Validation loss = 0.010926305316388607
Validation loss = 0.01020490750670433
Validation loss = 0.012388136237859726
Validation loss = 0.009893124923110008
Validation loss = 0.010491595603525639
Validation loss = 0.009957642294466496
Validation loss = 0.009979289025068283
Validation loss = 0.011154640465974808
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.11e+03 |
| Iteration     | 21       |
| MaximumReturn | 1.91e+03 |
| MinimumReturn | 85.2     |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015181313268840313
Validation loss = 0.010372902266681194
Validation loss = 0.010554109700024128
Validation loss = 0.010352526791393757
Validation loss = 0.010472998954355717
Validation loss = 0.010438191704452038
Validation loss = 0.012976360507309437
Validation loss = 0.00991982314735651
Validation loss = 0.010197031311690807
Validation loss = 0.011890369467437267
Validation loss = 0.010652247816324234
Validation loss = 0.010575090534985065
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012449929490685463
Validation loss = 0.010603413917124271
Validation loss = 0.010072517208755016
Validation loss = 0.010433829389512539
Validation loss = 0.010142559185624123
Validation loss = 0.010144560597836971
Validation loss = 0.011325309053063393
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013024264946579933
Validation loss = 0.011066215112805367
Validation loss = 0.010659613646566868
Validation loss = 0.011285735294222832
Validation loss = 0.01000386755913496
Validation loss = 0.012102846056222916
Validation loss = 0.010200798511505127
Validation loss = 0.011598566547036171
Validation loss = 0.010490710847079754
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01153399609029293
Validation loss = 0.010493963956832886
Validation loss = 0.01070255134254694
Validation loss = 0.010998683981597424
Validation loss = 0.010727734304964542
Validation loss = 0.01081896387040615
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011681964620947838
Validation loss = 0.009901097044348717
Validation loss = 0.010585705749690533
Validation loss = 0.009479466825723648
Validation loss = 0.010004173964262009
Validation loss = 0.009343118406832218
Validation loss = 0.010095598176121712
Validation loss = 0.009702800773084164
Validation loss = 0.01093231700360775
Validation loss = 0.009792895056307316
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 643      |
| Iteration     | 22       |
| MaximumReturn | 1.59e+03 |
| MinimumReturn | 41       |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011892672628164291
Validation loss = 0.010287951678037643
Validation loss = 0.009698338806629181
Validation loss = 0.01066095381975174
Validation loss = 0.009888920933008194
Validation loss = 0.009943113662302494
Validation loss = 0.009534088894724846
Validation loss = 0.009628556668758392
Validation loss = 0.0101660480722785
Validation loss = 0.009749400429427624
Validation loss = 0.010754422284662724
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011443753726780415
Validation loss = 0.009804865345358849
Validation loss = 0.01009511947631836
Validation loss = 0.010204852558672428
Validation loss = 0.009659347124397755
Validation loss = 0.00963688176125288
Validation loss = 0.00918998010456562
Validation loss = 0.009747461415827274
Validation loss = 0.010106300935149193
Validation loss = 0.009533786214888096
Validation loss = 0.00999640952795744
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010918932966887951
Validation loss = 0.010704350657761097
Validation loss = 0.009871159680187702
Validation loss = 0.010817560367286205
Validation loss = 0.010486088693141937
Validation loss = 0.010277199558913708
Validation loss = 0.010039136745035648
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011155749671161175
Validation loss = 0.010487034916877747
Validation loss = 0.011151601560413837
Validation loss = 0.010777938179671764
Validation loss = 0.010294245555996895
Validation loss = 0.009645644575357437
Validation loss = 0.010097546502947807
Validation loss = 0.011760993860661983
Validation loss = 0.010328643023967743
Validation loss = 0.010084616020321846
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010984283871948719
Validation loss = 0.00949279684573412
Validation loss = 0.010386395268142223
Validation loss = 0.010807576589286327
Validation loss = 0.009200327098369598
Validation loss = 0.00968781765550375
Validation loss = 0.009573419578373432
Validation loss = 0.009823426604270935
Validation loss = 0.009457595646381378
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -118     |
| Iteration     | 23       |
| MaximumReturn | 116      |
| MinimumReturn | -305     |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01133701205253601
Validation loss = 0.009823479689657688
Validation loss = 0.010776306502521038
Validation loss = 0.0098161855712533
Validation loss = 0.009761836379766464
Validation loss = 0.009919995442032814
Validation loss = 0.010297105647623539
Validation loss = 0.009462093934416771
Validation loss = 0.010025614872574806
Validation loss = 0.009274784475564957
Validation loss = 0.010452616959810257
Validation loss = 0.009237227961421013
Validation loss = 0.010135701857507229
Validation loss = 0.009046386927366257
Validation loss = 0.009694159030914307
Validation loss = 0.009144771844148636
Validation loss = 0.010458629578351974
Validation loss = 0.009177534841001034
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01086551696062088
Validation loss = 0.009956657886505127
Validation loss = 0.009716140106320381
Validation loss = 0.00930994376540184
Validation loss = 0.009628144092857838
Validation loss = 0.009272669441998005
Validation loss = 0.01013247575610876
Validation loss = 0.008995809592306614
Validation loss = 0.00874527357518673
Validation loss = 0.009074791334569454
Validation loss = 0.010390426963567734
Validation loss = 0.008930086158216
Validation loss = 0.009494646452367306
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012226061895489693
Validation loss = 0.00982552021741867
Validation loss = 0.010167316533625126
Validation loss = 0.009241197258234024
Validation loss = 0.011001107282936573
Validation loss = 0.009461787529289722
Validation loss = 0.009858911857008934
Validation loss = 0.009990384802222252
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011246024630963802
Validation loss = 0.010901185683906078
Validation loss = 0.009971918538212776
Validation loss = 0.010197009891271591
Validation loss = 0.009924748912453651
Validation loss = 0.010357257910072803
Validation loss = 0.009447977878153324
Validation loss = 0.010000854730606079
Validation loss = 0.010348639450967312
Validation loss = 0.009982497431337833
Validation loss = 0.009299740195274353
Validation loss = 0.009171947836875916
Validation loss = 0.0101923868060112
Validation loss = 0.00951672438532114
Validation loss = 0.010046563111245632
Validation loss = 0.009122362360358238
Validation loss = 0.009819885715842247
Validation loss = 0.009040500968694687
Validation loss = 0.00970129482448101
Validation loss = 0.009693467989563942
Validation loss = 0.009913142770528793
Validation loss = 0.009140604175627232
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010894844308495522
Validation loss = 0.009549089707434177
Validation loss = 0.009752358309924603
Validation loss = 0.008969915099442005
Validation loss = 0.009770483709871769
Validation loss = 0.00893121212720871
Validation loss = 0.009483348578214645
Validation loss = 0.009237805381417274
Validation loss = 0.009183848276734352
Validation loss = 0.009278563782572746
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -117     |
| Iteration     | 24       |
| MaximumReturn | 258      |
| MinimumReturn | -304     |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010803768411278725
Validation loss = 0.009177942760288715
Validation loss = 0.009882375597953796
Validation loss = 0.009217082522809505
Validation loss = 0.009356286376714706
Validation loss = 0.009626330807805061
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010640661232173443
Validation loss = 0.009067644365131855
Validation loss = 0.009354611858725548
Validation loss = 0.009787441231310368
Validation loss = 0.009038614109158516
Validation loss = 0.009444975294172764
Validation loss = 0.01001699734479189
Validation loss = 0.008897654712200165
Validation loss = 0.009157436899840832
Validation loss = 0.009985730051994324
Validation loss = 0.009271095506846905
Validation loss = 0.008614092133939266
Validation loss = 0.00926824752241373
Validation loss = 0.009746103547513485
Validation loss = 0.00888756476342678
Validation loss = 0.009284721687436104
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011954047717154026
Validation loss = 0.009796066209673882
Validation loss = 0.009986696764826775
Validation loss = 0.010007288306951523
Validation loss = 0.009823649190366268
Validation loss = 0.009761938825249672
Validation loss = 0.010571480728685856
Validation loss = 0.00915775541216135
Validation loss = 0.00952400453388691
Validation loss = 0.010226396843791008
Validation loss = 0.009658843278884888
Validation loss = 0.010256931185722351
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010635818354785442
Validation loss = 0.00925647746771574
Validation loss = 0.009812301024794579
Validation loss = 0.008811271749436855
Validation loss = 0.009874511510133743
Validation loss = 0.009303951635956764
Validation loss = 0.009676286019384861
Validation loss = 0.009762578643858433
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01036594808101654
Validation loss = 0.009282543323934078
Validation loss = 0.009467744268476963
Validation loss = 0.00983351655304432
Validation loss = 0.009836957789957523
Validation loss = 0.008649466559290886
Validation loss = 0.00909910723567009
Validation loss = 0.00867073517292738
Validation loss = 0.009173690341413021
Validation loss = 0.008782957680523396
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -169     |
| Iteration     | 25       |
| MaximumReturn | 504      |
| MinimumReturn | -617     |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010788400657474995
Validation loss = 0.009335844777524471
Validation loss = 0.00917472317814827
Validation loss = 0.009187094867229462
Validation loss = 0.009350142441689968
Validation loss = 0.008903270587325096
Validation loss = 0.009839748963713646
Validation loss = 0.009677953086793423
Validation loss = 0.009366433136165142
Validation loss = 0.009065828286111355
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010792110115289688
Validation loss = 0.009472129866480827
Validation loss = 0.009229988791048527
Validation loss = 0.008555089123547077
Validation loss = 0.008970102295279503
Validation loss = 0.009170922450721264
Validation loss = 0.008978303521871567
Validation loss = 0.009688206948339939
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011095170862972736
Validation loss = 0.009481861256062984
Validation loss = 0.010017008520662785
Validation loss = 0.009981955401599407
Validation loss = 0.009954429231584072
Validation loss = 0.010795611888170242
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009730542078614235
Validation loss = 0.009343747980892658
Validation loss = 0.010072215460240841
Validation loss = 0.00990020576864481
Validation loss = 0.009775093756616116
Validation loss = 0.00950850173830986
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009687621146440506
Validation loss = 0.008852369152009487
Validation loss = 0.008744935505092144
Validation loss = 0.00920642726123333
Validation loss = 0.00879750121384859
Validation loss = 0.008534890599548817
Validation loss = 0.008743729442358017
Validation loss = 0.008731122128665447
Validation loss = 0.009055311791598797
Validation loss = 0.008792695589363575
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 808      |
| Iteration     | 26       |
| MaximumReturn | 2.13e+03 |
| MinimumReturn | -318     |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010208440944552422
Validation loss = 0.008674686774611473
Validation loss = 0.009086345322430134
Validation loss = 0.009584560059010983
Validation loss = 0.008974098600447178
Validation loss = 0.009209533222019672
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008988210000097752
Validation loss = 0.008718868717551231
Validation loss = 0.00949411652982235
Validation loss = 0.00896273273974657
Validation loss = 0.008664323017001152
Validation loss = 0.008708222769200802
Validation loss = 0.010213307105004787
Validation loss = 0.00887737050652504
Validation loss = 0.00945527758449316
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010123027488589287
Validation loss = 0.009287746623158455
Validation loss = 0.00952321570366621
Validation loss = 0.009370727464556694
Validation loss = 0.009175730869174004
Validation loss = 0.009759972803294659
Validation loss = 0.009953654371201992
Validation loss = 0.00890739168971777
Validation loss = 0.008775484748184681
Validation loss = 0.008717008866369724
Validation loss = 0.0096582705155015
Validation loss = 0.009709805250167847
Validation loss = 0.008826939389109612
Validation loss = 0.008980571292340755
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009727011434733868
Validation loss = 0.0088968425989151
Validation loss = 0.009140177629888058
Validation loss = 0.00918423943221569
Validation loss = 0.01019089762121439
Validation loss = 0.009282314218580723
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009053783491253853
Validation loss = 0.009049361571669579
Validation loss = 0.009583753533661366
Validation loss = 0.008403605781495571
Validation loss = 0.008887805975973606
Validation loss = 0.008797606453299522
Validation loss = 0.008477087132632732
Validation loss = 0.008571786805987358
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.69e+03 |
| Iteration     | 27       |
| MaximumReturn | 2.46e+03 |
| MinimumReturn | -4.37    |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009518922306597233
Validation loss = 0.008404368534684181
Validation loss = 0.00905547384172678
Validation loss = 0.008530940860509872
Validation loss = 0.008713911287486553
Validation loss = 0.008492382243275642
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009346937760710716
Validation loss = 0.008699247613549232
Validation loss = 0.009592147544026375
Validation loss = 0.008769468404352665
Validation loss = 0.008577733300626278
Validation loss = 0.008263141848146915
Validation loss = 0.00885013584047556
Validation loss = 0.00954732671380043
Validation loss = 0.008544182404875755
Validation loss = 0.00823895912617445
Validation loss = 0.008427444845438004
Validation loss = 0.008199498988687992
Validation loss = 0.008388005197048187
Validation loss = 0.008122541010379791
Validation loss = 0.00863972119987011
Validation loss = 0.009088153019547462
Validation loss = 0.008495636284351349
Validation loss = 0.00776439206674695
Validation loss = 0.008475047536194324
Validation loss = 0.008331268094480038
Validation loss = 0.008312457241117954
Validation loss = 0.008294923231005669
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009798587299883366
Validation loss = 0.008756045252084732
Validation loss = 0.00944542232900858
Validation loss = 0.008207079023122787
Validation loss = 0.009224765002727509
Validation loss = 0.008604885078966618
Validation loss = 0.009373372420668602
Validation loss = 0.009102417156100273
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010420175269246101
Validation loss = 0.009029131382703781
Validation loss = 0.00870126485824585
Validation loss = 0.008907794952392578
Validation loss = 0.008190622553229332
Validation loss = 0.00858929194509983
Validation loss = 0.008340515196323395
Validation loss = 0.008491190150380135
Validation loss = 0.00915797520428896
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01020839810371399
Validation loss = 0.008927046321332455
Validation loss = 0.009548384696245193
Validation loss = 0.00886552780866623
Validation loss = 0.00895419530570507
Validation loss = 0.010157651267945766
Validation loss = 0.008333267644047737
Validation loss = 0.009082300588488579
Validation loss = 0.008179770782589912
Validation loss = 0.008077021688222885
Validation loss = 0.008736252784729004
Validation loss = 0.008867958560585976
Validation loss = 0.008692307397723198
Validation loss = 0.008542899042367935
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.53e+03 |
| Iteration     | 28       |
| MaximumReturn | 2.28e+03 |
| MinimumReturn | -124     |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00938798300921917
Validation loss = 0.008628714829683304
Validation loss = 0.008387524634599686
Validation loss = 0.00903343129903078
Validation loss = 0.008761510252952576
Validation loss = 0.008180682547390461
Validation loss = 0.008790195919573307
Validation loss = 0.008324684575200081
Validation loss = 0.009209981188178062
Validation loss = 0.008925444446504116
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008697175420820713
Validation loss = 0.007763589266687632
Validation loss = 0.008923584595322609
Validation loss = 0.008087639696896076
Validation loss = 0.008506735786795616
Validation loss = 0.009103293530642986
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010180655866861343
Validation loss = 0.0092307822778821
Validation loss = 0.00817799475044012
Validation loss = 0.008197939023375511
Validation loss = 0.008511717431247234
Validation loss = 0.00860491394996643
Validation loss = 0.00820060446858406
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010417607612907887
Validation loss = 0.008284350857138634
Validation loss = 0.008323733694851398
Validation loss = 0.007783425971865654
Validation loss = 0.008333997800946236
Validation loss = 0.008336693048477173
Validation loss = 0.0084150405600667
Validation loss = 0.008034758269786835
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00928285252302885
Validation loss = 0.008732589893043041
Validation loss = 0.009167700074613094
Validation loss = 0.00828335341066122
Validation loss = 0.007764455396682024
Validation loss = 0.008119145408272743
Validation loss = 0.009086056612432003
Validation loss = 0.008245824836194515
Validation loss = 0.009134634397923946
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.86e+03 |
| Iteration     | 29       |
| MaximumReturn | 2.46e+03 |
| MinimumReturn | -49      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008833881467580795
Validation loss = 0.009054790250957012
Validation loss = 0.008403369225561619
Validation loss = 0.007926275953650475
Validation loss = 0.007998696528375149
Validation loss = 0.008482392877340317
Validation loss = 0.008310466073453426
Validation loss = 0.00955321453511715
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008436809293925762
Validation loss = 0.008098718710243702
Validation loss = 0.007758101914077997
Validation loss = 0.008108563721179962
Validation loss = 0.008196362294256687
Validation loss = 0.007595191244035959
Validation loss = 0.007696446031332016
Validation loss = 0.007660531904548407
Validation loss = 0.007987502962350845
Validation loss = 0.007715377025306225
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00890391506254673
Validation loss = 0.008983151987195015
Validation loss = 0.008857621811330318
Validation loss = 0.008718453347682953
Validation loss = 0.008790592662990093
Validation loss = 0.00835352297872305
Validation loss = 0.008642787113785744
Validation loss = 0.008239777758717537
Validation loss = 0.008819963783025742
Validation loss = 0.008391011506319046
Validation loss = 0.00829886645078659
Validation loss = 0.007866374216973782
Validation loss = 0.007879487238824368
Validation loss = 0.008892504498362541
Validation loss = 0.007831210270524025
Validation loss = 0.008736769668757915
Validation loss = 0.008859130553901196
Validation loss = 0.008076171390712261
Validation loss = 0.008030989207327366
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008540146052837372
Validation loss = 0.00856725126504898
Validation loss = 0.008037989027798176
Validation loss = 0.008182545192539692
Validation loss = 0.00809798389673233
Validation loss = 0.008385416120290756
Validation loss = 0.007998079061508179
Validation loss = 0.008468417450785637
Validation loss = 0.008493679575622082
Validation loss = 0.00790381245315075
Validation loss = 0.008216285146772861
Validation loss = 0.008351190946996212
Validation loss = 0.007857169024646282
Validation loss = 0.008171397261321545
Validation loss = 0.007951248437166214
Validation loss = 0.008260415866971016
Validation loss = 0.007644112687557936
Validation loss = 0.008025784976780415
Validation loss = 0.007864676415920258
Validation loss = 0.008071140386164188
Validation loss = 0.00817932654172182
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008634456433355808
Validation loss = 0.007951268926262856
Validation loss = 0.008247220888733864
Validation loss = 0.00807342678308487
Validation loss = 0.007724483963102102
Validation loss = 0.007823395542800426
Validation loss = 0.008353929035365582
Validation loss = 0.008031576871871948
Validation loss = 0.008773455396294594
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.78e+03 |
| Iteration     | 30       |
| MaximumReturn | 2.52e+03 |
| MinimumReturn | 232      |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008498191833496094
Validation loss = 0.007846646942198277
Validation loss = 0.008049353957176208
Validation loss = 0.008176164701581001
Validation loss = 0.00865832157433033
Validation loss = 0.007885092869400978
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008942047134041786
Validation loss = 0.007658961229026318
Validation loss = 0.007739396765828133
Validation loss = 0.007941104471683502
Validation loss = 0.007766627706587315
Validation loss = 0.00784602016210556
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008531522937119007
Validation loss = 0.008632668294012547
Validation loss = 0.008026729337871075
Validation loss = 0.007775850594043732
Validation loss = 0.007998037151992321
Validation loss = 0.00892765261232853
Validation loss = 0.007712823338806629
Validation loss = 0.008260427042841911
Validation loss = 0.0077435411512851715
Validation loss = 0.007925551384687424
Validation loss = 0.007613359950482845
Validation loss = 0.008938031271100044
Validation loss = 0.00780158955603838
Validation loss = 0.007774150464683771
Validation loss = 0.007626493461430073
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008513720706105232
Validation loss = 0.007824055850505829
Validation loss = 0.007646775804460049
Validation loss = 0.007884666323661804
Validation loss = 0.008008444681763649
Validation loss = 0.007734235841780901
Validation loss = 0.007928432896733284
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00825608242303133
Validation loss = 0.008726869709789753
Validation loss = 0.007636500522494316
Validation loss = 0.007510200142860413
Validation loss = 0.00843640137463808
Validation loss = 0.008199081756174564
Validation loss = 0.00860045850276947
Validation loss = 0.007658864371478558
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.68e+03 |
| Iteration     | 31       |
| MaximumReturn | 2.4e+03  |
| MinimumReturn | 409      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008400830440223217
Validation loss = 0.008045363239943981
Validation loss = 0.007875807583332062
Validation loss = 0.008428939618170261
Validation loss = 0.007387850899249315
Validation loss = 0.008287970907986164
Validation loss = 0.008498570881783962
Validation loss = 0.007487480062991381
Validation loss = 0.008512325584888458
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008674480020999908
Validation loss = 0.007444957736879587
Validation loss = 0.008140132762491703
Validation loss = 0.0073005142621695995
Validation loss = 0.007414353545755148
Validation loss = 0.0074904561042785645
Validation loss = 0.008773871697485447
Validation loss = 0.0071076517924666405
Validation loss = 0.007779134903103113
Validation loss = 0.0074823712930083275
Validation loss = 0.00698808953166008
Validation loss = 0.007203183136880398
Validation loss = 0.007578051649034023
Validation loss = 0.007438745349645615
Validation loss = 0.007208917289972305
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008027195930480957
Validation loss = 0.0077796028926968575
Validation loss = 0.00819601584225893
Validation loss = 0.007612841669470072
Validation loss = 0.007541976869106293
Validation loss = 0.007846232503652573
Validation loss = 0.007571953348815441
Validation loss = 0.007843835279345512
Validation loss = 0.007941799238324165
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008475015871226788
Validation loss = 0.008853960782289505
Validation loss = 0.008189959451556206
Validation loss = 0.00789838656783104
Validation loss = 0.007373051717877388
Validation loss = 0.00789590273052454
Validation loss = 0.008509294129908085
Validation loss = 0.007437873166054487
Validation loss = 0.008144482970237732
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007754288148134947
Validation loss = 0.007843689993023872
Validation loss = 0.0075331320986151695
Validation loss = 0.007671954110264778
Validation loss = 0.0075071221217513084
Validation loss = 0.007582505699247122
Validation loss = 0.007524385582655668
Validation loss = 0.008038275875151157
Validation loss = 0.008046475239098072
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.96e+03 |
| Iteration     | 32       |
| MaximumReturn | 2.44e+03 |
| MinimumReturn | 503      |
| TotalSamples  | 136000   |
----------------------------
