Logging to experiments/half_cheetah/oct29/w350e3_seed4321
Print configuration .....
{'env_name': 'half_cheetah', 'random_seeds': [4321, 2314, 2341, 3421], 'save_variables': False, 'model_save_dir': '/tmp/half_cheetah_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 0
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 0
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39332473278045654
Validation loss = 0.14237967133522034
Validation loss = 0.09473558515310287
Validation loss = 0.08229367434978485
Validation loss = 0.07207302749156952
Validation loss = 0.06798464059829712
Validation loss = 0.0876329094171524
Validation loss = 0.06496430933475494
Validation loss = 0.06074180454015732
Validation loss = 0.06193738430738449
Validation loss = 0.06570987403392792
Validation loss = 0.05694779008626938
Validation loss = 0.0634809359908104
Validation loss = 0.0565401166677475
Validation loss = 0.05565381795167923
Validation loss = 0.13954313099384308
Validation loss = 0.05852801352739334
Validation loss = 0.05462596192955971
Validation loss = 0.05669622868299484
Validation loss = 0.05460204929113388
Validation loss = 0.05654119700193405
Validation loss = 0.05663503706455231
Validation loss = 0.05525291711091995
Validation loss = 0.05500991269946098
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3411667048931122
Validation loss = 0.14734935760498047
Validation loss = 0.09818020462989807
Validation loss = 0.08193419873714447
Validation loss = 0.07395096123218536
Validation loss = 0.07190650701522827
Validation loss = 0.07863721251487732
Validation loss = 0.06266163289546967
Validation loss = 0.061755165457725525
Validation loss = 0.06354111433029175
Validation loss = 0.06046877056360245
Validation loss = 0.060446612536907196
Validation loss = 0.06001437455415726
Validation loss = 0.08806084096431732
Validation loss = 0.05754804611206055
Validation loss = 0.05531001836061478
Validation loss = 0.06351922452449799
Validation loss = 0.05502732843160629
Validation loss = 0.05822351202368736
Validation loss = 0.05381465703248978
Validation loss = 0.0545797236263752
Validation loss = 0.05519061163067818
Validation loss = 0.05904509872198105
Validation loss = 0.05260123685002327
Validation loss = 0.05872476100921631
Validation loss = 0.05205893889069557
Validation loss = 0.05290112644433975
Validation loss = 0.05287732183933258
Validation loss = 0.054182544350624084
Validation loss = 0.053779102861881256
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.37908709049224854
Validation loss = 0.1535067856311798
Validation loss = 0.10002652555704117
Validation loss = 0.08216185867786407
Validation loss = 0.07201708853244781
Validation loss = 0.06900787353515625
Validation loss = 0.06797923147678375
Validation loss = 0.06461212784051895
Validation loss = 0.06202133744955063
Validation loss = 0.06441790610551834
Validation loss = 0.056763067841529846
Validation loss = 0.05668226629495621
Validation loss = 0.0552884116768837
Validation loss = 0.05490626394748688
Validation loss = 0.08023536205291748
Validation loss = 0.05732547119259834
Validation loss = 0.05620063841342926
Validation loss = 0.05359362065792084
Validation loss = 0.06216247379779816
Validation loss = 0.05327051132917404
Validation loss = 0.05456298589706421
Validation loss = 0.05255898833274841
Validation loss = 0.055455636233091354
Validation loss = 0.07934144139289856
Validation loss = 0.052834369242191315
Validation loss = 0.05167052149772644
Validation loss = 0.051521800458431244
Validation loss = 0.05964355170726776
Validation loss = 0.050887711346149445
Validation loss = 0.049908898770809174
Validation loss = 0.053140539675951004
Validation loss = 0.05463124066591263
Validation loss = 0.05085653439164162
Validation loss = 0.059157028794288635
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3694756031036377
Validation loss = 0.15444433689117432
Validation loss = 0.10216780006885529
Validation loss = 0.08303233981132507
Validation loss = 0.07394677400588989
Validation loss = 0.07254454493522644
Validation loss = 0.06585558503866196
Validation loss = 0.06901535391807556
Validation loss = 0.08420045673847198
Validation loss = 0.06681036949157715
Validation loss = 0.05975053459405899
Validation loss = 0.06449531018733978
Validation loss = 0.058006010949611664
Validation loss = 0.07065653055906296
Validation loss = 0.05874335765838623
Validation loss = 0.08332613855600357
Validation loss = 0.05604228377342224
Validation loss = 0.05653976649045944
Validation loss = 0.054403454065322876
Validation loss = 0.0666210949420929
Validation loss = 0.05344601720571518
Validation loss = 0.05789686739444733
Validation loss = 0.053682439029216766
Validation loss = 0.058500148355960846
Validation loss = 0.05606523156166077
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3804513216018677
Validation loss = 0.1467665135860443
Validation loss = 0.09717969596385956
Validation loss = 0.08058654516935349
Validation loss = 0.07478699088096619
Validation loss = 0.07005493342876434
Validation loss = 0.07022695243358612
Validation loss = 0.06562566012144089
Validation loss = 0.06556031852960587
Validation loss = 0.1080925390124321
Validation loss = 0.0615704245865345
Validation loss = 0.06481140106916428
Validation loss = 0.05771417170763016
Validation loss = 0.06255772709846497
Validation loss = 0.05891970917582512
Validation loss = 0.0640135109424591
Validation loss = 0.0634826049208641
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 233
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 229
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 220
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 256
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 245
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 234
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -258     |
| Iteration     | 0        |
| MaximumReturn | -169     |
| MinimumReturn | -328     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10893230140209198
Validation loss = 0.07447990775108337
Validation loss = 0.0814783051609993
Validation loss = 0.07212019711732864
Validation loss = 0.06817661970853806
Validation loss = 0.06934008002281189
Validation loss = 0.06583086401224136
Validation loss = 0.06619015336036682
Validation loss = 0.06696154922246933
Validation loss = 0.06986650824546814
Validation loss = 0.06656830757856369
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10164892673492432
Validation loss = 0.07437919825315475
Validation loss = 0.07005925476551056
Validation loss = 0.07053813338279724
Validation loss = 0.07086212933063507
Validation loss = 0.06913255155086517
Validation loss = 0.0663299486041069
Validation loss = 0.06415228545665741
Validation loss = 0.06297706812620163
Validation loss = 0.08140695095062256
Validation loss = 0.06214750185608864
Validation loss = 0.06760606169700623
Validation loss = 0.06686045229434967
Validation loss = 0.07035171985626221
Validation loss = 0.06590146571397781
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.13483953475952148
Validation loss = 0.07150434702634811
Validation loss = 0.06710412353277206
Validation loss = 0.067893847823143
Validation loss = 0.06386540830135345
Validation loss = 0.06553280353546143
Validation loss = 0.07171927392482758
Validation loss = 0.06196155399084091
Validation loss = 0.06259138137102127
Validation loss = 0.06367946416139603
Validation loss = 0.06139088049530983
Validation loss = 0.06181083247065544
Validation loss = 0.06424017995595932
Validation loss = 0.06061108037829399
Validation loss = 0.06245499104261398
Validation loss = 0.06958933919668198
Validation loss = 0.06204139068722725
Validation loss = 0.06671121716499329
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10357008874416351
Validation loss = 0.07259348779916763
Validation loss = 0.07251584529876709
Validation loss = 0.07639328390359879
Validation loss = 0.06425151973962784
Validation loss = 0.07114256173372269
Validation loss = 0.06576851010322571
Validation loss = 0.0645960122346878
Validation loss = 0.06816840171813965
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10333053767681122
Validation loss = 0.07339002937078476
Validation loss = 0.0745926946401596
Validation loss = 0.06989298015832901
Validation loss = 0.06746865063905716
Validation loss = 0.0789669007062912
Validation loss = 0.06696467101573944
Validation loss = 0.0697706788778305
Validation loss = 0.06460820138454437
Validation loss = 0.06641585379838943
Validation loss = 0.06427233666181564
Validation loss = 0.06315191090106964
Validation loss = 0.06420744955539703
Validation loss = 0.0625915378332138
Validation loss = 0.06303808093070984
Validation loss = 0.06253921985626221
Validation loss = 0.06475040316581726
Validation loss = 0.06233513355255127
Validation loss = 0.06807664781808853
Validation loss = 0.0633072704076767
Validation loss = 0.06198091804981232
Validation loss = 0.06542845815420151
Validation loss = 0.06405258923768997
Validation loss = 0.07019184529781342
Validation loss = 0.07607562094926834
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 200
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 306
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 301
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 292
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 301
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 294
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 129      |
| Iteration     | 1        |
| MaximumReturn | 269      |
| MinimumReturn | -297     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10756689310073853
Validation loss = 0.06954299658536911
Validation loss = 0.06784599274396896
Validation loss = 0.0646219551563263
Validation loss = 0.06410179287195206
Validation loss = 0.06300253421068192
Validation loss = 0.061495959758758545
Validation loss = 0.0764937624335289
Validation loss = 0.06357594579458237
Validation loss = 0.06739541888237
Validation loss = 0.06080508604645729
Validation loss = 0.06157046556472778
Validation loss = 0.06277433782815933
Validation loss = 0.0641121193766594
Validation loss = 0.05929436907172203
Validation loss = 0.05970728024840355
Validation loss = 0.060720328241586685
Validation loss = 0.06275313347578049
Validation loss = 0.06244215369224548
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11336427181959152
Validation loss = 0.06801581382751465
Validation loss = 0.06758200377225876
Validation loss = 0.06328217685222626
Validation loss = 0.06294175237417221
Validation loss = 0.06514191627502441
Validation loss = 0.06075749918818474
Validation loss = 0.06329142302274704
Validation loss = 0.05967431142926216
Validation loss = 0.0615602470934391
Validation loss = 0.06029185280203819
Validation loss = 0.06062719225883484
Validation loss = 0.05956362187862396
Validation loss = 0.059569671750068665
Validation loss = 0.06206785514950752
Validation loss = 0.061142463237047195
Validation loss = 0.059537697583436966
Validation loss = 0.058007121086120605
Validation loss = 0.06331949681043625
Validation loss = 0.059033263474702835
Validation loss = 0.061184149235486984
Validation loss = 0.058443840593099594
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09932487457990646
Validation loss = 0.06706443428993225
Validation loss = 0.06779101490974426
Validation loss = 0.06643099337816238
Validation loss = 0.06164759770035744
Validation loss = 0.061093468219041824
Validation loss = 0.06272327154874802
Validation loss = 0.05970458313822746
Validation loss = 0.059090182185173035
Validation loss = 0.05912826955318451
Validation loss = 0.058321285992860794
Validation loss = 0.05854916572570801
Validation loss = 0.06049984693527222
Validation loss = 0.057867381721735
Validation loss = 0.060697197914123535
Validation loss = 0.06017184257507324
Validation loss = 0.05890342965722084
Validation loss = 0.05716739967465401
Validation loss = 0.059744447469711304
Validation loss = 0.061803847551345825
Validation loss = 0.059489112347364426
Validation loss = 0.06216419115662575
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10452002286911011
Validation loss = 0.0697311982512474
Validation loss = 0.06656890362501144
Validation loss = 0.06479410082101822
Validation loss = 0.06517843157052994
Validation loss = 0.06004304811358452
Validation loss = 0.06608638167381287
Validation loss = 0.0587536096572876
Validation loss = 0.06321590393781662
Validation loss = 0.06301964074373245
Validation loss = 0.05895683169364929
Validation loss = 0.06603824347257614
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11455795913934708
Validation loss = 0.06820464134216309
Validation loss = 0.06529790163040161
Validation loss = 0.06415405124425888
Validation loss = 0.06087000295519829
Validation loss = 0.06049668788909912
Validation loss = 0.06659460812807083
Validation loss = 0.061510030180215836
Validation loss = 0.0599261038005352
Validation loss = 0.059378910809755325
Validation loss = 0.059122610837221146
Validation loss = 0.07136792689561844
Validation loss = 0.05921066179871559
Validation loss = 0.05923612043261528
Validation loss = 0.05951625108718872
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 436
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 451
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 450
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 456
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 472
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 458
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 312      |
| Iteration     | 2        |
| MaximumReturn | 360      |
| MinimumReturn | 222      |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.058659207075834274
Validation loss = 0.05498838052153587
Validation loss = 0.05397782102227211
Validation loss = 0.05138172209262848
Validation loss = 0.05056058615446091
Validation loss = 0.0493282675743103
Validation loss = 0.0491967648267746
Validation loss = 0.04811442643404007
Validation loss = 0.04850878566503525
Validation loss = 0.05079617351293564
Validation loss = 0.04829522594809532
Validation loss = 0.04692854359745979
Validation loss = 0.04818030446767807
Validation loss = 0.04689404368400574
Validation loss = 0.04749142378568649
Validation loss = 0.047041065990924835
Validation loss = 0.05139774829149246
Validation loss = 0.04902336001396179
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06829538196325302
Validation loss = 0.049430571496486664
Validation loss = 0.04922851175069809
Validation loss = 0.05047748237848282
Validation loss = 0.04834683984518051
Validation loss = 0.048575691878795624
Validation loss = 0.04897793009877205
Validation loss = 0.04689212515950203
Validation loss = 0.049268484115600586
Validation loss = 0.04609944671392441
Validation loss = 0.05095583200454712
Validation loss = 0.04545609652996063
Validation loss = 0.046814925968647
Validation loss = 0.04796227067708969
Validation loss = 0.04573895037174225
Validation loss = 0.04680329188704491
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05517031252384186
Validation loss = 0.05397355183959007
Validation loss = 0.04976280778646469
Validation loss = 0.04823784530162811
Validation loss = 0.048886679112911224
Validation loss = 0.047558121383190155
Validation loss = 0.04878572002053261
Validation loss = 0.04791286587715149
Validation loss = 0.05016271770000458
Validation loss = 0.04563684016466141
Validation loss = 0.0484284982085228
Validation loss = 0.045304909348487854
Validation loss = 0.0453055165708065
Validation loss = 0.04861734062433243
Validation loss = 0.046009667217731476
Validation loss = 0.046090930700302124
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.061638034880161285
Validation loss = 0.05443621426820755
Validation loss = 0.05012447014451027
Validation loss = 0.0514005571603775
Validation loss = 0.05051368102431297
Validation loss = 0.04739699140191078
Validation loss = 0.04962129145860672
Validation loss = 0.049499958753585815
Validation loss = 0.04650687798857689
Validation loss = 0.04892547428607941
Validation loss = 0.047020815312862396
Validation loss = 0.0474415123462677
Validation loss = 0.04725675284862518
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.061486151069402695
Validation loss = 0.05265112966299057
Validation loss = 0.05146662890911102
Validation loss = 0.051733143627643585
Validation loss = 0.04989812150597572
Validation loss = 0.050906188786029816
Validation loss = 0.04957038164138794
Validation loss = 0.04992011561989784
Validation loss = 0.04971957951784134
Validation loss = 0.050329845398664474
Validation loss = 0.04755788296461105
Validation loss = 0.04914618283510208
Validation loss = 0.047413088381290436
Validation loss = 0.04685281962156296
Validation loss = 0.04511875659227371
Validation loss = 0.046423204243183136
Validation loss = 0.04842545837163925
Validation loss = 0.04524802416563034
Validation loss = 0.04468570649623871
Validation loss = 0.0461510494351387
Validation loss = 0.045710548758506775
Validation loss = 0.04691167175769806
Validation loss = 0.04718983173370361
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 509
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 494
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 492
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 492
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 496
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 521
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 521      |
| Iteration     | 3        |
| MaximumReturn | 605      |
| MinimumReturn | 437      |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.048475924879312515
Validation loss = 0.04039747267961502
Validation loss = 0.046523191034793854
Validation loss = 0.039284348487854004
Validation loss = 0.03809323161840439
Validation loss = 0.03890134021639824
Validation loss = 0.03882615268230438
Validation loss = 0.038110993802547455
Validation loss = 0.03947824612259865
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04761491343379021
Validation loss = 0.040807414799928665
Validation loss = 0.039030928164720535
Validation loss = 0.03769023343920708
Validation loss = 0.03821631893515587
Validation loss = 0.0382833257317543
Validation loss = 0.03907022625207901
Validation loss = 0.037212617695331573
Validation loss = 0.03837122395634651
Validation loss = 0.03919897601008415
Validation loss = 0.03745853528380394
Validation loss = 0.03687335550785065
Validation loss = 0.03587806969881058
Validation loss = 0.03989139944314957
Validation loss = 0.037253670394420624
Validation loss = 0.037497274577617645
Validation loss = 0.03705670312047005
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04457978159189224
Validation loss = 0.044337183237075806
Validation loss = 0.03945949301123619
Validation loss = 0.03942420706152916
Validation loss = 0.03947141021490097
Validation loss = 0.0391942523419857
Validation loss = 0.03930145502090454
Validation loss = 0.03926350548863411
Validation loss = 0.038477279245853424
Validation loss = 0.037624627351760864
Validation loss = 0.03855282440781593
Validation loss = 0.03624209761619568
Validation loss = 0.036182306706905365
Validation loss = 0.03743378072977066
Validation loss = 0.03707307577133179
Validation loss = 0.038756873458623886
Validation loss = 0.036002181470394135
Validation loss = 0.03544742241501808
Validation loss = 0.038426510989665985
Validation loss = 0.035970065742731094
Validation loss = 0.035444002598524094
Validation loss = 0.03514941781759262
Validation loss = 0.03441659361124039
Validation loss = 0.03647876903414726
Validation loss = 0.035997308790683746
Validation loss = 0.035333842039108276
Validation loss = 0.036124877631664276
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.047085221856832504
Validation loss = 0.041251860558986664
Validation loss = 0.03887350857257843
Validation loss = 0.03885497897863388
Validation loss = 0.03955500200390816
Validation loss = 0.03698595613241196
Validation loss = 0.03736815229058266
Validation loss = 0.036473847925662994
Validation loss = 0.03798402100801468
Validation loss = 0.0376097708940506
Validation loss = 0.03674108907580376
Validation loss = 0.035708799958229065
Validation loss = 0.03933320194482803
Validation loss = 0.035735782235860825
Validation loss = 0.03550940006971359
Validation loss = 0.03636976331472397
Validation loss = 0.034532733261585236
Validation loss = 0.03695647045969963
Validation loss = 0.03795234113931656
Validation loss = 0.03441920131444931
Validation loss = 0.0344921238720417
Validation loss = 0.036704037338495255
Validation loss = 0.03380743786692619
Validation loss = 0.033087484538555145
Validation loss = 0.03439859673380852
Validation loss = 0.03240271657705307
Validation loss = 0.03409872576594353
Validation loss = 0.032979439944028854
Validation loss = 0.033109333366155624
Validation loss = 0.03510739654302597
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.048218678683042526
Validation loss = 0.03946522995829582
Validation loss = 0.038068853318691254
Validation loss = 0.03993148356676102
Validation loss = 0.03816453740000725
Validation loss = 0.0386466309428215
Validation loss = 0.040590159595012665
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 546
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 84
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 524
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 547
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 542
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 550
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 542      |
| Iteration     | 4        |
| MaximumReturn | 779      |
| MinimumReturn | -408     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07527921348810196
Validation loss = 0.045708268880844116
Validation loss = 0.04369528964161873
Validation loss = 0.04004629701375961
Validation loss = 0.04219042882323265
Validation loss = 0.041584562510252
Validation loss = 0.03822389245033264
Validation loss = 0.03856286779046059
Validation loss = 0.03803691267967224
Validation loss = 0.03747420385479927
Validation loss = 0.03847819194197655
Validation loss = 0.04075343534350395
Validation loss = 0.0361342690885067
Validation loss = 0.036359257996082306
Validation loss = 0.03863620385527611
Validation loss = 0.03731832280755043
Validation loss = 0.036391645669937134
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.060584958642721176
Validation loss = 0.04481777548789978
Validation loss = 0.04203483462333679
Validation loss = 0.038286853581666946
Validation loss = 0.03963864967226982
Validation loss = 0.03726913407444954
Validation loss = 0.038871657103300095
Validation loss = 0.0377972386777401
Validation loss = 0.03725096955895424
Validation loss = 0.0370224192738533
Validation loss = 0.036722343415021896
Validation loss = 0.03761698305606842
Validation loss = 0.036294665187597275
Validation loss = 0.03551765903830528
Validation loss = 0.03451156243681908
Validation loss = 0.03680024668574333
Validation loss = 0.034329984337091446
Validation loss = 0.03690493106842041
Validation loss = 0.03455372154712677
Validation loss = 0.040261056274175644
Validation loss = 0.034629885107278824
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07060782611370087
Validation loss = 0.044303327798843384
Validation loss = 0.040859054774045944
Validation loss = 0.03903282806277275
Validation loss = 0.03774121403694153
Validation loss = 0.03872404247522354
Validation loss = 0.03811138868331909
Validation loss = 0.03807671740651131
Validation loss = 0.03625462204217911
Validation loss = 0.03702045604586601
Validation loss = 0.03629167377948761
Validation loss = 0.03663928434252739
Validation loss = 0.03585488721728325
Validation loss = 0.03597523644566536
Validation loss = 0.03556612879037857
Validation loss = 0.03521522507071495
Validation loss = 0.03433012589812279
Validation loss = 0.036654964089393616
Validation loss = 0.036884207278490067
Validation loss = 0.03439803794026375
Validation loss = 0.03428753465414047
Validation loss = 0.037451788783073425
Validation loss = 0.0350814089179039
Validation loss = 0.03597189113497734
Validation loss = 0.0347297228872776
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08490881323814392
Validation loss = 0.04567734897136688
Validation loss = 0.042396146804094315
Validation loss = 0.04010545834898949
Validation loss = 0.03905593603849411
Validation loss = 0.03756671026349068
Validation loss = 0.037380319088697433
Validation loss = 0.03834807500243187
Validation loss = 0.03676861152052879
Validation loss = 0.03688091039657593
Validation loss = 0.03679053857922554
Validation loss = 0.03697484731674194
Validation loss = 0.03638160973787308
Validation loss = 0.034737858921289444
Validation loss = 0.03729166463017464
Validation loss = 0.03538825362920761
Validation loss = 0.03420943394303322
Validation loss = 0.03692614659667015
Validation loss = 0.036272745579481125
Validation loss = 0.0343591682612896
Validation loss = 0.03545147553086281
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06234481558203697
Validation loss = 0.04408680275082588
Validation loss = 0.04394744336605072
Validation loss = 0.03958405926823616
Validation loss = 0.03842935711145401
Validation loss = 0.03809976950287819
Validation loss = 0.0410534106194973
Validation loss = 0.03755548223853111
Validation loss = 0.0398932620882988
Validation loss = 0.03748055547475815
Validation loss = 0.036874957382678986
Validation loss = 0.036531057208776474
Validation loss = 0.03623144328594208
Validation loss = 0.03597124293446541
Validation loss = 0.036911848932504654
Validation loss = 0.03646234795451164
Validation loss = 0.03600930795073509
Validation loss = 0.03571614250540733
Validation loss = 0.034963708370923996
Validation loss = 0.03524211049079895
Validation loss = 0.03622645139694214
Validation loss = 0.03690851852297783
Validation loss = 0.03555017709732056
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 615
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 639
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 610
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 629
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 609
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 630
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 814      |
| Iteration     | 5        |
| MaximumReturn | 861      |
| MinimumReturn | 748      |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04055153205990791
Validation loss = 0.03287395089864731
Validation loss = 0.032807692885398865
Validation loss = 0.03263750299811363
Validation loss = 0.032708846032619476
Validation loss = 0.03155917674303055
Validation loss = 0.031367916613817215
Validation loss = 0.033759985119104385
Validation loss = 0.031318776309490204
Validation loss = 0.03158991038799286
Validation loss = 0.030882397666573524
Validation loss = 0.030149448662996292
Validation loss = 0.029465829953551292
Validation loss = 0.030539916828274727
Validation loss = 0.029955163598060608
Validation loss = 0.03204619511961937
Validation loss = 0.03559219837188721
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0395624153316021
Validation loss = 0.0317208506166935
Validation loss = 0.029737262055277824
Validation loss = 0.029952604323625565
Validation loss = 0.030168814584612846
Validation loss = 0.0312128234654665
Validation loss = 0.030387455597519875
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03628125041723251
Validation loss = 0.031200891360640526
Validation loss = 0.030836790800094604
Validation loss = 0.031647201627492905
Validation loss = 0.031033415347337723
Validation loss = 0.02939053438603878
Validation loss = 0.03074154071509838
Validation loss = 0.03014555014669895
Validation loss = 0.03123391978442669
Validation loss = 0.03000420331954956
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03791889175772667
Validation loss = 0.03044942207634449
Validation loss = 0.03025798499584198
Validation loss = 0.029452387243509293
Validation loss = 0.029653744772076607
Validation loss = 0.029382294043898582
Validation loss = 0.03061460331082344
Validation loss = 0.030975501984357834
Validation loss = 0.03225981071591377
Validation loss = 0.029395008459687233
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03840268775820732
Validation loss = 0.03100636787712574
Validation loss = 0.03212913125753403
Validation loss = 0.031174426898360252
Validation loss = 0.03060099482536316
Validation loss = 0.03040142171084881
Validation loss = 0.030649181455373764
Validation loss = 0.02937999926507473
Validation loss = 0.029162021353840828
Validation loss = 0.027967801317572594
Validation loss = 0.030309882014989853
Validation loss = 0.031693655997514725
Validation loss = 0.029948705807328224
Validation loss = 0.028385931625962257
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 669
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 668
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 692
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 680
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 655
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 664
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 916      |
| Iteration     | 6        |
| MaximumReturn | 957      |
| MinimumReturn | 878      |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.033059850335121155
Validation loss = 0.02693447843194008
Validation loss = 0.02672429010272026
Validation loss = 0.025629598647356033
Validation loss = 0.024663046002388
Validation loss = 0.024884968996047974
Validation loss = 0.026180323213338852
Validation loss = 0.024977466091513634
Validation loss = 0.0256098173558712
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03452588617801666
Validation loss = 0.02652435377240181
Validation loss = 0.02638247422873974
Validation loss = 0.02625710517168045
Validation loss = 0.028073221445083618
Validation loss = 0.02674422413110733
Validation loss = 0.02617042325437069
Validation loss = 0.027377288788557053
Validation loss = 0.02608310617506504
Validation loss = 0.028096573427319527
Validation loss = 0.0248170904815197
Validation loss = 0.025583550333976746
Validation loss = 0.02580837719142437
Validation loss = 0.02413376048207283
Validation loss = 0.025582898408174515
Validation loss = 0.026660526171326637
Validation loss = 0.023558475077152252
Validation loss = 0.024125931784510612
Validation loss = 0.024649791419506073
Validation loss = 0.025113750249147415
Validation loss = 0.024714073166251183
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.031011143699288368
Validation loss = 0.026787515729665756
Validation loss = 0.02676297537982464
Validation loss = 0.025674227625131607
Validation loss = 0.025904439389705658
Validation loss = 0.024923764169216156
Validation loss = 0.025546273216605186
Validation loss = 0.024473007768392563
Validation loss = 0.024951476603746414
Validation loss = 0.025031428784132004
Validation loss = 0.024358319118618965
Validation loss = 0.026499301195144653
Validation loss = 0.024038391187787056
Validation loss = 0.024300623685121536
Validation loss = 0.025233637541532516
Validation loss = 0.02354678511619568
Validation loss = 0.0230986587703228
Validation loss = 0.023331493139266968
Validation loss = 0.02547891065478325
Validation loss = 0.02341546304523945
Validation loss = 0.02258942276239395
Validation loss = 0.023446284234523773
Validation loss = 0.023556912317872047
Validation loss = 0.022925611585378647
Validation loss = 0.02368992194533348
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.039696283638477325
Validation loss = 0.026154475286602974
Validation loss = 0.025847068056464195
Validation loss = 0.02611652761697769
Validation loss = 0.025413883849978447
Validation loss = 0.02589043788611889
Validation loss = 0.026430219411849976
Validation loss = 0.024895571172237396
Validation loss = 0.024201754480600357
Validation loss = 0.02549230307340622
Validation loss = 0.026154566556215286
Validation loss = 0.025252249091863632
Validation loss = 0.02524755895137787
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.032507311552762985
Validation loss = 0.026484085246920586
Validation loss = 0.0259661003947258
Validation loss = 0.025511451065540314
Validation loss = 0.024912044405937195
Validation loss = 0.02496962994337082
Validation loss = 0.02403535321354866
Validation loss = 0.02516191452741623
Validation loss = 0.028774164617061615
Validation loss = 0.025354858487844467
Validation loss = 0.025308212265372276
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 679
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 694
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 709
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 680
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 695
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 689
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 986      |
| Iteration     | 7        |
| MaximumReturn | 1.02e+03 |
| MinimumReturn | 935      |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029635412618517876
Validation loss = 0.02391931228339672
Validation loss = 0.022349238395690918
Validation loss = 0.023107195273041725
Validation loss = 0.021425805985927582
Validation loss = 0.02163560874760151
Validation loss = 0.022201377898454666
Validation loss = 0.02216646447777748
Validation loss = 0.021632054820656776
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02880425564944744
Validation loss = 0.02210116945207119
Validation loss = 0.022112684324383736
Validation loss = 0.021642519161105156
Validation loss = 0.021929562091827393
Validation loss = 0.021898610517382622
Validation loss = 0.02317027375102043
Validation loss = 0.02202894724905491
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.027028899639844894
Validation loss = 0.02318631485104561
Validation loss = 0.020426824688911438
Validation loss = 0.0231638066470623
Validation loss = 0.02051168493926525
Validation loss = 0.020478546619415283
Validation loss = 0.021055549383163452
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.029733210802078247
Validation loss = 0.02270798571407795
Validation loss = 0.022441724315285683
Validation loss = 0.023331396281719208
Validation loss = 0.0221484936773777
Validation loss = 0.021244782954454422
Validation loss = 0.02220172993838787
Validation loss = 0.023756222799420357
Validation loss = 0.021004976704716682
Validation loss = 0.0222287829965353
Validation loss = 0.020965950563549995
Validation loss = 0.022019006311893463
Validation loss = 0.02116045355796814
Validation loss = 0.020532988011837006
Validation loss = 0.021330632269382477
Validation loss = 0.020292919129133224
Validation loss = 0.02077827788889408
Validation loss = 0.021239323541522026
Validation loss = 0.02054666355252266
Validation loss = 0.01992912031710148
Validation loss = 0.02015216089785099
Validation loss = 0.020300062373280525
Validation loss = 0.0201267059892416
Validation loss = 0.02066219598054886
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02697085402905941
Validation loss = 0.02299196831882
Validation loss = 0.02296070009469986
Validation loss = 0.02284822054207325
Validation loss = 0.022622816264629364
Validation loss = 0.02344135008752346
Validation loss = 0.021784484386444092
Validation loss = 0.02390991896390915
Validation loss = 0.022814147174358368
Validation loss = 0.021285900846123695
Validation loss = 0.021352794021368027
Validation loss = 0.02199886552989483
Validation loss = 0.021633850410580635
Validation loss = 0.020877007395029068
Validation loss = 0.021428991109132767
Validation loss = 0.020439641550183296
Validation loss = 0.02004104107618332
Validation loss = 0.021746966987848282
Validation loss = 0.019696274772286415
Validation loss = 0.01941249705851078
Validation loss = 0.020269155502319336
Validation loss = 0.019747234880924225
Validation loss = 0.020291807129979134
Validation loss = 0.019834158942103386
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 726
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 629
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 707
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 726
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 729
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 717
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 750      |
| Iteration     | 8        |
| MaximumReturn | 1.06e+03 |
| MinimumReturn | -482     |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.027697812765836716
Validation loss = 0.021818671375513077
Validation loss = 0.02124127745628357
Validation loss = 0.02015220746397972
Validation loss = 0.021109219640493393
Validation loss = 0.020308535546064377
Validation loss = 0.02063160203397274
Validation loss = 0.0198963675647974
Validation loss = 0.019919853657484055
Validation loss = 0.01891474798321724
Validation loss = 0.019848322495818138
Validation loss = 0.019648928195238113
Validation loss = 0.019607311114668846
Validation loss = 0.019676093012094498
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.027647191658616066
Validation loss = 0.020975815132260323
Validation loss = 0.020011842250823975
Validation loss = 0.01920807734131813
Validation loss = 0.019970577210187912
Validation loss = 0.020412789657711983
Validation loss = 0.021434519439935684
Validation loss = 0.019644059240818024
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024013858288526535
Validation loss = 0.021337317302823067
Validation loss = 0.021619534119963646
Validation loss = 0.01966787874698639
Validation loss = 0.019182153046131134
Validation loss = 0.018741372972726822
Validation loss = 0.01990537717938423
Validation loss = 0.01883561536669731
Validation loss = 0.01947653479874134
Validation loss = 0.018898356705904007
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02472379431128502
Validation loss = 0.02090851590037346
Validation loss = 0.019845083355903625
Validation loss = 0.020571667701005936
Validation loss = 0.01916920766234398
Validation loss = 0.018473265692591667
Validation loss = 0.019080575555562973
Validation loss = 0.020442210137844086
Validation loss = 0.018385278061032295
Validation loss = 0.018961677327752113
Validation loss = 0.01803607866168022
Validation loss = 0.020353587344288826
Validation loss = 0.018348176032304764
Validation loss = 0.018450159579515457
Validation loss = 0.017514439299702644
Validation loss = 0.01809510961174965
Validation loss = 0.018320634961128235
Validation loss = 0.018958700820803642
Validation loss = 0.017459189519286156
Validation loss = 0.018101979047060013
Validation loss = 0.018756624311208725
Validation loss = 0.017418574541807175
Validation loss = 0.01926596090197563
Validation loss = 0.01808849163353443
Validation loss = 0.017528768628835678
Validation loss = 0.019526934251189232
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024398069828748703
Validation loss = 0.02068197727203369
Validation loss = 0.019475502893328667
Validation loss = 0.018646545708179474
Validation loss = 0.020154623314738274
Validation loss = 0.018582383170723915
Validation loss = 0.01904197223484516
Validation loss = 0.021246489137411118
Validation loss = 0.018541425466537476
Validation loss = 0.019144127145409584
Validation loss = 0.017657218500971794
Validation loss = 0.01775055192410946
Validation loss = 0.019212309271097183
Validation loss = 0.01773378625512123
Validation loss = 0.01935269683599472
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 746
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 731
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 748
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 733
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 746
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 739
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1e+03    |
| Iteration     | 9        |
| MaximumReturn | 1.11e+03 |
| MinimumReturn | 933      |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02349364198744297
Validation loss = 0.018323903903365135
Validation loss = 0.018994076177477837
Validation loss = 0.0203070230782032
Validation loss = 0.017637254670262337
Validation loss = 0.017870087176561356
Validation loss = 0.0202055424451828
Validation loss = 0.017093420028686523
Validation loss = 0.018081439658999443
Validation loss = 0.017725534737110138
Validation loss = 0.017914045602083206
Validation loss = 0.018596850335597992
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021173205226659775
Validation loss = 0.01959626004099846
Validation loss = 0.018443571403622627
Validation loss = 0.01865011267364025
Validation loss = 0.01839319057762623
Validation loss = 0.0183522030711174
Validation loss = 0.018654201179742813
Validation loss = 0.020063458010554314
Validation loss = 0.01816200092434883
Validation loss = 0.018751006573438644
Validation loss = 0.018732784315943718
Validation loss = 0.017910363152623177
Validation loss = 0.017574351280927658
Validation loss = 0.018725939095020294
Validation loss = 0.01717083901166916
Validation loss = 0.017341556027531624
Validation loss = 0.017182640731334686
Validation loss = 0.01776972971856594
Validation loss = 0.017290355637669563
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01991409808397293
Validation loss = 0.01862805336713791
Validation loss = 0.017531154677271843
Validation loss = 0.017877308651804924
Validation loss = 0.01778004691004753
Validation loss = 0.01692422665655613
Validation loss = 0.016659943386912346
Validation loss = 0.01752207800745964
Validation loss = 0.01712556555867195
Validation loss = 0.016683919355273247
Validation loss = 0.017155775800347328
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019152536988258362
Validation loss = 0.01696942001581192
Validation loss = 0.017166398465633392
Validation loss = 0.016396038234233856
Validation loss = 0.0169523973017931
Validation loss = 0.01672941818833351
Validation loss = 0.01714620366692543
Validation loss = 0.01658119447529316
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020667510107159615
Validation loss = 0.01720704883337021
Validation loss = 0.017123175784945488
Validation loss = 0.01695992983877659
Validation loss = 0.01658863015472889
Validation loss = 0.01739845797419548
Validation loss = 0.01666947454214096
Validation loss = 0.01743406616151333
Validation loss = 0.017304198816418648
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 733
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 733
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 728
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 742
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 726
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 724
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.05e+03 |
| Iteration     | 10       |
| MaximumReturn | 1.13e+03 |
| MinimumReturn | 974      |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018561547622084618
Validation loss = 0.01709059067070484
Validation loss = 0.016650989651679993
Validation loss = 0.01601000316441059
Validation loss = 0.017130974680185318
Validation loss = 0.016356294974684715
Validation loss = 0.016884274780750275
Validation loss = 0.016383130103349686
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018254073336720467
Validation loss = 0.016357237473130226
Validation loss = 0.016343308612704277
Validation loss = 0.0164511576294899
Validation loss = 0.01883106864988804
Validation loss = 0.015912925824522972
Validation loss = 0.01575688272714615
Validation loss = 0.016795719042420387
Validation loss = 0.017158232629299164
Validation loss = 0.018651161342859268
Validation loss = 0.015676749870181084
Validation loss = 0.017110150307416916
Validation loss = 0.015565268695354462
Validation loss = 0.01489525567740202
Validation loss = 0.016295628622174263
Validation loss = 0.01581438072025776
Validation loss = 0.015433895401656628
Validation loss = 0.0162262711673975
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020669056102633476
Validation loss = 0.016187377274036407
Validation loss = 0.016394266858696938
Validation loss = 0.01764281466603279
Validation loss = 0.016918858513236046
Validation loss = 0.016140148043632507
Validation loss = 0.016428638249635696
Validation loss = 0.01590331643819809
Validation loss = 0.01632184162735939
Validation loss = 0.01582833006978035
Validation loss = 0.01550745964050293
Validation loss = 0.015797287225723267
Validation loss = 0.015626752749085426
Validation loss = 0.015606469474732876
Validation loss = 0.017412694171071053
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016865577548742294
Validation loss = 0.016463110223412514
Validation loss = 0.015920614823698997
Validation loss = 0.016532031819224358
Validation loss = 0.015446777455508709
Validation loss = 0.01578649692237377
Validation loss = 0.016377897933125496
Validation loss = 0.016770632937550545
Validation loss = 0.015091228298842907
Validation loss = 0.015240922570228577
Validation loss = 0.01659844070672989
Validation loss = 0.015687642619013786
Validation loss = 0.015819044783711433
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017485253512859344
Validation loss = 0.016571970656514168
Validation loss = 0.016696350648999214
Validation loss = 0.015837399289011955
Validation loss = 0.015537205152213573
Validation loss = 0.015162751078605652
Validation loss = 0.016417371109128
Validation loss = 0.017209352925419807
Validation loss = 0.015707187354564667
Validation loss = 0.01564847119152546
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 757
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 761
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 740
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 731
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 756
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 733
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 975      |
| Iteration     | 11       |
| MaximumReturn | 1.04e+03 |
| MinimumReturn | 896      |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01759340614080429
Validation loss = 0.016435246914625168
Validation loss = 0.015508229844272137
Validation loss = 0.015533463098108768
Validation loss = 0.014834784902632236
Validation loss = 0.01622479036450386
Validation loss = 0.015069782733917236
Validation loss = 0.015503873117268085
Validation loss = 0.016373254358768463
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017484696581959724
Validation loss = 0.015002699568867683
Validation loss = 0.015096128918230534
Validation loss = 0.015400305390357971
Validation loss = 0.015733977779746056
Validation loss = 0.01524384319782257
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015722332522273064
Validation loss = 0.015130188316106796
Validation loss = 0.014644579961895943
Validation loss = 0.016634494066238403
Validation loss = 0.01462855376303196
Validation loss = 0.014927390962839127
Validation loss = 0.01509430818259716
Validation loss = 0.01461444329470396
Validation loss = 0.015124908648431301
Validation loss = 0.015446226112544537
Validation loss = 0.015491015277802944
Validation loss = 0.015545361675322056
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016239870339632034
Validation loss = 0.01513192430138588
Validation loss = 0.015112840570509434
Validation loss = 0.014945531263947487
Validation loss = 0.015321608632802963
Validation loss = 0.01540398970246315
Validation loss = 0.015353413298726082
Validation loss = 0.01488895621150732
Validation loss = 0.014407219365239143
Validation loss = 0.014867598190903664
Validation loss = 0.015256809070706367
Validation loss = 0.014398196712136269
Validation loss = 0.014158546924591064
Validation loss = 0.014332546852529049
Validation loss = 0.014435192570090294
Validation loss = 0.014154355973005295
Validation loss = 0.015163769014179707
Validation loss = 0.014819454401731491
Validation loss = 0.015296072699129581
Validation loss = 0.014320101588964462
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016460450366139412
Validation loss = 0.015787364915013313
Validation loss = 0.0158971156924963
Validation loss = 0.014424307271838188
Validation loss = 0.015417222864925861
Validation loss = 0.01520442869514227
Validation loss = 0.01552846934646368
Validation loss = 0.014551429077982903
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 754
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 764
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 777
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 776
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 766
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 786
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 998      |
| Iteration     | 12       |
| MaximumReturn | 1.04e+03 |
| MinimumReturn | 955      |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01615709252655506
Validation loss = 0.01584996096789837
Validation loss = 0.015434888191521168
Validation loss = 0.016196470707654953
Validation loss = 0.014455723576247692
Validation loss = 0.014674805104732513
Validation loss = 0.014543767087161541
Validation loss = 0.016059618443250656
Validation loss = 0.014197521843016148
Validation loss = 0.015471525490283966
Validation loss = 0.015338319353759289
Validation loss = 0.013891388662159443
Validation loss = 0.01465680729597807
Validation loss = 0.014784170314669609
Validation loss = 0.013925033621490002
Validation loss = 0.01487580593675375
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016143571585416794
Validation loss = 0.014757598750293255
Validation loss = 0.013920691795647144
Validation loss = 0.014009952545166016
Validation loss = 0.014715285040438175
Validation loss = 0.013910636305809021
Validation loss = 0.01391536183655262
Validation loss = 0.014367105439305305
Validation loss = 0.014282546006143093
Validation loss = 0.013618764467537403
Validation loss = 0.014357490465044975
Validation loss = 0.015113291330635548
Validation loss = 0.01405158918350935
Validation loss = 0.014090524055063725
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015238446183502674
Validation loss = 0.015044457279145718
Validation loss = 0.014860136434435844
Validation loss = 0.014152892865240574
Validation loss = 0.014519574120640755
Validation loss = 0.015298695303499699
Validation loss = 0.01362437754869461
Validation loss = 0.014571639709174633
Validation loss = 0.014122428372502327
Validation loss = 0.014088215306401253
Validation loss = 0.014340182766318321
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014680936001241207
Validation loss = 0.01417339313775301
Validation loss = 0.01475948840379715
Validation loss = 0.015165025368332863
Validation loss = 0.01410767249763012
Validation loss = 0.014079594984650612
Validation loss = 0.013480688445270061
Validation loss = 0.013301195576786995
Validation loss = 0.015105296857655048
Validation loss = 0.014438053593039513
Validation loss = 0.013964231126010418
Validation loss = 0.013529734686017036
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015144268982112408
Validation loss = 0.014010374434292316
Validation loss = 0.014983252622187138
Validation loss = 0.013586648739874363
Validation loss = 0.01419313345104456
Validation loss = 0.014203493483364582
Validation loss = 0.013736039400100708
Validation loss = 0.01380096934735775
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 772
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 762
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 779
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 760
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 749
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 778
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.02e+03 |
| Iteration     | 13       |
| MaximumReturn | 1.03e+03 |
| MinimumReturn | 986      |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015465915203094482
Validation loss = 0.013530743308365345
Validation loss = 0.013454603962600231
Validation loss = 0.014180140569806099
Validation loss = 0.013664224185049534
Validation loss = 0.013614787720143795
Validation loss = 0.013703088276088238
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016482872888445854
Validation loss = 0.01322864182293415
Validation loss = 0.013386934995651245
Validation loss = 0.013239070773124695
Validation loss = 0.0134400874376297
Validation loss = 0.013080154545605183
Validation loss = 0.013498438522219658
Validation loss = 0.013661411590874195
Validation loss = 0.012918572872877121
Validation loss = 0.01334733609110117
Validation loss = 0.013056465424597263
Validation loss = 0.014006230980157852
Validation loss = 0.013176215812563896
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01454247161746025
Validation loss = 0.014012458734214306
Validation loss = 0.013605515472590923
Validation loss = 0.01371148880571127
Validation loss = 0.014172481372952461
Validation loss = 0.012871813029050827
Validation loss = 0.014097183011472225
Validation loss = 0.012988441623747349
Validation loss = 0.013275284320116043
Validation loss = 0.013821584172546864
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016440916806459427
Validation loss = 0.012991253286600113
Validation loss = 0.012991971336305141
Validation loss = 0.013373839668929577
Validation loss = 0.012636003084480762
Validation loss = 0.013055204413831234
Validation loss = 0.01290805358439684
Validation loss = 0.013351491652429104
Validation loss = 0.013401159085333347
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016600433737039566
Validation loss = 0.013214523904025555
Validation loss = 0.014117526821792126
Validation loss = 0.013823936693370342
Validation loss = 0.0130270691588521
Validation loss = 0.014795185998082161
Validation loss = 0.013260504230856895
Validation loss = 0.012538284994661808
Validation loss = 0.0142397815361619
Validation loss = 0.013074294663965702
Validation loss = 0.01293144654482603
Validation loss = 0.014458521269261837
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 775
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 753
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 755
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 764
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 745
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 766
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.03e+03 |
| Iteration     | 14       |
| MaximumReturn | 1.14e+03 |
| MinimumReturn | 828      |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015710389241576195
Validation loss = 0.01323346234858036
Validation loss = 0.013353392481803894
Validation loss = 0.012940678745508194
Validation loss = 0.014441028237342834
Validation loss = 0.013586631044745445
Validation loss = 0.013059008866548538
Validation loss = 0.012683181092143059
Validation loss = 0.013302040286362171
Validation loss = 0.013345116749405861
Validation loss = 0.013809199444949627
Validation loss = 0.013615644536912441
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013970069587230682
Validation loss = 0.013438113033771515
Validation loss = 0.013093685731291771
Validation loss = 0.013612860813736916
Validation loss = 0.012459434568881989
Validation loss = 0.012995067983865738
Validation loss = 0.012014813721179962
Validation loss = 0.014191010035574436
Validation loss = 0.01253950223326683
Validation loss = 0.01304670237004757
Validation loss = 0.012167418375611305
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014686642214655876
Validation loss = 0.013714080676436424
Validation loss = 0.012609530240297318
Validation loss = 0.012753458693623543
Validation loss = 0.012802617624402046
Validation loss = 0.013602374121546745
Validation loss = 0.012721634469926357
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014748205430805683
Validation loss = 0.014452927745878696
Validation loss = 0.012483939528465271
Validation loss = 0.012410348281264305
Validation loss = 0.012446689419448376
Validation loss = 0.012994324788451195
Validation loss = 0.013075871393084526
Validation loss = 0.013021105900406837
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013275659643113613
Validation loss = 0.01227240078151226
Validation loss = 0.013095538131892681
Validation loss = 0.012711725197732449
Validation loss = 0.012723986059427261
Validation loss = 0.012825973331928253
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 765
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 768
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 762
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 753
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 764
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 770
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.07e+03 |
| Iteration     | 15       |
| MaximumReturn | 1.12e+03 |
| MinimumReturn | 1.05e+03 |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01361473836004734
Validation loss = 0.012507165782153606
Validation loss = 0.01307467371225357
Validation loss = 0.012289715930819511
Validation loss = 0.012635455466806889
Validation loss = 0.013166272081434727
Validation loss = 0.012273628264665604
Validation loss = 0.012385068461298943
Validation loss = 0.01272389106452465
Validation loss = 0.013505227863788605
Validation loss = 0.012339255772531033
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01229112595319748
Validation loss = 0.012710209004580975
Validation loss = 0.012650512158870697
Validation loss = 0.012295668944716454
Validation loss = 0.012472648173570633
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013939001597464085
Validation loss = 0.012603816576302052
Validation loss = 0.0126853222027421
Validation loss = 0.013555015437304974
Validation loss = 0.012535175308585167
Validation loss = 0.012568202801048756
Validation loss = 0.012668324634432793
Validation loss = 0.012425906956195831
Validation loss = 0.012276467867195606
Validation loss = 0.012783520855009556
Validation loss = 0.01275414228439331
Validation loss = 0.012129152193665504
Validation loss = 0.012807883322238922
Validation loss = 0.012298387475311756
Validation loss = 0.012934830971062183
Validation loss = 0.012686021625995636
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012234470807015896
Validation loss = 0.01220776792615652
Validation loss = 0.012483063153922558
Validation loss = 0.013140521943569183
Validation loss = 0.012471708469092846
Validation loss = 0.011919848620891571
Validation loss = 0.012357211671769619
Validation loss = 0.013243160210549831
Validation loss = 0.011771582067012787
Validation loss = 0.012069444172084332
Validation loss = 0.012375213205814362
Validation loss = 0.011880473233759403
Validation loss = 0.012165548279881477
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012815827503800392
Validation loss = 0.012259839102625847
Validation loss = 0.011964714154601097
Validation loss = 0.011859404854476452
Validation loss = 0.012402698397636414
Validation loss = 0.012612744234502316
Validation loss = 0.013056383468210697
Validation loss = 0.012576460838317871
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 750
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 752
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 754
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 766
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 773
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 761
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.04e+03 |
| Iteration     | 16       |
| MaximumReturn | 1.1e+03  |
| MinimumReturn | 997      |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014018912799656391
Validation loss = 0.011848744004964828
Validation loss = 0.012839343398809433
Validation loss = 0.012000754475593567
Validation loss = 0.012041255831718445
Validation loss = 0.012001868337392807
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012928370386362076
Validation loss = 0.012297961860895157
Validation loss = 0.011967707425355911
Validation loss = 0.01227018516510725
Validation loss = 0.012843001633882523
Validation loss = 0.011912841349840164
Validation loss = 0.01179652288556099
Validation loss = 0.013131175190210342
Validation loss = 0.01136587280780077
Validation loss = 0.012703461572527885
Validation loss = 0.01182811800390482
Validation loss = 0.012294149026274681
Validation loss = 0.01214814092963934
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013229730539023876
Validation loss = 0.012457717210054398
Validation loss = 0.011960129253566265
Validation loss = 0.01242042239755392
Validation loss = 0.011730818077921867
Validation loss = 0.011809278279542923
Validation loss = 0.011627920903265476
Validation loss = 0.012104952707886696
Validation loss = 0.012158050201833248
Validation loss = 0.011887436732649803
Validation loss = 0.011968623846769333
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01249523926526308
Validation loss = 0.011703123338520527
Validation loss = 0.01170679647475481
Validation loss = 0.012384692206978798
Validation loss = 0.012365696020424366
Validation loss = 0.01151141058653593
Validation loss = 0.011926607228815556
Validation loss = 0.01241413876414299
Validation loss = 0.011234419420361519
Validation loss = 0.012770853005349636
Validation loss = 0.012028892524540424
Validation loss = 0.011796567589044571
Validation loss = 0.01129904855042696
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012879534624516964
Validation loss = 0.011796996928751469
Validation loss = 0.012288402765989304
Validation loss = 0.011475831270217896
Validation loss = 0.011681399308145046
Validation loss = 0.011823367327451706
Validation loss = 0.011541824787855148
Validation loss = 0.012001002207398415
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 767
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 762
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 756
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 771
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 783
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 766
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.04e+03 |
| Iteration     | 17       |
| MaximumReturn | 1.14e+03 |
| MinimumReturn | 895      |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013365763239562511
Validation loss = 0.011843430809676647
Validation loss = 0.012030528858304024
Validation loss = 0.01220167800784111
Validation loss = 0.012201664038002491
Validation loss = 0.011592444963753223
Validation loss = 0.01170954667031765
Validation loss = 0.012120560742914677
Validation loss = 0.011533835902810097
Validation loss = 0.011570398695766926
Validation loss = 0.012383040972054005
Validation loss = 0.011316454038023949
Validation loss = 0.011602574028074741
Validation loss = 0.011299997568130493
Validation loss = 0.012952183373272419
Validation loss = 0.011429552920162678
Validation loss = 0.011941153556108475
Validation loss = 0.011178087443113327
Validation loss = 0.011764878407120705
Validation loss = 0.012637310661375523
Validation loss = 0.010565702803432941
Validation loss = 0.011629167944192886
Validation loss = 0.010731538757681847
Validation loss = 0.011008192785084248
Validation loss = 0.011818092316389084
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013166463933885098
Validation loss = 0.01146694365888834
Validation loss = 0.011468181386590004
Validation loss = 0.01237388513982296
Validation loss = 0.011474063619971275
Validation loss = 0.011191180907189846
Validation loss = 0.011803992092609406
Validation loss = 0.010980444960296154
Validation loss = 0.012307470664381981
Validation loss = 0.012456885538995266
Validation loss = 0.011198706924915314
Validation loss = 0.012646229937672615
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012383858673274517
Validation loss = 0.012536478228867054
Validation loss = 0.01147662103176117
Validation loss = 0.010980588383972645
Validation loss = 0.011270121671259403
Validation loss = 0.010951181873679161
Validation loss = 0.011546229012310505
Validation loss = 0.01125732809305191
Validation loss = 0.012022547423839569
Validation loss = 0.01155814528465271
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011979836039245129
Validation loss = 0.011009189300239086
Validation loss = 0.010836923494935036
Validation loss = 0.011295638047158718
Validation loss = 0.010916832834482193
Validation loss = 0.011135008186101913
Validation loss = 0.010943271219730377
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012740127742290497
Validation loss = 0.011335067451000214
Validation loss = 0.01232245285063982
Validation loss = 0.01174174901098013
Validation loss = 0.011470380239188671
Validation loss = 0.01112978532910347
Validation loss = 0.011576873250305653
Validation loss = 0.011386574245989323
Validation loss = 0.011693846434354782
Validation loss = 0.011636805720627308
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 799
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 789
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 780
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 787
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 796
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 775
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.03e+03 |
| Iteration     | 18       |
| MaximumReturn | 1.08e+03 |
| MinimumReturn | 963      |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011705730110406876
Validation loss = 0.010769817978143692
Validation loss = 0.012030543759465218
Validation loss = 0.010777292773127556
Validation loss = 0.010723480954766273
Validation loss = 0.011697681620717049
Validation loss = 0.011328824795782566
Validation loss = 0.010762190446257591
Validation loss = 0.010929204523563385
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011363210156559944
Validation loss = 0.010755695402622223
Validation loss = 0.01079908013343811
Validation loss = 0.011357363313436508
Validation loss = 0.010551848448812962
Validation loss = 0.010832059197127819
Validation loss = 0.010985351167619228
Validation loss = 0.01074131764471531
Validation loss = 0.010731438174843788
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012488849461078644
Validation loss = 0.011575156822800636
Validation loss = 0.01147228479385376
Validation loss = 0.011169472709298134
Validation loss = 0.010918541811406612
Validation loss = 0.010746315121650696
Validation loss = 0.010428732261061668
Validation loss = 0.010621579363942146
Validation loss = 0.010955694131553173
Validation loss = 0.011033190414309502
Validation loss = 0.011721148155629635
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011287117376923561
Validation loss = 0.011572135612368584
Validation loss = 0.010617861524224281
Validation loss = 0.010679027065634727
Validation loss = 0.01035421621054411
Validation loss = 0.010938798077404499
Validation loss = 0.010804899036884308
Validation loss = 0.011100338771939278
Validation loss = 0.010765944607555866
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013498073443770409
Validation loss = 0.010471852496266365
Validation loss = 0.011501527391374111
Validation loss = 0.011679057031869888
Validation loss = 0.011116566136479378
Validation loss = 0.011695298366248608
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 798
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 789
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 789
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 794
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 799
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 770
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.04e+03 |
| Iteration     | 19       |
| MaximumReturn | 1.07e+03 |
| MinimumReturn | 997      |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01178727112710476
Validation loss = 0.011066476814448833
Validation loss = 0.011183910071849823
Validation loss = 0.010973398573696613
Validation loss = 0.010698080994188786
Validation loss = 0.010691804811358452
Validation loss = 0.011267145164310932
Validation loss = 0.010704430751502514
Validation loss = 0.009861608035862446
Validation loss = 0.01101031806319952
Validation loss = 0.010040461085736752
Validation loss = 0.010036646388471127
Validation loss = 0.011090145446360111
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011146835051476955
Validation loss = 0.010396585799753666
Validation loss = 0.010511939413845539
Validation loss = 0.010325310751795769
Validation loss = 0.010506018996238708
Validation loss = 0.010489251464605331
Validation loss = 0.01101021096110344
Validation loss = 0.010595143772661686
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011018853634595871
Validation loss = 0.010407776571810246
Validation loss = 0.010387797839939594
Validation loss = 0.01066708005964756
Validation loss = 0.010423590429127216
Validation loss = 0.010819238610565662
Validation loss = 0.010511822998523712
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011117503978312016
Validation loss = 0.01103133987635374
Validation loss = 0.010027115233242512
Validation loss = 0.010279272682964802
Validation loss = 0.009941095486283302
Validation loss = 0.010534320957958698
Validation loss = 0.010441644117236137
Validation loss = 0.010344208218157291
Validation loss = 0.010173855349421501
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010999858379364014
Validation loss = 0.010610513389110565
Validation loss = 0.010611687786877155
Validation loss = 0.011129052378237247
Validation loss = 0.01133796013891697
Validation loss = 0.010281745344400406
Validation loss = 0.010683395899832249
Validation loss = 0.010522495955228806
Validation loss = 0.010530464351177216
Validation loss = 0.011207330971956253
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 784
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 780
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 763
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 788
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 776
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 786
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.01e+03 |
| Iteration     | 20       |
| MaximumReturn | 1.06e+03 |
| MinimumReturn | 920      |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011252827942371368
Validation loss = 0.011224888265132904
Validation loss = 0.010199381969869137
Validation loss = 0.010106030851602554
Validation loss = 0.010620144195854664
Validation loss = 0.0098313819617033
Validation loss = 0.010018070228397846
Validation loss = 0.00981784239411354
Validation loss = 0.010286428034305573
Validation loss = 0.010419398546218872
Validation loss = 0.010070164687931538
Validation loss = 0.010494972579181194
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010660778731107712
Validation loss = 0.010037822648882866
Validation loss = 0.010703707113862038
Validation loss = 0.010320263914763927
Validation loss = 0.010610407218337059
Validation loss = 0.010402188636362553
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01039707101881504
Validation loss = 0.010171801783144474
Validation loss = 0.01005913969129324
Validation loss = 0.010506770573556423
Validation loss = 0.011258687824010849
Validation loss = 0.010369689203798771
Validation loss = 0.010350777767598629
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010014827363193035
Validation loss = 0.010460229590535164
Validation loss = 0.010453935712575912
Validation loss = 0.01032108161598444
Validation loss = 0.009876394644379616
Validation loss = 0.010438909754157066
Validation loss = 0.010672766715288162
Validation loss = 0.010227065533399582
Validation loss = 0.010633920319378376
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0105048306286335
Validation loss = 0.010718848556280136
Validation loss = 0.010222550481557846
Validation loss = 0.010847504250705242
Validation loss = 0.010564450174570084
Validation loss = 0.01044503878802061
Validation loss = 0.010013241320848465
Validation loss = 0.010666795074939728
Validation loss = 0.00975594762712717
Validation loss = 0.010077432729303837
Validation loss = 0.011181599460542202
Validation loss = 0.010057959705591202
Validation loss = 0.009944274090230465
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 765
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 772
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 777
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 792
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 780
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 789
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.06e+03 |
| Iteration     | 21       |
| MaximumReturn | 1.09e+03 |
| MinimumReturn | 985      |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01029899064451456
Validation loss = 0.00959288701415062
Validation loss = 0.010068068280816078
Validation loss = 0.009905190207064152
Validation loss = 0.010121486149728298
Validation loss = 0.009652542881667614
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01027910877019167
Validation loss = 0.010108491405844688
Validation loss = 0.010490290820598602
Validation loss = 0.009679091162979603
Validation loss = 0.010140582919120789
Validation loss = 0.009757579304277897
Validation loss = 0.010067842900753021
Validation loss = 0.009792705066502094
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009956466034054756
Validation loss = 0.009677914902567863
Validation loss = 0.009770537726581097
Validation loss = 0.010257471352815628
Validation loss = 0.010325534269213676
Validation loss = 0.010111387819051743
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009795676916837692
Validation loss = 0.009695901535451412
Validation loss = 0.010381842963397503
Validation loss = 0.00995350070297718
Validation loss = 0.00930183008313179
Validation loss = 0.009796260856091976
Validation loss = 0.009545715525746346
Validation loss = 0.010082033462822437
Validation loss = 0.009618908166885376
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010557984933257103
Validation loss = 0.009880492463707924
Validation loss = 0.010301289148628712
Validation loss = 0.009976739063858986
Validation loss = 0.009539187885820866
Validation loss = 0.009786790236830711
Validation loss = 0.009268221445381641
Validation loss = 0.009533596225082874
Validation loss = 0.010123730637133121
Validation loss = 0.00947293359786272
Validation loss = 0.009816590696573257
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 786
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 799
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 811
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 807
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 779
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 811
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.08e+03 |
| Iteration     | 22       |
| MaximumReturn | 1.12e+03 |
| MinimumReturn | 1.02e+03 |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010038813576102257
Validation loss = 0.009779435582458973
Validation loss = 0.009597036987543106
Validation loss = 0.009602769277989864
Validation loss = 0.010391813702881336
Validation loss = 0.01009664312005043
Validation loss = 0.009174044243991375
Validation loss = 0.009660185314714909
Validation loss = 0.009543420746922493
Validation loss = 0.009652874432504177
Validation loss = 0.010535667650401592
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009974529035389423
Validation loss = 0.00959873665124178
Validation loss = 0.00957738608121872
Validation loss = 0.009468434378504753
Validation loss = 0.010539104230701923
Validation loss = 0.009512807242572308
Validation loss = 0.009853688068687916
Validation loss = 0.010013687424361706
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009849279187619686
Validation loss = 0.00964707788079977
Validation loss = 0.009869729168713093
Validation loss = 0.00960592646151781
Validation loss = 0.010264893062412739
Validation loss = 0.009809442795813084
Validation loss = 0.010405993089079857
Validation loss = 0.009637643583118916
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00929939839988947
Validation loss = 0.009284059517085552
Validation loss = 0.009502723813056946
Validation loss = 0.009551173076033592
Validation loss = 0.009456506930291653
Validation loss = 0.00961031299084425
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009846433065831661
Validation loss = 0.009389056824147701
Validation loss = 0.009806260466575623
Validation loss = 0.009639643132686615
Validation loss = 0.010064211674034595
Validation loss = 0.009423215873539448
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 795
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 778
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 777
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 781
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 774
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 784
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.03e+03 |
| Iteration     | 23       |
| MaximumReturn | 1.11e+03 |
| MinimumReturn | 937      |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009937532246112823
Validation loss = 0.009177127853035927
Validation loss = 0.009238672442734241
Validation loss = 0.009208687581121922
Validation loss = 0.009071492590010166
Validation loss = 0.009019744582474232
Validation loss = 0.009270220063626766
Validation loss = 0.009172143414616585
Validation loss = 0.009274845942854881
Validation loss = 0.009237592108547688
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009693046100437641
Validation loss = 0.009614204987883568
Validation loss = 0.009427337907254696
Validation loss = 0.009270897135138512
Validation loss = 0.009740295819938183
Validation loss = 0.009162637405097485
Validation loss = 0.009721673093736172
Validation loss = 0.009308088570833206
Validation loss = 0.009736360050737858
Validation loss = 0.009437421336770058
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009403344243764877
Validation loss = 0.009008843451738358
Validation loss = 0.009216729551553726
Validation loss = 0.009453426115214825
Validation loss = 0.009023788385093212
Validation loss = 0.009061062708497047
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009223883971571922
Validation loss = 0.009116464294493198
Validation loss = 0.00954482052475214
Validation loss = 0.008661267347633839
Validation loss = 0.008927092887461185
Validation loss = 0.009208635427057743
Validation loss = 0.009052113629877567
Validation loss = 0.009052149951457977
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009730251505970955
Validation loss = 0.011299136094748974
Validation loss = 0.009571341797709465
Validation loss = 0.010782507248222828
Validation loss = 0.009484436362981796
Validation loss = 0.009428286924958229
Validation loss = 0.009537803940474987
Validation loss = 0.00913872942328453
Validation loss = 0.009074301458895206
Validation loss = 0.009234480559825897
Validation loss = 0.009773069992661476
Validation loss = 0.009031469002366066
Validation loss = 0.008955404162406921
Validation loss = 0.009088817983865738
Validation loss = 0.00876719318330288
Validation loss = 0.008926673792302608
Validation loss = 0.008950809016823769
Validation loss = 0.00898488238453865
Validation loss = 0.008995793759822845
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 785
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 791
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 793
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 775
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 788
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 795
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.1e+03  |
| Iteration     | 24       |
| MaximumReturn | 1.22e+03 |
| MinimumReturn | 1.03e+03 |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009097238071262836
Validation loss = 0.009110637940466404
Validation loss = 0.009537821635603905
Validation loss = 0.009913637302815914
Validation loss = 0.00900827907025814
Validation loss = 0.00938540417701006
Validation loss = 0.008675945922732353
Validation loss = 0.00867465790361166
Validation loss = 0.008944641798734665
Validation loss = 0.008593853563070297
Validation loss = 0.008615909144282341
Validation loss = 0.008718789555132389
Validation loss = 0.009061555378139019
Validation loss = 0.008481670171022415
Validation loss = 0.008430593647062778
Validation loss = 0.008827797137200832
Validation loss = 0.009026847779750824
Validation loss = 0.008638797327876091
Validation loss = 0.009534914046525955
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009354045614600182
Validation loss = 0.00907590240240097
Validation loss = 0.00914088822901249
Validation loss = 0.00917181745171547
Validation loss = 0.009441127069294453
Validation loss = 0.009780068881809711
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009859621524810791
Validation loss = 0.009205412119626999
Validation loss = 0.00923868641257286
Validation loss = 0.008805083110928535
Validation loss = 0.008920824155211449
Validation loss = 0.009130545891821384
Validation loss = 0.009113660082221031
Validation loss = 0.008804957382380962
Validation loss = 0.00941393245011568
Validation loss = 0.008815813809633255
Validation loss = 0.008737500756978989
Validation loss = 0.008764859288930893
Validation loss = 0.009005733765661716
Validation loss = 0.009058359079062939
Validation loss = 0.008504020981490612
Validation loss = 0.008934869430959225
Validation loss = 0.009122692048549652
Validation loss = 0.009073027409613132
Validation loss = 0.008630982600152493
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008884136565029621
Validation loss = 0.009200423024594784
Validation loss = 0.008893854916095734
Validation loss = 0.009061119519174099
Validation loss = 0.009031056426465511
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009712222963571548
Validation loss = 0.008818023838102818
Validation loss = 0.0091659314930439
Validation loss = 0.008548669517040253
Validation loss = 0.008739588782191277
Validation loss = 0.009115813300013542
Validation loss = 0.008746493607759476
Validation loss = 0.008913996629416943
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 753
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 773
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 780
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 776
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 770
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 781
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.08e+03 |
| Iteration     | 25       |
| MaximumReturn | 1.14e+03 |
| MinimumReturn | 1.03e+03 |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008783068507909775
Validation loss = 0.008523152209818363
Validation loss = 0.008896440267562866
Validation loss = 0.008265828713774681
Validation loss = 0.0089137377217412
Validation loss = 0.00917936023324728
Validation loss = 0.0077753146179020405
Validation loss = 0.009160027839243412
Validation loss = 0.008171156980097294
Validation loss = 0.008811493404209614
Validation loss = 0.008599098771810532
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009523318149149418
Validation loss = 0.0087730772793293
Validation loss = 0.008693497627973557
Validation loss = 0.008801691234111786
Validation loss = 0.008819563314318657
Validation loss = 0.009098113514482975
Validation loss = 0.008669382892549038
Validation loss = 0.00881121214479208
Validation loss = 0.008902515284717083
Validation loss = 0.008983870036900043
Validation loss = 0.009555579163134098
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008767970837652683
Validation loss = 0.00905909575521946
Validation loss = 0.008306995965540409
Validation loss = 0.008233295753598213
Validation loss = 0.008463671430945396
Validation loss = 0.008294987492263317
Validation loss = 0.009021210484206676
Validation loss = 0.008578265085816383
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00912562571465969
Validation loss = 0.00885019451379776
Validation loss = 0.00842519011348486
Validation loss = 0.008789400570094585
Validation loss = 0.008478826843202114
Validation loss = 0.008491895161569118
Validation loss = 0.00848136655986309
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008846604265272617
Validation loss = 0.00849712360650301
Validation loss = 0.008224191144108772
Validation loss = 0.008685571141541004
Validation loss = 0.008793777786195278
Validation loss = 0.008512363769114017
Validation loss = 0.00868405494838953
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 774
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 777
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 782
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 785
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 781
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 785
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.09e+03 |
| Iteration     | 26       |
| MaximumReturn | 1.14e+03 |
| MinimumReturn | 1.04e+03 |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008054576814174652
Validation loss = 0.008100886829197407
Validation loss = 0.008448965847492218
Validation loss = 0.008065399713814259
Validation loss = 0.008191773667931557
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008848614059388638
Validation loss = 0.008388342335820198
Validation loss = 0.00896748248487711
Validation loss = 0.008707492612302303
Validation loss = 0.008496933616697788
Validation loss = 0.008978285826742649
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008443055674433708
Validation loss = 0.008183075115084648
Validation loss = 0.008117604069411755
Validation loss = 0.008447797037661076
Validation loss = 0.008199797011911869
Validation loss = 0.008269802667200565
Validation loss = 0.007790129631757736
Validation loss = 0.008671759627759457
Validation loss = 0.007955135777592659
Validation loss = 0.008091413415968418
Validation loss = 0.008338752202689648
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008318303152918816
Validation loss = 0.008675343357026577
Validation loss = 0.008399521932005882
Validation loss = 0.008536177687346935
Validation loss = 0.008461522869765759
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008289608173072338
Validation loss = 0.008484660647809505
Validation loss = 0.007884522899985313
Validation loss = 0.008093745447695255
Validation loss = 0.008037908934056759
Validation loss = 0.008129698224365711
Validation loss = 0.0081942742690444
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 787
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 786
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 795
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 788
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 791
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 792
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.05e+03 |
| Iteration     | 27       |
| MaximumReturn | 1.07e+03 |
| MinimumReturn | 1.03e+03 |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008221023716032505
Validation loss = 0.007726431358605623
Validation loss = 0.009089983068406582
Validation loss = 0.008214827626943588
Validation loss = 0.008109930902719498
Validation loss = 0.007760644890367985
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00851155910640955
Validation loss = 0.00844829622656107
Validation loss = 0.008794602006673813
Validation loss = 0.008885644376277924
Validation loss = 0.008094759657979012
Validation loss = 0.008295398205518723
Validation loss = 0.00803812500089407
Validation loss = 0.008330362848937511
Validation loss = 0.00811734702438116
Validation loss = 0.008017607033252716
Validation loss = 0.0078581552952528
Validation loss = 0.008367224596440792
Validation loss = 0.00800956692546606
Validation loss = 0.008173277601599693
Validation loss = 0.00811744574457407
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008221270516514778
Validation loss = 0.007725397124886513
Validation loss = 0.008435767143964767
Validation loss = 0.007879028096795082
Validation loss = 0.008682157844305038
Validation loss = 0.00793447345495224
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008232751861214638
Validation loss = 0.008445877581834793
Validation loss = 0.008131518959999084
Validation loss = 0.00805998221039772
Validation loss = 0.008252774365246296
Validation loss = 0.008296252228319645
Validation loss = 0.008376426063477993
Validation loss = 0.00809211190789938
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009116859175264835
Validation loss = 0.00808120146393776
Validation loss = 0.00914859864860773
Validation loss = 0.008034375496208668
Validation loss = 0.008386480621993542
Validation loss = 0.00873036589473486
Validation loss = 0.008209756575524807
Validation loss = 0.008243533782660961
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 780
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 775
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 777
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 768
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 792
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 779
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.1e+03  |
| Iteration     | 28       |
| MaximumReturn | 1.15e+03 |
| MinimumReturn | 1.05e+03 |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00818378571420908
Validation loss = 0.007804368156939745
Validation loss = 0.007900627329945564
Validation loss = 0.007927123457193375
Validation loss = 0.0072799380868673325
Validation loss = 0.007915330119431019
Validation loss = 0.00787047017365694
Validation loss = 0.007872071117162704
Validation loss = 0.007933506742119789
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008140136487782001
Validation loss = 0.007866562344133854
Validation loss = 0.008093302138149738
Validation loss = 0.008481724187731743
Validation loss = 0.008767962455749512
Validation loss = 0.007759775500744581
Validation loss = 0.007890578359365463
Validation loss = 0.008261824026703835
Validation loss = 0.007662941236048937
Validation loss = 0.00788677018135786
Validation loss = 0.007813883014023304
Validation loss = 0.007710279431194067
Validation loss = 0.007994712330400944
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008084635250270367
Validation loss = 0.007739483844488859
Validation loss = 0.008505342528223991
Validation loss = 0.008069363422691822
Validation loss = 0.008225355297327042
Validation loss = 0.00753345200791955
Validation loss = 0.007468775380402803
Validation loss = 0.008484364487230778
Validation loss = 0.007830213755369186
Validation loss = 0.007877040654420853
Validation loss = 0.007680256385356188
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008001206442713737
Validation loss = 0.008247938007116318
Validation loss = 0.007841759361326694
Validation loss = 0.007995194755494595
Validation loss = 0.008268265053629875
Validation loss = 0.00836261734366417
Validation loss = 0.007899453863501549
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008069857023656368
Validation loss = 0.00794074684381485
Validation loss = 0.007934093475341797
Validation loss = 0.008104624226689339
Validation loss = 0.007635323330760002
Validation loss = 0.007675379980355501
Validation loss = 0.008153255097568035
Validation loss = 0.007991867139935493
Validation loss = 0.008097932673990726
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 774
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 778
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 788
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 780
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 788
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 790
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.08e+03 |
| Iteration     | 29       |
| MaximumReturn | 1.16e+03 |
| MinimumReturn | 1.02e+03 |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007894242182374
Validation loss = 0.007616388611495495
Validation loss = 0.007343528792262077
Validation loss = 0.00800264347344637
Validation loss = 0.007332858629524708
Validation loss = 0.007984518073499203
Validation loss = 0.007783815730363131
Validation loss = 0.007380512077361345
Validation loss = 0.007204277906566858
Validation loss = 0.007653701584786177
Validation loss = 0.007788046263158321
Validation loss = 0.007333706133067608
Validation loss = 0.007734501268714666
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007912304252386093
Validation loss = 0.00755856279283762
Validation loss = 0.007934052497148514
Validation loss = 0.007908869534730911
Validation loss = 0.00771296164020896
Validation loss = 0.008379745297133923
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00814803782850504
Validation loss = 0.007656121626496315
Validation loss = 0.007550844457000494
Validation loss = 0.007667194586247206
Validation loss = 0.007712232414633036
Validation loss = 0.007547528948634863
Validation loss = 0.0071971360594034195
Validation loss = 0.00785832479596138
Validation loss = 0.0082047488540411
Validation loss = 0.007243148051202297
Validation loss = 0.007410903926938772
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008280902169644833
Validation loss = 0.008073589764535427
Validation loss = 0.007888381369411945
Validation loss = 0.007766743190586567
Validation loss = 0.007702480535954237
Validation loss = 0.007688520941883326
Validation loss = 0.007312707137316465
Validation loss = 0.007425883784890175
Validation loss = 0.007789728697389364
Validation loss = 0.0076545123010873795
Validation loss = 0.007916382513940334
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007750634104013443
Validation loss = 0.007671498227864504
Validation loss = 0.00794505700469017
Validation loss = 0.00746620399877429
Validation loss = 0.00765472836792469
Validation loss = 0.007321486249566078
Validation loss = 0.007579730357974768
Validation loss = 0.00810405146330595
Validation loss = 0.007721561472862959
Validation loss = 0.007515812758356333
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 755
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 766
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 766
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 777
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 772
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 767
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.08e+03 |
| Iteration     | 30       |
| MaximumReturn | 1.1e+03  |
| MinimumReturn | 1.03e+03 |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0075328536331653595
Validation loss = 0.007734042126685381
Validation loss = 0.007315049879252911
Validation loss = 0.007702279835939407
Validation loss = 0.007433425169438124
Validation loss = 0.007189261727035046
Validation loss = 0.0073238336481153965
Validation loss = 0.007253129966557026
Validation loss = 0.007367111276835203
Validation loss = 0.007220820523798466
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007631867658346891
Validation loss = 0.007470349315553904
Validation loss = 0.007596683222800493
Validation loss = 0.007686404045671225
Validation loss = 0.0074438899755477905
Validation loss = 0.007799241691827774
Validation loss = 0.007506427355110645
Validation loss = 0.007580648176372051
Validation loss = 0.007133373990654945
Validation loss = 0.0073842499405145645
Validation loss = 0.007560565136373043
Validation loss = 0.008474545553326607
Validation loss = 0.007384520024061203
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0074101099744439125
Validation loss = 0.007636206690222025
Validation loss = 0.007073691114783287
Validation loss = 0.00710810162127018
Validation loss = 0.0073185451328754425
Validation loss = 0.007905879057943821
Validation loss = 0.007165746763348579
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007607738953083754
Validation loss = 0.007278675213456154
Validation loss = 0.00736640952527523
Validation loss = 0.007796166464686394
Validation loss = 0.0075437892228364944
Validation loss = 0.007595415227115154
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0074842567555606365
Validation loss = 0.007487722206860781
Validation loss = 0.007929997518658638
Validation loss = 0.007329738233238459
Validation loss = 0.0074698650278151035
Validation loss = 0.007370878476649523
Validation loss = 0.0073659373447299
Validation loss = 0.0070519279688596725
Validation loss = 0.007325754500925541
Validation loss = 0.007489871233701706
Validation loss = 0.007375848479568958
Validation loss = 0.007337935268878937
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 757
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 778
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 751
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 751
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 757
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 762
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.13e+03 |
| Iteration     | 31       |
| MaximumReturn | 1.18e+03 |
| MinimumReturn | 1.09e+03 |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007758105173707008
Validation loss = 0.007085599470883608
Validation loss = 0.007259635254740715
Validation loss = 0.006989734712988138
Validation loss = 0.007308360189199448
Validation loss = 0.006834103725850582
Validation loss = 0.007179061882197857
Validation loss = 0.007869020104408264
Validation loss = 0.007121512666344643
Validation loss = 0.006763399578630924
Validation loss = 0.007096716668456793
Validation loss = 0.007215892430394888
Validation loss = 0.007244200445711613
Validation loss = 0.007016763091087341
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007184305228292942
Validation loss = 0.007435701787471771
Validation loss = 0.007517135702073574
Validation loss = 0.00738632632419467
Validation loss = 0.006997110787779093
Validation loss = 0.007131724618375301
Validation loss = 0.00744662107899785
Validation loss = 0.007600186392664909
Validation loss = 0.007187543902546167
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007459331303834915
Validation loss = 0.006950534880161285
Validation loss = 0.006995866075158119
Validation loss = 0.007091851439327002
Validation loss = 0.007244054228067398
Validation loss = 0.007522005587816238
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007265211548656225
Validation loss = 0.007189524360001087
Validation loss = 0.0074418289586901665
Validation loss = 0.007325466722249985
Validation loss = 0.007378504611551762
Validation loss = 0.007532557938247919
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007253322750329971
Validation loss = 0.007247157394886017
Validation loss = 0.008181363344192505
Validation loss = 0.007073673419654369
Validation loss = 0.007309498731046915
Validation loss = 0.007003088481724262
Validation loss = 0.007194567937403917
Validation loss = 0.007211481221020222
Validation loss = 0.0075587620958685875
Validation loss = 0.007447577081620693
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 748
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 753
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 754
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 751
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 765
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 762
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.08e+03 |
| Iteration     | 32       |
| MaximumReturn | 1.12e+03 |
| MinimumReturn | 995      |
| TotalSamples  | 136000   |
----------------------------
