Logging to experiments/half_cheetah/oct29/w350e3_seed2314
Print configuration .....
{'env_name': 'half_cheetah', 'random_seeds': [4321, 2314, 2341, 3421], 'save_variables': False, 'model_save_dir': '/tmp/half_cheetah_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 0
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 0
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39886730909347534
Validation loss = 0.13649028539657593
Validation loss = 0.08740313351154327
Validation loss = 0.066469207406044
Validation loss = 0.05372481793165207
Validation loss = 0.048100948333740234
Validation loss = 0.046462878584861755
Validation loss = 0.04218262806534767
Validation loss = 0.042856425046920776
Validation loss = 0.0391291081905365
Validation loss = 0.03898376226425171
Validation loss = 0.039990149438381195
Validation loss = 0.03801310807466507
Validation loss = 0.04650803282856941
Validation loss = 0.03322119265794754
Validation loss = 0.0347181111574173
Validation loss = 0.03278703987598419
Validation loss = 0.036387279629707336
Validation loss = 0.032763898372650146
Validation loss = 0.031067322939634323
Validation loss = 0.038550205528736115
Validation loss = 0.03381381183862686
Validation loss = 0.029894735664129257
Validation loss = 0.032472312450408936
Validation loss = 0.033935800194740295
Validation loss = 0.031194768846035004
Validation loss = 0.03715090826153755
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.36341947317123413
Validation loss = 0.13097772002220154
Validation loss = 0.08244286477565765
Validation loss = 0.060617782175540924
Validation loss = 0.050687581300735474
Validation loss = 0.04444793611764908
Validation loss = 0.04445721209049225
Validation loss = 0.04157612472772598
Validation loss = 0.03999593108892441
Validation loss = 0.036180414259433746
Validation loss = 0.03604324907064438
Validation loss = 0.03513503819704056
Validation loss = 0.03875979408621788
Validation loss = 0.03280965983867645
Validation loss = 0.03255964815616608
Validation loss = 0.0323643758893013
Validation loss = 0.034622013568878174
Validation loss = 0.03275284171104431
Validation loss = 0.033272191882133484
Validation loss = 0.04022423177957535
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7096234560012817
Validation loss = 0.125383198261261
Validation loss = 0.08126267045736313
Validation loss = 0.0611802339553833
Validation loss = 0.05090928077697754
Validation loss = 0.04894373193383217
Validation loss = 0.0442894771695137
Validation loss = 0.05265960842370987
Validation loss = 0.039281897246837616
Validation loss = 0.043002642691135406
Validation loss = 0.035602424293756485
Validation loss = 0.04226865991950035
Validation loss = 0.03345528990030289
Validation loss = 0.03713775798678398
Validation loss = 0.035769395530223846
Validation loss = 0.034526072442531586
Validation loss = 0.03663327917456627
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.402626097202301
Validation loss = 0.12504449486732483
Validation loss = 0.08000899851322174
Validation loss = 0.05869556963443756
Validation loss = 0.04896984249353409
Validation loss = 0.04413894563913345
Validation loss = 0.04172380268573761
Validation loss = 0.039095476269721985
Validation loss = 0.037955038249492645
Validation loss = 0.03826051205396652
Validation loss = 0.04209555312991142
Validation loss = 0.0350133553147316
Validation loss = 0.0349770188331604
Validation loss = 0.03335832431912422
Validation loss = 0.038427017629146576
Validation loss = 0.03254293277859688
Validation loss = 0.03523232787847519
Validation loss = 0.03216995671391487
Validation loss = 0.032334230840206146
Validation loss = 0.037388868629932404
Validation loss = 0.032050106674432755
Validation loss = 0.029170971363782883
Validation loss = 0.030269473791122437
Validation loss = 0.029627084732055664
Validation loss = 0.0312114916741848
Validation loss = 0.02932675927877426
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.9007323980331421
Validation loss = 0.12792104482650757
Validation loss = 0.08403532207012177
Validation loss = 0.061887383460998535
Validation loss = 0.051115505397319794
Validation loss = 0.04508687183260918
Validation loss = 0.04728206247091293
Validation loss = 0.0405733585357666
Validation loss = 0.037480928003787994
Validation loss = 0.04061632975935936
Validation loss = 0.035369500517845154
Validation loss = 0.03696718066930771
Validation loss = 0.034021660685539246
Validation loss = 0.03269071877002716
Validation loss = 0.03233348950743675
Validation loss = 0.03077317774295807
Validation loss = 0.041591521352529526
Validation loss = 0.03244244307279587
Validation loss = 0.03127099573612213
Validation loss = 0.03438925743103027
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 129
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 66
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 132
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 100
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 129
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 77
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -394     |
| Iteration     | 0        |
| MaximumReturn | -365     |
| MinimumReturn | -434     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16248267889022827
Validation loss = 0.08450792729854584
Validation loss = 0.07649752497673035
Validation loss = 0.069877490401268
Validation loss = 0.06960316002368927
Validation loss = 0.06570156663656235
Validation loss = 0.06618502736091614
Validation loss = 0.07189712673425674
Validation loss = 0.06327547878026962
Validation loss = 0.06763678044080734
Validation loss = 0.06667136400938034
Validation loss = 0.06483082473278046
Validation loss = 0.06109269708395004
Validation loss = 0.06500543653964996
Validation loss = 0.061062321066856384
Validation loss = 0.06107306480407715
Validation loss = 0.06313863396644592
Validation loss = 0.07188554108142853
Validation loss = 0.06506790965795517
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14208832383155823
Validation loss = 0.08108308166265488
Validation loss = 0.07540812343358994
Validation loss = 0.07118351012468338
Validation loss = 0.07101184129714966
Validation loss = 0.06681956350803375
Validation loss = 0.07003149390220642
Validation loss = 0.07142916321754456
Validation loss = 0.06419680267572403
Validation loss = 0.06584027409553528
Validation loss = 0.06809503585100174
Validation loss = 0.06972351670265198
Validation loss = 0.08052806556224823
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14421412348747253
Validation loss = 0.08156083524227142
Validation loss = 0.07370343804359436
Validation loss = 0.07214953005313873
Validation loss = 0.08051329106092453
Validation loss = 0.06578738987445831
Validation loss = 0.0649118572473526
Validation loss = 0.07090585678815842
Validation loss = 0.06648389250040054
Validation loss = 0.06490013003349304
Validation loss = 0.06446388363838196
Validation loss = 0.06764901429414749
Validation loss = 0.06606435775756836
Validation loss = 0.06607924401760101
Validation loss = 0.06647029519081116
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15424075722694397
Validation loss = 0.07971010357141495
Validation loss = 0.07307115942239761
Validation loss = 0.07233390212059021
Validation loss = 0.06844035536050797
Validation loss = 0.07061418890953064
Validation loss = 0.06555341184139252
Validation loss = 0.06273313611745834
Validation loss = 0.07642106711864471
Validation loss = 0.08935302495956421
Validation loss = 0.060208626091480255
Validation loss = 0.06014537066221237
Validation loss = 0.06337534636259079
Validation loss = 0.06271180510520935
Validation loss = 0.05926021933555603
Validation loss = 0.062344230711460114
Validation loss = 0.06356923282146454
Validation loss = 0.06243261322379112
Validation loss = 0.06162078306078911
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14754700660705566
Validation loss = 0.07904452830553055
Validation loss = 0.06964888423681259
Validation loss = 0.07519460469484329
Validation loss = 0.07319221645593643
Validation loss = 0.06609566509723663
Validation loss = 0.07243546843528748
Validation loss = 0.06299237161874771
Validation loss = 0.06481795758008957
Validation loss = 0.061044588685035706
Validation loss = 0.06653430312871933
Validation loss = 0.06050713732838631
Validation loss = 0.06296833604574203
Validation loss = 0.06878366321325302
Validation loss = 0.0664290115237236
Validation loss = 0.06572440266609192
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 598
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 627
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 581
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 568
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 607
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 555
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -355     |
| Iteration     | 1        |
| MaximumReturn | -249     |
| MinimumReturn | -404     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07555113732814789
Validation loss = 0.06366739422082901
Validation loss = 0.0652979239821434
Validation loss = 0.06343436241149902
Validation loss = 0.0662832260131836
Validation loss = 0.06479880958795547
Validation loss = 0.06596299260854721
Validation loss = 0.062571682035923
Validation loss = 0.06309831142425537
Validation loss = 0.062274862080812454
Validation loss = 0.06456444412469864
Validation loss = 0.06231610104441643
Validation loss = 0.06150871515274048
Validation loss = 0.06317749619483948
Validation loss = 0.06522687524557114
Validation loss = 0.06259896606206894
Validation loss = 0.06576476246118546
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08117152005434036
Validation loss = 0.06831324100494385
Validation loss = 0.06750162690877914
Validation loss = 0.06555940955877304
Validation loss = 0.06378385424613953
Validation loss = 0.07191478461027145
Validation loss = 0.07050022482872009
Validation loss = 0.0653272494673729
Validation loss = 0.06600312143564224
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07791608572006226
Validation loss = 0.07028757035732269
Validation loss = 0.06586235761642456
Validation loss = 0.06706071645021439
Validation loss = 0.06492793560028076
Validation loss = 0.06460528075695038
Validation loss = 0.06924732029438019
Validation loss = 0.07014628499746323
Validation loss = 0.06575587391853333
Validation loss = 0.06734562665224075
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08049873262643814
Validation loss = 0.0638548731803894
Validation loss = 0.07042631506919861
Validation loss = 0.06300415843725204
Validation loss = 0.06979259848594666
Validation loss = 0.06293793767690659
Validation loss = 0.06274688988924026
Validation loss = 0.07875663787126541
Validation loss = 0.061318088322877884
Validation loss = 0.061786338686943054
Validation loss = 0.06252527981996536
Validation loss = 0.06190260127186775
Validation loss = 0.0625993087887764
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0788339152932167
Validation loss = 0.06477391719818115
Validation loss = 0.06861928850412369
Validation loss = 0.06343331187963486
Validation loss = 0.06626477092504501
Validation loss = 0.06268604844808578
Validation loss = 0.06305208057165146
Validation loss = 0.06335142999887466
Validation loss = 0.06281761825084686
Validation loss = 0.06553008407354355
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 474
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 515
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 470
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 493
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 572
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 501
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 109      |
| Iteration     | 2        |
| MaximumReturn | 322      |
| MinimumReturn | -160     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0673324465751648
Validation loss = 0.05433952808380127
Validation loss = 0.054613277316093445
Validation loss = 0.0561293289065361
Validation loss = 0.05578363686800003
Validation loss = 0.055026739835739136
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.069846510887146
Validation loss = 0.058856554329395294
Validation loss = 0.05600105598568916
Validation loss = 0.05954490974545479
Validation loss = 0.05571945011615753
Validation loss = 0.05540734902024269
Validation loss = 0.059198055416345596
Validation loss = 0.05607718974351883
Validation loss = 0.05562390759587288
Validation loss = 0.05801358073949814
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06261050701141357
Validation loss = 0.06200295686721802
Validation loss = 0.05693483352661133
Validation loss = 0.059682924300432205
Validation loss = 0.05496871471405029
Validation loss = 0.05786895006895065
Validation loss = 0.06220487505197525
Validation loss = 0.06057772785425186
Validation loss = 0.05398348718881607
Validation loss = 0.05630412697792053
Validation loss = 0.05623555928468704
Validation loss = 0.05524434521794319
Validation loss = 0.05448823422193527
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06468003988265991
Validation loss = 0.05376649275422096
Validation loss = 0.055667951703071594
Validation loss = 0.05505196005105972
Validation loss = 0.0556214414536953
Validation loss = 0.05410519987344742
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06762973219156265
Validation loss = 0.05621052533388138
Validation loss = 0.057643305510282516
Validation loss = 0.05852583050727844
Validation loss = 0.055280901491642
Validation loss = 0.055225919932127
Validation loss = 0.05519869178533554
Validation loss = 0.06316271424293518
Validation loss = 0.054056718945503235
Validation loss = 0.053135666996240616
Validation loss = 0.054364949464797974
Validation loss = 0.053872160613536835
Validation loss = 0.05599624663591385
Validation loss = 0.057054225355386734
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 664
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 607
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 609
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 579
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 446
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 609
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 63.6     |
| Iteration     | 3        |
| MaximumReturn | 366      |
| MinimumReturn | -263     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0877300500869751
Validation loss = 0.07157786190509796
Validation loss = 0.06732803583145142
Validation loss = 0.07214255630970001
Validation loss = 0.075441874563694
Validation loss = 0.06794966757297516
Validation loss = 0.06604798138141632
Validation loss = 0.06490097939968109
Validation loss = 0.067222461104393
Validation loss = 0.06655773520469666
Validation loss = 0.06927099078893661
Validation loss = 0.06849421560764313
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08753418177366257
Validation loss = 0.06954824179410934
Validation loss = 0.07389961183071136
Validation loss = 0.06850731372833252
Validation loss = 0.0700337216258049
Validation loss = 0.06728208065032959
Validation loss = 0.07048820704221725
Validation loss = 0.0681711733341217
Validation loss = 0.06730344146490097
Validation loss = 0.06644512712955475
Validation loss = 0.07104582339525223
Validation loss = 0.06733676791191101
Validation loss = 0.06949488818645477
Validation loss = 0.07082905620336533
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08347101509571075
Validation loss = 0.06921450048685074
Validation loss = 0.06991010159254074
Validation loss = 0.07074899971485138
Validation loss = 0.06554095447063446
Validation loss = 0.06660744547843933
Validation loss = 0.06696822494268417
Validation loss = 0.06654046475887299
Validation loss = 0.06585495173931122
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08568914979696274
Validation loss = 0.0701071172952652
Validation loss = 0.06934799253940582
Validation loss = 0.06582667678594589
Validation loss = 0.06685536354780197
Validation loss = 0.06569425016641617
Validation loss = 0.06601832807064056
Validation loss = 0.0651329979300499
Validation loss = 0.06466611474752426
Validation loss = 0.06685332953929901
Validation loss = 0.06357522308826447
Validation loss = 0.06557782739400864
Validation loss = 0.06625257432460785
Validation loss = 0.06435704231262207
Validation loss = 0.0648651197552681
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08334395289421082
Validation loss = 0.06778211891651154
Validation loss = 0.06701932102441788
Validation loss = 0.06555753201246262
Validation loss = 0.06439370661973953
Validation loss = 0.066123828291893
Validation loss = 0.06699908524751663
Validation loss = 0.06729689240455627
Validation loss = 0.06791120022535324
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 616
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 698
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 657
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 690
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 647
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 530
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -82.7    |
| Iteration     | 4        |
| MaximumReturn | 329      |
| MinimumReturn | -456     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09554537385702133
Validation loss = 0.08062643557786942
Validation loss = 0.08823812752962112
Validation loss = 0.08321031183004379
Validation loss = 0.0790887400507927
Validation loss = 0.0823846235871315
Validation loss = 0.08101383596658707
Validation loss = 0.08973145484924316
Validation loss = 0.0792260691523552
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08748549222946167
Validation loss = 0.08525703102350235
Validation loss = 0.08882195502519608
Validation loss = 0.08192643523216248
Validation loss = 0.0832407996058464
Validation loss = 0.08902537822723389
Validation loss = 0.08511380106210709
Validation loss = 0.08363407105207443
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08728840947151184
Validation loss = 0.08670929819345474
Validation loss = 0.08759792894124985
Validation loss = 0.08118408173322678
Validation loss = 0.08427512645721436
Validation loss = 0.07796905189752579
Validation loss = 0.08531510829925537
Validation loss = 0.08031103014945984
Validation loss = 0.08040992170572281
Validation loss = 0.08269607275724411
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0839579701423645
Validation loss = 0.08275336027145386
Validation loss = 0.08486974984407425
Validation loss = 0.07946524024009705
Validation loss = 0.08119381219148636
Validation loss = 0.08293090015649796
Validation loss = 0.08364340662956238
Validation loss = 0.08425312489271164
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08749256283044815
Validation loss = 0.08441851288080215
Validation loss = 0.08206186443567276
Validation loss = 0.08791489154100418
Validation loss = 0.08042129129171371
Validation loss = 0.0840868428349495
Validation loss = 0.09368833154439926
Validation loss = 0.07987780123949051
Validation loss = 0.07929884642362595
Validation loss = 0.08251076191663742
Validation loss = 0.08192535489797592
Validation loss = 0.08362460881471634
Validation loss = 0.08106803894042969
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 638
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 667
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 698
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 634
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 641
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 700
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 71.9     |
| Iteration     | 5        |
| MaximumReturn | 253      |
| MinimumReturn | -264     |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0833212360739708
Validation loss = 0.07336033135652542
Validation loss = 0.07283005863428116
Validation loss = 0.07163752615451813
Validation loss = 0.07099265605211258
Validation loss = 0.06865650415420532
Validation loss = 0.0728275328874588
Validation loss = 0.07246369123458862
Validation loss = 0.06965028494596481
Validation loss = 0.07158040255308151
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08148174732923508
Validation loss = 0.07530953735113144
Validation loss = 0.07117285579442978
Validation loss = 0.07168649137020111
Validation loss = 0.07362819463014603
Validation loss = 0.0752071812748909
Validation loss = 0.07235020399093628
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07683714479207993
Validation loss = 0.07048369199037552
Validation loss = 0.07254798710346222
Validation loss = 0.08142697811126709
Validation loss = 0.0730217918753624
Validation loss = 0.07267622649669647
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0728851780295372
Validation loss = 0.07074084877967834
Validation loss = 0.07240153104066849
Validation loss = 0.07259083539247513
Validation loss = 0.0724235400557518
Validation loss = 0.07870147377252579
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07518982887268066
Validation loss = 0.0716995820403099
Validation loss = 0.07254005968570709
Validation loss = 0.06930220872163773
Validation loss = 0.07205630093812943
Validation loss = 0.07215233892202377
Validation loss = 0.07333201169967651
Validation loss = 0.06923411041498184
Validation loss = 0.07370320707559586
Validation loss = 0.07281298190355301
Validation loss = 0.07303918153047562
Validation loss = 0.06971605122089386
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 723
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 722
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 710
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 598
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 728
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 619
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -301     |
| Iteration     | 6        |
| MaximumReturn | -202     |
| MinimumReturn | -434     |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06246478855609894
Validation loss = 0.0555320605635643
Validation loss = 0.05342058837413788
Validation loss = 0.053836431354284286
Validation loss = 0.05252080410718918
Validation loss = 0.055222079157829285
Validation loss = 0.053338609635829926
Validation loss = 0.0548272542655468
Validation loss = 0.05531487986445427
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06317224353551865
Validation loss = 0.05362839251756668
Validation loss = 0.0557674840092659
Validation loss = 0.057880572974681854
Validation loss = 0.055542539805173874
Validation loss = 0.05761204659938812
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06579811871051788
Validation loss = 0.0539960116147995
Validation loss = 0.05512836202979088
Validation loss = 0.05556304007768631
Validation loss = 0.0568544901907444
Validation loss = 0.052580367773771286
Validation loss = 0.051514603197574615
Validation loss = 0.054929833859205246
Validation loss = 0.05555371195077896
Validation loss = 0.05283050239086151
Validation loss = 0.060066014528274536
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.061665646731853485
Validation loss = 0.05620172247290611
Validation loss = 0.055197834968566895
Validation loss = 0.056742116808891296
Validation loss = 0.0560481995344162
Validation loss = 0.055712658911943436
Validation loss = 0.054696377366781235
Validation loss = 0.053266141563653946
Validation loss = 0.05712365731596947
Validation loss = 0.05523452162742615
Validation loss = 0.05598178133368492
Validation loss = 0.05432923138141632
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06447480618953705
Validation loss = 0.05331847071647644
Validation loss = 0.05113093927502632
Validation loss = 0.052669458091259
Validation loss = 0.051889874041080475
Validation loss = 0.05439513549208641
Validation loss = 0.05211026966571808
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 713
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 664
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 685
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 709
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 724
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 726
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -241     |
| Iteration     | 7        |
| MaximumReturn | 276      |
| MinimumReturn | -458     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05478084087371826
Validation loss = 0.05075472220778465
Validation loss = 0.050195224583148956
Validation loss = 0.0516117624938488
Validation loss = 0.049438878893852234
Validation loss = 0.05250032991170883
Validation loss = 0.05064086616039276
Validation loss = 0.05083024874329567
Validation loss = 0.05222258344292641
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.054234493523836136
Validation loss = 0.053091004490852356
Validation loss = 0.05325160548090935
Validation loss = 0.052551738917827606
Validation loss = 0.051327548921108246
Validation loss = 0.051544155925512314
Validation loss = 0.05148006230592728
Validation loss = 0.05189088359475136
Validation loss = 0.05209171026945114
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05657705292105675
Validation loss = 0.05537154898047447
Validation loss = 0.05558076128363609
Validation loss = 0.05161527916789055
Validation loss = 0.05100564658641815
Validation loss = 0.051423706114292145
Validation loss = 0.05169026553630829
Validation loss = 0.052211299538612366
Validation loss = 0.0498957484960556
Validation loss = 0.052054062485694885
Validation loss = 0.051982633769512177
Validation loss = 0.050780393183231354
Validation loss = 0.0519188828766346
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05642181634902954
Validation loss = 0.05249544978141785
Validation loss = 0.05247665196657181
Validation loss = 0.052571266889572144
Validation loss = 0.054109830409288406
Validation loss = 0.051343999803066254
Validation loss = 0.05288396775722504
Validation loss = 0.050175391137599945
Validation loss = 0.04928762465715408
Validation loss = 0.05106820538640022
Validation loss = 0.050586119294166565
Validation loss = 0.050517767667770386
Validation loss = 0.05060683935880661
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.054458532482385635
Validation loss = 0.05186881124973297
Validation loss = 0.050480667501688004
Validation loss = 0.05166910961270332
Validation loss = 0.053450390696525574
Validation loss = 0.05048258602619171
Validation loss = 0.051414720714092255
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 690
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 689
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 710
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 684
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 683
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 710
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 56.4     |
| Iteration     | 8        |
| MaximumReturn | 629      |
| MinimumReturn | -359     |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.051773060113191605
Validation loss = 0.04852795973420143
Validation loss = 0.04944813996553421
Validation loss = 0.04838915914297104
Validation loss = 0.049299247562885284
Validation loss = 0.0477893128991127
Validation loss = 0.04809299111366272
Validation loss = 0.04804512858390808
Validation loss = 0.046342264860868454
Validation loss = 0.050488196313381195
Validation loss = 0.046580828726291656
Validation loss = 0.0469883568584919
Validation loss = 0.04746788367629051
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05616099759936333
Validation loss = 0.04882066696882248
Validation loss = 0.04763979837298393
Validation loss = 0.050869591534137726
Validation loss = 0.04906175285577774
Validation loss = 0.04794434830546379
Validation loss = 0.04952988773584366
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05177628993988037
Validation loss = 0.047218725085258484
Validation loss = 0.04770780727267265
Validation loss = 0.04651911184191704
Validation loss = 0.05051807314157486
Validation loss = 0.046630702912807465
Validation loss = 0.047636572271585464
Validation loss = 0.0465601310133934
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05140873044729233
Validation loss = 0.047956809401512146
Validation loss = 0.0481051430106163
Validation loss = 0.04701366275548935
Validation loss = 0.047738730907440186
Validation loss = 0.04880545288324356
Validation loss = 0.04595556855201721
Validation loss = 0.04646262153983116
Validation loss = 0.047044768929481506
Validation loss = 0.04576663300395012
Validation loss = 0.04816483333706856
Validation loss = 0.045651521533727646
Validation loss = 0.04522847756743431
Validation loss = 0.04689343273639679
Validation loss = 0.04683679714798927
Validation loss = 0.0460137203335762
Validation loss = 0.04646305739879608
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05209886282682419
Validation loss = 0.04681183025240898
Validation loss = 0.04867906868457794
Validation loss = 0.04519515857100487
Validation loss = 0.046943455934524536
Validation loss = 0.04617423564195633
Validation loss = 0.04754945635795593
Validation loss = 0.048399899154901505
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 707
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 732
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 719
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 706
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 726
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 707
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -295     |
| Iteration     | 9        |
| MaximumReturn | -85.7    |
| MinimumReturn | -389     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.049118317663669586
Validation loss = 0.04521402344107628
Validation loss = 0.044146161526441574
Validation loss = 0.0456443689763546
Validation loss = 0.0444500707089901
Validation loss = 0.044681258499622345
Validation loss = 0.046970952302217484
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.051644157618284225
Validation loss = 0.04626699909567833
Validation loss = 0.04526723921298981
Validation loss = 0.046868953853845596
Validation loss = 0.04554973542690277
Validation loss = 0.04556618630886078
Validation loss = 0.045151736587285995
Validation loss = 0.04503205046057701
Validation loss = 0.04420292004942894
Validation loss = 0.045640669763088226
Validation loss = 0.04445864260196686
Validation loss = 0.04472104460000992
Validation loss = 0.04502476379275322
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05343009531497955
Validation loss = 0.04475019499659538
Validation loss = 0.04442652687430382
Validation loss = 0.04416045919060707
Validation loss = 0.04263296350836754
Validation loss = 0.04407244548201561
Validation loss = 0.04436192661523819
Validation loss = 0.04527251794934273
Validation loss = 0.0446883961558342
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05243959650397301
Validation loss = 0.044879551976919174
Validation loss = 0.0441628135740757
Validation loss = 0.04367046058177948
Validation loss = 0.043299660086631775
Validation loss = 0.044567957520484924
Validation loss = 0.04418884590268135
Validation loss = 0.044082798063755035
Validation loss = 0.0431055873632431
Validation loss = 0.043572474271059036
Validation loss = 0.043391674757003784
Validation loss = 0.04240703955292702
Validation loss = 0.04247713088989258
Validation loss = 0.04275279864668846
Validation loss = 0.04305712506175041
Validation loss = 0.043108467012643814
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.055137719959020615
Validation loss = 0.04432820528745651
Validation loss = 0.046342238783836365
Validation loss = 0.04333408176898956
Validation loss = 0.044770386070013046
Validation loss = 0.044233355671167374
Validation loss = 0.04356798902153969
Validation loss = 0.04362073913216591
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 720
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 682
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 732
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 715
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 730
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 699
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 115      |
| Iteration     | 10       |
| MaximumReturn | 515      |
| MinimumReturn | -253     |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.049423545598983765
Validation loss = 0.04202820733189583
Validation loss = 0.042561400681734085
Validation loss = 0.04366666078567505
Validation loss = 0.0421864278614521
Validation loss = 0.04327138885855675
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.046789318323135376
Validation loss = 0.04388623312115669
Validation loss = 0.04288672283291817
Validation loss = 0.04261119291186333
Validation loss = 0.04299969598650932
Validation loss = 0.04304434731602669
Validation loss = 0.044027406722307205
Validation loss = 0.0424148254096508
Validation loss = 0.04392372444272041
Validation loss = 0.04208226874470711
Validation loss = 0.041737962514162064
Validation loss = 0.043345868587493896
Validation loss = 0.0421474315226078
Validation loss = 0.04605799540877342
Validation loss = 0.04247784614562988
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04799366369843483
Validation loss = 0.043514493852853775
Validation loss = 0.041195087134838104
Validation loss = 0.04330803081393242
Validation loss = 0.040996234863996506
Validation loss = 0.04247793182730675
Validation loss = 0.041405756026506424
Validation loss = 0.04261009767651558
Validation loss = 0.04238362982869148
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04825322702527046
Validation loss = 0.040768761187791824
Validation loss = 0.04086983576416969
Validation loss = 0.0411890372633934
Validation loss = 0.0407496877014637
Validation loss = 0.04139764606952667
Validation loss = 0.04141777381300926
Validation loss = 0.042746108025312424
Validation loss = 0.04111137613654137
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04695435240864754
Validation loss = 0.043953076004981995
Validation loss = 0.04256710410118103
Validation loss = 0.04189586639404297
Validation loss = 0.04355407878756523
Validation loss = 0.041925717145204544
Validation loss = 0.04094018042087555
Validation loss = 0.04323292896151543
Validation loss = 0.042683329433202744
Validation loss = 0.041308123618364334
Validation loss = 0.04183201491832733
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 673
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 685
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 701
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 659
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 680
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 718
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 742      |
| Iteration     | 11       |
| MaximumReturn | 1.18e+03 |
| MinimumReturn | -149     |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.047699764370918274
Validation loss = 0.04023917391896248
Validation loss = 0.04008825495839119
Validation loss = 0.04013647139072418
Validation loss = 0.0409570187330246
Validation loss = 0.041615311056375504
Validation loss = 0.040930017828941345
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04372264817357063
Validation loss = 0.04072977975010872
Validation loss = 0.04049504175782204
Validation loss = 0.04110817238688469
Validation loss = 0.04132247716188431
Validation loss = 0.03956928104162216
Validation loss = 0.03952242434024811
Validation loss = 0.0407760925590992
Validation loss = 0.03888952359557152
Validation loss = 0.04067476838827133
Validation loss = 0.039421480149030685
Validation loss = 0.039924509823322296
Validation loss = 0.03853217512369156
Validation loss = 0.038974978029727936
Validation loss = 0.042374394834041595
Validation loss = 0.03783925622701645
Validation loss = 0.039089836180210114
Validation loss = 0.03938756883144379
Validation loss = 0.03782418742775917
Validation loss = 0.040775034576654434
Validation loss = 0.03717033192515373
Validation loss = 0.03895440325140953
Validation loss = 0.038715608417987823
Validation loss = 0.03855013847351074
Validation loss = 0.0407717190682888
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04310639202594757
Validation loss = 0.03872108459472656
Validation loss = 0.038745105266571045
Validation loss = 0.04075794667005539
Validation loss = 0.04045833647251129
Validation loss = 0.03905349224805832
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04350602626800537
Validation loss = 0.0384477861225605
Validation loss = 0.038813862949609756
Validation loss = 0.03917085751891136
Validation loss = 0.03841188922524452
Validation loss = 0.037727270275354385
Validation loss = 0.03740527853369713
Validation loss = 0.03798294439911842
Validation loss = 0.038341838866472244
Validation loss = 0.03927600011229515
Validation loss = 0.037918124347925186
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04341741278767586
Validation loss = 0.03966949135065079
Validation loss = 0.04011387377977371
Validation loss = 0.04184267297387123
Validation loss = 0.038683731108903885
Validation loss = 0.03986877202987671
Validation loss = 0.039157576858997345
Validation loss = 0.03920930251479149
Validation loss = 0.03861469775438309
Validation loss = 0.0408613421022892
Validation loss = 0.03774276748299599
Validation loss = 0.0390947088599205
Validation loss = 0.03869634121656418
Validation loss = 0.03887457400560379
Validation loss = 0.038368579000234604
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 724
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 711
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 693
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 712
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 652
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 684
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 875      |
| Iteration     | 12       |
| MaximumReturn | 1.6e+03  |
| MinimumReturn | -111     |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0437680259346962
Validation loss = 0.0386536531150341
Validation loss = 0.0378984697163105
Validation loss = 0.03859701752662659
Validation loss = 0.038744669407606125
Validation loss = 0.03754909709095955
Validation loss = 0.037291109561920166
Validation loss = 0.03760277479887009
Validation loss = 0.0373738631606102
Validation loss = 0.03915252536535263
Validation loss = 0.03803060203790665
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0437084436416626
Validation loss = 0.036498066037893295
Validation loss = 0.03587717190384865
Validation loss = 0.037733081728219986
Validation loss = 0.036933813244104385
Validation loss = 0.03624335303902626
Validation loss = 0.04112660512328148
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.042111750692129135
Validation loss = 0.0372462160885334
Validation loss = 0.037671301513910294
Validation loss = 0.03746875375509262
Validation loss = 0.03696554899215698
Validation loss = 0.037150025367736816
Validation loss = 0.040168557316064835
Validation loss = 0.03664148598909378
Validation loss = 0.03738973289728165
Validation loss = 0.03593591973185539
Validation loss = 0.039285771548748016
Validation loss = 0.036295194178819656
Validation loss = 0.03563686087727547
Validation loss = 0.036278486251831055
Validation loss = 0.03811316937208176
Validation loss = 0.037407007068395615
Validation loss = 0.035493381321430206
Validation loss = 0.03603602573275566
Validation loss = 0.034911610186100006
Validation loss = 0.03500651195645332
Validation loss = 0.03534594923257828
Validation loss = 0.03533418849110603
Validation loss = 0.03482631593942642
Validation loss = 0.03581566736102104
Validation loss = 0.035663824528455734
Validation loss = 0.03699252009391785
Validation loss = 0.03518931195139885
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0409160740673542
Validation loss = 0.036635760217905045
Validation loss = 0.03760315105319023
Validation loss = 0.03762892261147499
Validation loss = 0.03694074973464012
Validation loss = 0.03701452538371086
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.042983438819646835
Validation loss = 0.03657836839556694
Validation loss = 0.04029863327741623
Validation loss = 0.037179868668317795
Validation loss = 0.03855079039931297
Validation loss = 0.03696369379758835
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 670
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 711
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 661
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 710
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 720
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 703
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 841      |
| Iteration     | 13       |
| MaximumReturn | 1.98e+03 |
| MinimumReturn | -464     |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.041005782783031464
Validation loss = 0.03673066571354866
Validation loss = 0.03600381314754486
Validation loss = 0.03557264804840088
Validation loss = 0.03641466051340103
Validation loss = 0.03552364557981491
Validation loss = 0.03582463786005974
Validation loss = 0.03517561033368111
Validation loss = 0.03456278517842293
Validation loss = 0.03517207130789757
Validation loss = 0.034762315452098846
Validation loss = 0.034892186522483826
Validation loss = 0.03445403650403023
Validation loss = 0.03367640823125839
Validation loss = 0.03413917496800423
Validation loss = 0.03560000658035278
Validation loss = 0.03304338827729225
Validation loss = 0.03491918370127678
Validation loss = 0.033650435507297516
Validation loss = 0.03426080197095871
Validation loss = 0.03441760316491127
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03850923851132393
Validation loss = 0.03490172699093819
Validation loss = 0.034046489745378494
Validation loss = 0.0350361205637455
Validation loss = 0.034301988780498505
Validation loss = 0.03458103910088539
Validation loss = 0.03592122346162796
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03828778117895126
Validation loss = 0.03426288440823555
Validation loss = 0.032908160239458084
Validation loss = 0.03297398239374161
Validation loss = 0.03373434767127037
Validation loss = 0.032134294509887695
Validation loss = 0.03366335108876228
Validation loss = 0.03390324488282204
Validation loss = 0.033080436289310455
Validation loss = 0.032171349972486496
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04094376787543297
Validation loss = 0.0345073938369751
Validation loss = 0.03453528508543968
Validation loss = 0.0349026657640934
Validation loss = 0.033739615231752396
Validation loss = 0.03460102155804634
Validation loss = 0.03438759222626686
Validation loss = 0.032587505877017975
Validation loss = 0.03516177088022232
Validation loss = 0.033183831721544266
Validation loss = 0.03454889357089996
Validation loss = 0.03354320675134659
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04001578316092491
Validation loss = 0.03455515578389168
Validation loss = 0.03578543663024902
Validation loss = 0.03572269529104233
Validation loss = 0.03494813293218613
Validation loss = 0.03372834250330925
Validation loss = 0.03442361578345299
Validation loss = 0.03523319959640503
Validation loss = 0.034367892891168594
Validation loss = 0.034178491681814194
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 692
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 707
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 706
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 694
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 711
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 696
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 296      |
| Iteration     | 14       |
| MaximumReturn | 1.42e+03 |
| MinimumReturn | -473     |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.037951745092868805
Validation loss = 0.03225123882293701
Validation loss = 0.0314076691865921
Validation loss = 0.032188959419727325
Validation loss = 0.031184930354356766
Validation loss = 0.03203873708844185
Validation loss = 0.03243388235569
Validation loss = 0.03148966282606125
Validation loss = 0.0312679223716259
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03750258684158325
Validation loss = 0.03280811011791229
Validation loss = 0.033538952469825745
Validation loss = 0.036118701100349426
Validation loss = 0.032358888536691666
Validation loss = 0.033581629395484924
Validation loss = 0.03210964426398277
Validation loss = 0.03146614879369736
Validation loss = 0.03168357536196709
Validation loss = 0.03216857463121414
Validation loss = 0.03175683692097664
Validation loss = 0.03198105841875076
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03729704022407532
Validation loss = 0.03096238151192665
Validation loss = 0.03171137720346451
Validation loss = 0.031320586800575256
Validation loss = 0.03183036297559738
Validation loss = 0.03279625624418259
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03765606880187988
Validation loss = 0.03145458549261093
Validation loss = 0.0323583222925663
Validation loss = 0.03174014389514923
Validation loss = 0.03183969855308533
Validation loss = 0.03105667233467102
Validation loss = 0.03195150941610336
Validation loss = 0.0325797013938427
Validation loss = 0.030558455735445023
Validation loss = 0.03109482303261757
Validation loss = 0.03127327561378479
Validation loss = 0.03054507076740265
Validation loss = 0.030751794576644897
Validation loss = 0.0322219654917717
Validation loss = 0.03048369660973549
Validation loss = 0.03500651195645332
Validation loss = 0.02957966923713684
Validation loss = 0.03080419823527336
Validation loss = 0.030872026458382607
Validation loss = 0.02999471127986908
Validation loss = 0.03391971439123154
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03723442181944847
Validation loss = 0.031964026391506195
Validation loss = 0.031941983848810196
Validation loss = 0.03282606601715088
Validation loss = 0.032660070806741714
Validation loss = 0.03114510327577591
Validation loss = 0.031784530729055405
Validation loss = 0.030975759029388428
Validation loss = 0.03319824859499931
Validation loss = 0.03113493323326111
Validation loss = 0.03171016275882721
Validation loss = 0.03259682282805443
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 697
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 714
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 707
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 710
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 728
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 717
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.43e+03 |
| Iteration     | 15       |
| MaximumReturn | 2.17e+03 |
| MinimumReturn | -395     |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.032977160066366196
Validation loss = 0.029235104098916054
Validation loss = 0.02997618541121483
Validation loss = 0.031044021248817444
Validation loss = 0.02968977764248848
Validation loss = 0.030928989872336388
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0369858592748642
Validation loss = 0.030370578169822693
Validation loss = 0.030255727469921112
Validation loss = 0.0326690636575222
Validation loss = 0.0289643295109272
Validation loss = 0.029879117384552956
Validation loss = 0.028451552614569664
Validation loss = 0.031034957617521286
Validation loss = 0.029270116239786148
Validation loss = 0.03227094188332558
Validation loss = 0.028376586735248566
Validation loss = 0.028941476717591286
Validation loss = 0.03196880593895912
Validation loss = 0.028080318123102188
Validation loss = 0.02791135199368
Validation loss = 0.0295570008456707
Validation loss = 0.02850167267024517
Validation loss = 0.02919682301580906
Validation loss = 0.03183161839842796
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03259768337011337
Validation loss = 0.029489124193787575
Validation loss = 0.029530122876167297
Validation loss = 0.029954759404063225
Validation loss = 0.030227387323975563
Validation loss = 0.028855228796601295
Validation loss = 0.03058679960668087
Validation loss = 0.02783345617353916
Validation loss = 0.028920993208885193
Validation loss = 0.028374694287776947
Validation loss = 0.02840454690158367
Validation loss = 0.02977311611175537
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03420623019337654
Validation loss = 0.027920011430978775
Validation loss = 0.0284158643335104
Validation loss = 0.02956763468682766
Validation loss = 0.028401197865605354
Validation loss = 0.028923291712999344
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.034807078540325165
Validation loss = 0.02929573692381382
Validation loss = 0.030000515282154083
Validation loss = 0.029888328164815903
Validation loss = 0.029644398018717766
Validation loss = 0.031781282275915146
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 721
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 707
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 724
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 719
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 720
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 715
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.06e+03 |
| Iteration     | 16       |
| MaximumReturn | 2.25e+03 |
| MinimumReturn | 1.58e+03 |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03102152608335018
Validation loss = 0.028487805277109146
Validation loss = 0.028766384348273277
Validation loss = 0.028243061155080795
Validation loss = 0.02874266356229782
Validation loss = 0.029393086209893227
Validation loss = 0.027905067428946495
Validation loss = 0.028388788923621178
Validation loss = 0.02826545387506485
Validation loss = 0.02938688173890114
Validation loss = 0.027064774185419083
Validation loss = 0.02824433706700802
Validation loss = 0.027797579765319824
Validation loss = 0.027341550216078758
Validation loss = 0.027796119451522827
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.031245728954672813
Validation loss = 0.026751235127449036
Validation loss = 0.02667800709605217
Validation loss = 0.028498075902462006
Validation loss = 0.026148999109864235
Validation loss = 0.03082749992609024
Validation loss = 0.02624179795384407
Validation loss = 0.028617147356271744
Validation loss = 0.02634098008275032
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03179603070020676
Validation loss = 0.027496788650751114
Validation loss = 0.02724708989262581
Validation loss = 0.028794053941965103
Validation loss = 0.02634771727025509
Validation loss = 0.02786601334810257
Validation loss = 0.027839843183755875
Validation loss = 0.026640234515070915
Validation loss = 0.028898533433675766
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03166346251964569
Validation loss = 0.02688436582684517
Validation loss = 0.028408782556653023
Validation loss = 0.026130136102437973
Validation loss = 0.029553094878792763
Validation loss = 0.025490039959549904
Validation loss = 0.02719751000404358
Validation loss = 0.02735757827758789
Validation loss = 0.026937393471598625
Validation loss = 0.02607586793601513
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03314831852912903
Validation loss = 0.027949068695306778
Validation loss = 0.03186063840985298
Validation loss = 0.027704069390892982
Validation loss = 0.02885364182293415
Validation loss = 0.028519637882709503
Validation loss = 0.02964184619486332
Validation loss = 0.027307063341140747
Validation loss = 0.027929382398724556
Validation loss = 0.027019238099455833
Validation loss = 0.027593214064836502
Validation loss = 0.027986841276288033
Validation loss = 0.0273436289280653
Validation loss = 0.027347488328814507
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 766
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 762
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 723
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 754
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 748
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 732
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.98e+03 |
| Iteration     | 17       |
| MaximumReturn | 2.45e+03 |
| MinimumReturn | 533      |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.030190913006663322
Validation loss = 0.025675417855381966
Validation loss = 0.025814717635512352
Validation loss = 0.02588755451142788
Validation loss = 0.025559300556778908
Validation loss = 0.02501935325562954
Validation loss = 0.026380177587270737
Validation loss = 0.02593538723886013
Validation loss = 0.02522871643304825
Validation loss = 0.02581123821437359
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02762959524989128
Validation loss = 0.02546156942844391
Validation loss = 0.025992948561906815
Validation loss = 0.026460042223334312
Validation loss = 0.024827396497130394
Validation loss = 0.026160433888435364
Validation loss = 0.027028275653719902
Validation loss = 0.02495468035340309
Validation loss = 0.026870960369706154
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.030313216149806976
Validation loss = 0.025350887328386307
Validation loss = 0.026327375322580338
Validation loss = 0.025326168164610863
Validation loss = 0.027059748768806458
Validation loss = 0.025263190269470215
Validation loss = 0.025678083300590515
Validation loss = 0.025185300037264824
Validation loss = 0.025481604039669037
Validation loss = 0.027516240254044533
Validation loss = 0.025018464773893356
Validation loss = 0.024734176695346832
Validation loss = 0.02467978000640869
Validation loss = 0.02764863334596157
Validation loss = 0.02392604388296604
Validation loss = 0.024722710251808167
Validation loss = 0.024863069877028465
Validation loss = 0.02587721310555935
Validation loss = 0.024523423984646797
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028155919164419174
Validation loss = 0.024868346750736237
Validation loss = 0.02599955163896084
Validation loss = 0.026049181818962097
Validation loss = 0.02577897161245346
Validation loss = 0.025126710534095764
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02999112568795681
Validation loss = 0.02501598373055458
Validation loss = 0.026807144284248352
Validation loss = 0.026845896616578102
Validation loss = 0.026109836995601654
Validation loss = 0.02592686377465725
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 720
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 765
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 759
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 754
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 749
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 748
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.08e+03 |
| Iteration     | 18       |
| MaximumReturn | 2.77e+03 |
| MinimumReturn | 274      |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.028661128133535385
Validation loss = 0.023933570832014084
Validation loss = 0.02416416071355343
Validation loss = 0.02415343001484871
Validation loss = 0.02382649853825569
Validation loss = 0.025049638003110886
Validation loss = 0.023635368794202805
Validation loss = 0.024751950055360794
Validation loss = 0.024009080603718758
Validation loss = 0.024811850860714912
Validation loss = 0.023018967360258102
Validation loss = 0.02465406060218811
Validation loss = 0.023556310683488846
Validation loss = 0.02356625720858574
Validation loss = 0.023327410221099854
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02796856127679348
Validation loss = 0.024073902517557144
Validation loss = 0.02403656765818596
Validation loss = 0.025431573390960693
Validation loss = 0.024130193516612053
Validation loss = 0.024953190237283707
Validation loss = 0.02430480532348156
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0266462080180645
Validation loss = 0.024943264201283455
Validation loss = 0.023698192089796066
Validation loss = 0.023215968161821365
Validation loss = 0.024380717426538467
Validation loss = 0.022518759593367577
Validation loss = 0.025065327063202858
Validation loss = 0.023390328511595726
Validation loss = 0.02345985360443592
Validation loss = 0.02333845943212509
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027340853586792946
Validation loss = 0.024353185668587685
Validation loss = 0.024931173771619797
Validation loss = 0.023985961452126503
Validation loss = 0.02341661974787712
Validation loss = 0.02378503791987896
Validation loss = 0.025464732199907303
Validation loss = 0.023218348622322083
Validation loss = 0.02350820228457451
Validation loss = 0.024556249380111694
Validation loss = 0.023447513580322266
Validation loss = 0.024839524179697037
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026691367849707603
Validation loss = 0.024758707731962204
Validation loss = 0.02552606165409088
Validation loss = 0.024393174797296524
Validation loss = 0.024771180003881454
Validation loss = 0.025989031419157982
Validation loss = 0.023966608569025993
Validation loss = 0.024671753868460655
Validation loss = 0.02360657788813114
Validation loss = 0.02381961978971958
Validation loss = 0.02545330859720707
Validation loss = 0.023574959486722946
Validation loss = 0.024908097460865974
Validation loss = 0.023728877305984497
Validation loss = 0.023788010701537132
Validation loss = 0.023736681789159775
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 747
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 735
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 735
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 761
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 730
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 744
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.56e+03 |
| Iteration     | 19       |
| MaximumReturn | 2.95e+03 |
| MinimumReturn | 2.32e+03 |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02461334876716137
Validation loss = 0.022596050053834915
Validation loss = 0.022302545607089996
Validation loss = 0.021655213087797165
Validation loss = 0.022793909534811974
Validation loss = 0.022524382919073105
Validation loss = 0.021939592435956
Validation loss = 0.022013068199157715
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024654939770698547
Validation loss = 0.022565510123968124
Validation loss = 0.022818392142653465
Validation loss = 0.02235027775168419
Validation loss = 0.02361871674656868
Validation loss = 0.02182202786207199
Validation loss = 0.023493053391575813
Validation loss = 0.023121461272239685
Validation loss = 0.022849373519420624
Validation loss = 0.021843481808900833
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02498851716518402
Validation loss = 0.021799560636281967
Validation loss = 0.023473169654607773
Validation loss = 0.021331170573830605
Validation loss = 0.022667929530143738
Validation loss = 0.021778322756290436
Validation loss = 0.02246789075434208
Validation loss = 0.021957343444228172
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025301001965999603
Validation loss = 0.022115839645266533
Validation loss = 0.022642221301794052
Validation loss = 0.02197549119591713
Validation loss = 0.023404045030474663
Validation loss = 0.022049585357308388
Validation loss = 0.022777119651436806
Validation loss = 0.021880682557821274
Validation loss = 0.022017037495970726
Validation loss = 0.02242211438715458
Validation loss = 0.021447721868753433
Validation loss = 0.022947970777750015
Validation loss = 0.020881012082099915
Validation loss = 0.023156747221946716
Validation loss = 0.021287407726049423
Validation loss = 0.022265033796429634
Validation loss = 0.02076798304915428
Validation loss = 0.02104562148451805
Validation loss = 0.02185763418674469
Validation loss = 0.02149800956249237
Validation loss = 0.02314373478293419
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024700479581952095
Validation loss = 0.0225264523178339
Validation loss = 0.024897530674934387
Validation loss = 0.022172966971993446
Validation loss = 0.024314355105161667
Validation loss = 0.022403355687856674
Validation loss = 0.02491198480129242
Validation loss = 0.022289883345365524
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 721
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 737
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 732
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 734
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 731
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 723
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.53e+03 |
| Iteration     | 20       |
| MaximumReturn | 2.78e+03 |
| MinimumReturn | 2.12e+03 |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02448931522667408
Validation loss = 0.021365882828831673
Validation loss = 0.02180834300816059
Validation loss = 0.020600026473402977
Validation loss = 0.021213239058852196
Validation loss = 0.021496040746569633
Validation loss = 0.020758500322699547
Validation loss = 0.020521653816103935
Validation loss = 0.022535186260938644
Validation loss = 0.020384686067700386
Validation loss = 0.022561850026249886
Validation loss = 0.019724206998944283
Validation loss = 0.0227514635771513
Validation loss = 0.02006041258573532
Validation loss = 0.020339446142315865
Validation loss = 0.020229866728186607
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023389311507344246
Validation loss = 0.021511469036340714
Validation loss = 0.02284952439367771
Validation loss = 0.021333277225494385
Validation loss = 0.0211642999202013
Validation loss = 0.021873539313673973
Validation loss = 0.023175610229372978
Validation loss = 0.020456789061427116
Validation loss = 0.022236010059714317
Validation loss = 0.020692134276032448
Validation loss = 0.021663742139935493
Validation loss = 0.021137798205018044
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024397848173975945
Validation loss = 0.0208953395485878
Validation loss = 0.02080659568309784
Validation loss = 0.021347181871533394
Validation loss = 0.021486995741724968
Validation loss = 0.02244739979505539
Validation loss = 0.020780064165592194
Validation loss = 0.021894384175539017
Validation loss = 0.02086018957197666
Validation loss = 0.02188188023865223
Validation loss = 0.020651184022426605
Validation loss = 0.020630819723010063
Validation loss = 0.020383473485708237
Validation loss = 0.020059306174516678
Validation loss = 0.020602485164999962
Validation loss = 0.021738648414611816
Validation loss = 0.02011829800903797
Validation loss = 0.021094879135489464
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02300160564482212
Validation loss = 0.020136235281825066
Validation loss = 0.020867284387350082
Validation loss = 0.019785594195127487
Validation loss = 0.02034926600754261
Validation loss = 0.02005409635603428
Validation loss = 0.020341822877526283
Validation loss = 0.021142510697245598
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025106346234679222
Validation loss = 0.022107869386672974
Validation loss = 0.021555224433541298
Validation loss = 0.02234646864235401
Validation loss = 0.021506786346435547
Validation loss = 0.022175051271915436
Validation loss = 0.02068091370165348
Validation loss = 0.02213156409561634
Validation loss = 0.020851925015449524
Validation loss = 0.021521365270018578
Validation loss = 0.02205750346183777
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 718
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 704
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 730
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 712
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 733
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 726
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.52e+03 |
| Iteration     | 21       |
| MaximumReturn | 2.79e+03 |
| MinimumReturn | 1.85e+03 |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02176768332719803
Validation loss = 0.01960139162838459
Validation loss = 0.020718172192573547
Validation loss = 0.018763327971100807
Validation loss = 0.020878231152892113
Validation loss = 0.019443418830633163
Validation loss = 0.01911323145031929
Validation loss = 0.019854767248034477
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021919023245573044
Validation loss = 0.02026928961277008
Validation loss = 0.020919635891914368
Validation loss = 0.02077963389456272
Validation loss = 0.0195047315210104
Validation loss = 0.020582087337970734
Validation loss = 0.019505031406879425
Validation loss = 0.021229948848485947
Validation loss = 0.018846962600946426
Validation loss = 0.021282043308019638
Validation loss = 0.019167356193065643
Validation loss = 0.020054161548614502
Validation loss = 0.01961427368223667
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02315589413046837
Validation loss = 0.019031845033168793
Validation loss = 0.019327029585838318
Validation loss = 0.019694766029715538
Validation loss = 0.01976095885038376
Validation loss = 0.019759180024266243
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022404968738555908
Validation loss = 0.019493697211146355
Validation loss = 0.020605728030204773
Validation loss = 0.019421422854065895
Validation loss = 0.018572263419628143
Validation loss = 0.02147006429731846
Validation loss = 0.01859019324183464
Validation loss = 0.019591204822063446
Validation loss = 0.018842341378331184
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021868830546736717
Validation loss = 0.020177477970719337
Validation loss = 0.021783968433737755
Validation loss = 0.020171433687210083
Validation loss = 0.020891403779387474
Validation loss = 0.02030053175985813
Validation loss = 0.020162928849458694
Validation loss = 0.020089315250515938
Validation loss = 0.02111976407468319
Validation loss = 0.019203688949346542
Validation loss = 0.020168902352452278
Validation loss = 0.019294455647468567
Validation loss = 0.02119526080787182
Validation loss = 0.019426552578806877
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 699
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 703
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 716
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 719
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 717
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 711
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.75e+03 |
| Iteration     | 22       |
| MaximumReturn | 3.03e+03 |
| MinimumReturn | 2.51e+03 |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020171429961919785
Validation loss = 0.0187231358140707
Validation loss = 0.01967298984527588
Validation loss = 0.019298920407891273
Validation loss = 0.01823895238339901
Validation loss = 0.018162986263632774
Validation loss = 0.0190487802028656
Validation loss = 0.01875768043100834
Validation loss = 0.01809326745569706
Validation loss = 0.018778173252940178
Validation loss = 0.019349584355950356
Validation loss = 0.018125107511878014
Validation loss = 0.0203968808054924
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020150544121861458
Validation loss = 0.018683644011616707
Validation loss = 0.021591901779174805
Validation loss = 0.0182933546602726
Validation loss = 0.020322244614362717
Validation loss = 0.019015930593013763
Validation loss = 0.019122805446386337
Validation loss = 0.018515881150960922
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019669784232974052
Validation loss = 0.01870661787688732
Validation loss = 0.01952993869781494
Validation loss = 0.018340017646551132
Validation loss = 0.01829555258154869
Validation loss = 0.01933460123836994
Validation loss = 0.018231255933642387
Validation loss = 0.018512185662984848
Validation loss = 0.01854853704571724
Validation loss = 0.017676733434200287
Validation loss = 0.018128519877791405
Validation loss = 0.019359448924660683
Validation loss = 0.018126772716641426
Validation loss = 0.018027598038315773
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019981209188699722
Validation loss = 0.018564170226454735
Validation loss = 0.019246576353907585
Validation loss = 0.01807425543665886
Validation loss = 0.020607799291610718
Validation loss = 0.01853497512638569
Validation loss = 0.020366325974464417
Validation loss = 0.0181815754622221
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020692715421319008
Validation loss = 0.018839864060282707
Validation loss = 0.01995726116001606
Validation loss = 0.01870044693350792
Validation loss = 0.019589880481362343
Validation loss = 0.02005513198673725
Validation loss = 0.01934286393225193
Validation loss = 0.018964305520057678
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 720
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 713
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 700
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 706
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 700
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 712
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.58e+03 |
| Iteration     | 23       |
| MaximumReturn | 3.21e+03 |
| MinimumReturn | 75.2     |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019595885649323463
Validation loss = 0.01791342906653881
Validation loss = 0.017913561314344406
Validation loss = 0.017893707379698753
Validation loss = 0.017423994839191437
Validation loss = 0.018353022634983063
Validation loss = 0.017881520092487335
Validation loss = 0.01775282993912697
Validation loss = 0.017738061025738716
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019482266157865524
Validation loss = 0.018963197246193886
Validation loss = 0.018389742821455002
Validation loss = 0.019319094717502594
Validation loss = 0.017640629783272743
Validation loss = 0.018552804365754128
Validation loss = 0.01901950314640999
Validation loss = 0.018648721277713776
Validation loss = 0.01783532090485096
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01961815357208252
Validation loss = 0.01830763928592205
Validation loss = 0.018417395651340485
Validation loss = 0.017731813713908195
Validation loss = 0.017418034374713898
Validation loss = 0.01797380857169628
Validation loss = 0.017129184678196907
Validation loss = 0.01758475787937641
Validation loss = 0.01723417267203331
Validation loss = 0.01875346153974533
Validation loss = 0.017309900373220444
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019730867817997932
Validation loss = 0.018493758514523506
Validation loss = 0.018694669008255005
Validation loss = 0.0180149395018816
Validation loss = 0.019034504890441895
Validation loss = 0.01743301935493946
Validation loss = 0.01797822117805481
Validation loss = 0.017424046993255615
Validation loss = 0.01813623122870922
Validation loss = 0.017203539609909058
Validation loss = 0.018585702404379845
Validation loss = 0.017119307070970535
Validation loss = 0.01720386929810047
Validation loss = 0.017270248383283615
Validation loss = 0.017330745235085487
Validation loss = 0.017803117632865906
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019864119589328766
Validation loss = 0.018846331164240837
Validation loss = 0.019050128757953644
Validation loss = 0.019368479028344154
Validation loss = 0.018935976549983025
Validation loss = 0.01827126368880272
Validation loss = 0.01843111217021942
Validation loss = 0.01881074160337448
Validation loss = 0.01829463243484497
Validation loss = 0.017995653674006462
Validation loss = 0.01781216450035572
Validation loss = 0.017666133120656013
Validation loss = 0.01822350174188614
Validation loss = 0.01793598011136055
Validation loss = 0.018385443836450577
Validation loss = 0.019455796107649803
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 730
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 733
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 715
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 734
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 728
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 721
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.18e+03 |
| Iteration     | 24       |
| MaximumReturn | 3.4e+03  |
| MinimumReturn | 2.83e+03 |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018666371703147888
Validation loss = 0.016652725636959076
Validation loss = 0.016783926635980606
Validation loss = 0.016769465059041977
Validation loss = 0.01833127625286579
Validation loss = 0.016653867438435555
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018803320825099945
Validation loss = 0.01676873490214348
Validation loss = 0.017624719068408012
Validation loss = 0.017597483471035957
Validation loss = 0.016896149143576622
Validation loss = 0.01818574219942093
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018886368721723557
Validation loss = 0.016733961179852486
Validation loss = 0.01761390082538128
Validation loss = 0.01638445071876049
Validation loss = 0.016601772978901863
Validation loss = 0.01677720993757248
Validation loss = 0.01622922532260418
Validation loss = 0.016902700066566467
Validation loss = 0.016232704743742943
Validation loss = 0.01709563098847866
Validation loss = 0.01663600280880928
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018403343856334686
Validation loss = 0.01664520613849163
Validation loss = 0.01855326071381569
Validation loss = 0.016453959047794342
Validation loss = 0.01789380982518196
Validation loss = 0.016063978895545006
Validation loss = 0.017364542931318283
Validation loss = 0.016560424119234085
Validation loss = 0.01727483421564102
Validation loss = 0.01713346317410469
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019257184118032455
Validation loss = 0.017223238945007324
Validation loss = 0.017860932275652885
Validation loss = 0.017786137759685516
Validation loss = 0.016753507778048515
Validation loss = 0.017176927998661995
Validation loss = 0.01783691719174385
Validation loss = 0.017222557216882706
Validation loss = 0.01717938296496868
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 718
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 752
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 717
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 735
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 740
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 733
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.69e+03 |
| Iteration     | 25       |
| MaximumReturn | 3.41e+03 |
| MinimumReturn | 1.08e+03 |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019802937284111977
Validation loss = 0.016099167987704277
Validation loss = 0.016670916229486465
Validation loss = 0.01609473116695881
Validation loss = 0.016741935163736343
Validation loss = 0.016465721651911736
Validation loss = 0.016330046579241753
Validation loss = 0.016475774347782135
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018299520015716553
Validation loss = 0.016588885337114334
Validation loss = 0.016989251598715782
Validation loss = 0.017169032245874405
Validation loss = 0.01665951870381832
Validation loss = 0.016683531925082207
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01717490330338478
Validation loss = 0.015832683071494102
Validation loss = 0.016188576817512512
Validation loss = 0.0158302690833807
Validation loss = 0.015835335478186607
Validation loss = 0.016840923577547073
Validation loss = 0.0159920584410429
Validation loss = 0.01667015440762043
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018546001985669136
Validation loss = 0.0162657480686903
Validation loss = 0.01679631695151329
Validation loss = 0.016557196155190468
Validation loss = 0.015881923958659172
Validation loss = 0.017430756241083145
Validation loss = 0.016009287908673286
Validation loss = 0.016314445063471794
Validation loss = 0.01600264571607113
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01839054934680462
Validation loss = 0.016297075897455215
Validation loss = 0.01648087613284588
Validation loss = 0.01659512333571911
Validation loss = 0.016958260908722878
Validation loss = 0.016582302749156952
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 742
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 742
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 753
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 741
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 752
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 728
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.12e+03 |
| Iteration     | 26       |
| MaximumReturn | 3.61e+03 |
| MinimumReturn | 2.37e+03 |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016687527298927307
Validation loss = 0.016156023368239403
Validation loss = 0.01562579534947872
Validation loss = 0.015369764529168606
Validation loss = 0.016156097874045372
Validation loss = 0.015860246494412422
Validation loss = 0.015420460142195225
Validation loss = 0.015172609128057957
Validation loss = 0.015199853107333183
Validation loss = 0.015411055646836758
Validation loss = 0.015932241454720497
Validation loss = 0.014885233715176582
Validation loss = 0.01588393934071064
Validation loss = 0.01544541772454977
Validation loss = 0.015496588312089443
Validation loss = 0.015500442124903202
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017095765098929405
Validation loss = 0.015939654782414436
Validation loss = 0.015424811281263828
Validation loss = 0.016048679128289223
Validation loss = 0.015864020213484764
Validation loss = 0.016497263684868813
Validation loss = 0.015772497281432152
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015995075926184654
Validation loss = 0.01610463485121727
Validation loss = 0.01560962200164795
Validation loss = 0.016355423256754875
Validation loss = 0.01501383725553751
Validation loss = 0.016111163422465324
Validation loss = 0.015341795049607754
Validation loss = 0.015473275445401669
Validation loss = 0.015066646970808506
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01680794358253479
Validation loss = 0.01551640685647726
Validation loss = 0.017077265307307243
Validation loss = 0.015394228510558605
Validation loss = 0.01548784039914608
Validation loss = 0.01545962505042553
Validation loss = 0.016074759885668755
Validation loss = 0.015252754092216492
Validation loss = 0.015458817593753338
Validation loss = 0.015123819932341576
Validation loss = 0.015518717467784882
Validation loss = 0.015783796086907387
Validation loss = 0.015207527205348015
Validation loss = 0.015566849149763584
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017505789175629616
Validation loss = 0.01577269658446312
Validation loss = 0.015913818031549454
Validation loss = 0.015633566305041313
Validation loss = 0.016829799860715866
Validation loss = 0.015640605241060257
Validation loss = 0.017331354320049286
Validation loss = 0.016510142013430595
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 752
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 746
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 756
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 744
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 729
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 752
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.28e+03 |
| Iteration     | 27       |
| MaximumReturn | 3.63e+03 |
| MinimumReturn | 2.89e+03 |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015835942700505257
Validation loss = 0.014814755879342556
Validation loss = 0.015637651085853577
Validation loss = 0.014741592109203339
Validation loss = 0.01556696929037571
Validation loss = 0.014491342939436436
Validation loss = 0.01586407981812954
Validation loss = 0.014716451987624168
Validation loss = 0.015118999406695366
Validation loss = 0.015469564124941826
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016300227493047714
Validation loss = 0.01520485244691372
Validation loss = 0.015644337981939316
Validation loss = 0.015454331412911415
Validation loss = 0.015091104432940483
Validation loss = 0.01556925754994154
Validation loss = 0.014851225540041924
Validation loss = 0.01555477362126112
Validation loss = 0.01513227540999651
Validation loss = 0.01506673451513052
Validation loss = 0.015324887819588184
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016513289883732796
Validation loss = 0.0146749597042799
Validation loss = 0.015088902786374092
Validation loss = 0.014808573760092258
Validation loss = 0.015151424333453178
Validation loss = 0.01443211268633604
Validation loss = 0.015934303402900696
Validation loss = 0.014760258607566357
Validation loss = 0.015469101257622242
Validation loss = 0.014988388866186142
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015611084178090096
Validation loss = 0.014464939013123512
Validation loss = 0.015530848875641823
Validation loss = 0.015232576057314873
Validation loss = 0.015030773356556892
Validation loss = 0.014859422110021114
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015721455216407776
Validation loss = 0.015029517002403736
Validation loss = 0.015750210732221603
Validation loss = 0.01555085089057684
Validation loss = 0.01569373905658722
Validation loss = 0.015065428800880909
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 736
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 745
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 732
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 740
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 745
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 718
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.27e+03 |
| Iteration     | 28       |
| MaximumReturn | 3.78e+03 |
| MinimumReturn | 2.97e+03 |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015056286938488483
Validation loss = 0.014675821177661419
Validation loss = 0.013990545645356178
Validation loss = 0.01410036999732256
Validation loss = 0.014545973390340805
Validation loss = 0.014625745825469494
Validation loss = 0.014862123876810074
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015745388343930244
Validation loss = 0.014496362768113613
Validation loss = 0.015409986488521099
Validation loss = 0.014841863885521889
Validation loss = 0.014527496881783009
Validation loss = 0.01475356612354517
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014676978811621666
Validation loss = 0.015332875773310661
Validation loss = 0.01435751561075449
Validation loss = 0.01440673228353262
Validation loss = 0.01456749439239502
Validation loss = 0.014012093655765057
Validation loss = 0.014841889031231403
Validation loss = 0.014320921152830124
Validation loss = 0.015393772162497044
Validation loss = 0.015130946412682533
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01444657426327467
Validation loss = 0.015222547575831413
Validation loss = 0.014095340855419636
Validation loss = 0.015010228380560875
Validation loss = 0.013779298402369022
Validation loss = 0.01437345240265131
Validation loss = 0.01428312063217163
Validation loss = 0.01440082024782896
Validation loss = 0.014178969897329807
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015574062243103981
Validation loss = 0.01470479927957058
Validation loss = 0.0155098345130682
Validation loss = 0.01442357711493969
Validation loss = 0.015150072984397411
Validation loss = 0.01422017253935337
Validation loss = 0.015966275706887245
Validation loss = 0.01458144374191761
Validation loss = 0.01569940522313118
Validation loss = 0.014454437419772148
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 752
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 738
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 749
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 747
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 767
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 735
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.41e+03 |
| Iteration     | 29       |
| MaximumReturn | 3.71e+03 |
| MinimumReturn | 3.17e+03 |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01503235474228859
Validation loss = 0.013655380345880985
Validation loss = 0.01457762997597456
Validation loss = 0.013962389901280403
Validation loss = 0.013777713291347027
Validation loss = 0.01360713317990303
Validation loss = 0.013837791047990322
Validation loss = 0.013966680504381657
Validation loss = 0.013824833557009697
Validation loss = 0.013900564052164555
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014259519055485725
Validation loss = 0.014074421487748623
Validation loss = 0.013913432136178017
Validation loss = 0.014005090110003948
Validation loss = 0.01453468482941389
Validation loss = 0.014058818109333515
Validation loss = 0.014174305833876133
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014102092944085598
Validation loss = 0.013702205382287502
Validation loss = 0.013852058909833431
Validation loss = 0.013817529194056988
Validation loss = 0.015491441823542118
Validation loss = 0.01355093251913786
Validation loss = 0.014032621867954731
Validation loss = 0.014681876637041569
Validation loss = 0.013867760077118874
Validation loss = 0.013775049708783627
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014433794654905796
Validation loss = 0.014081208035349846
Validation loss = 0.014031732454895973
Validation loss = 0.013913219794631004
Validation loss = 0.014505895785987377
Validation loss = 0.01362222246825695
Validation loss = 0.014297615736722946
Validation loss = 0.013739731162786484
Validation loss = 0.014094916172325611
Validation loss = 0.014431522227823734
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01475345529615879
Validation loss = 0.014595230109989643
Validation loss = 0.014454422518610954
Validation loss = 0.014463401399552822
Validation loss = 0.013980187475681305
Validation loss = 0.014717982150614262
Validation loss = 0.01386529766023159
Validation loss = 0.014473593793809414
Validation loss = 0.014101165346801281
Validation loss = 0.014287699945271015
Validation loss = 0.0143996961414814
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 745
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 743
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 737
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 748
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 752
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 750
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.41e+03 |
| Iteration     | 30       |
| MaximumReturn | 3.68e+03 |
| MinimumReturn | 3.17e+03 |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014315860345959663
Validation loss = 0.013286061584949493
Validation loss = 0.014062117785215378
Validation loss = 0.013274000957608223
Validation loss = 0.013916391879320145
Validation loss = 0.013493159785866737
Validation loss = 0.013662598095834255
Validation loss = 0.013395196758210659
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014695189893245697
Validation loss = 0.013887117616832256
Validation loss = 0.013402769342064857
Validation loss = 0.013752211816608906
Validation loss = 0.013681094162166119
Validation loss = 0.01367559377104044
Validation loss = 0.01343049667775631
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014334848150610924
Validation loss = 0.013396581634879112
Validation loss = 0.013410275802016258
Validation loss = 0.013686064630746841
Validation loss = 0.01396113634109497
Validation loss = 0.013369094580411911
Validation loss = 0.013055609539151192
Validation loss = 0.013103103265166283
Validation loss = 0.013078639283776283
Validation loss = 0.013171214610338211
Validation loss = 0.014108714647591114
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014158830046653748
Validation loss = 0.013392629101872444
Validation loss = 0.01461631990969181
Validation loss = 0.013099171221256256
Validation loss = 0.01337368879467249
Validation loss = 0.013332933187484741
Validation loss = 0.013695980422198772
Validation loss = 0.014241324737668037
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0140493493527174
Validation loss = 0.013718502596020699
Validation loss = 0.014059529639780521
Validation loss = 0.013805625028908253
Validation loss = 0.013541659340262413
Validation loss = 0.013707305304706097
Validation loss = 0.013972749933600426
Validation loss = 0.014041338115930557
Validation loss = 0.013100143522024155
Validation loss = 0.01376652717590332
Validation loss = 0.014160478487610817
Validation loss = 0.013238998129963875
Validation loss = 0.014181997627019882
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 733
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 731
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 739
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 737
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 739
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 730
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.57e+03 |
| Iteration     | 31       |
| MaximumReturn | 3.84e+03 |
| MinimumReturn | 3.31e+03 |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013235841877758503
Validation loss = 0.013418125919997692
Validation loss = 0.01319106575101614
Validation loss = 0.013093587942421436
Validation loss = 0.013098081573843956
Validation loss = 0.013992545194923878
Validation loss = 0.013235462829470634
Validation loss = 0.013022135011851788
Validation loss = 0.013284735381603241
Validation loss = 0.01310735009610653
Validation loss = 0.013161791488528252
Validation loss = 0.012778746895492077
Validation loss = 0.013180669397115707
Validation loss = 0.012728212401270866
Validation loss = 0.01341911032795906
Validation loss = 0.01306317187845707
Validation loss = 0.013502782210707664
Validation loss = 0.01308500673621893
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01437668688595295
Validation loss = 0.013041201047599316
Validation loss = 0.01301814615726471
Validation loss = 0.013875902630388737
Validation loss = 0.013408265076577663
Validation loss = 0.012968980707228184
Validation loss = 0.013302055187523365
Validation loss = 0.013000814244151115
Validation loss = 0.013177119195461273
Validation loss = 0.0134739363566041
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014230800792574883
Validation loss = 0.013140775263309479
Validation loss = 0.012930072844028473
Validation loss = 0.012984402477741241
Validation loss = 0.013081349432468414
Validation loss = 0.013000384904444218
Validation loss = 0.012694849632680416
Validation loss = 0.013326608575880527
Validation loss = 0.012982542626559734
Validation loss = 0.013078468851745129
Validation loss = 0.013172045350074768
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01406767126172781
Validation loss = 0.012860393151640892
Validation loss = 0.013528142124414444
Validation loss = 0.012781109660863876
Validation loss = 0.012976349331438541
Validation loss = 0.013089514337480068
Validation loss = 0.012689675204455853
Validation loss = 0.014399563893675804
Validation loss = 0.012595443986356258
Validation loss = 0.012714765965938568
Validation loss = 0.013099648989737034
Validation loss = 0.012662233784794807
Validation loss = 0.012900386936962605
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013990318402647972
Validation loss = 0.013382676057517529
Validation loss = 0.013848807662725449
Validation loss = 0.013211283832788467
Validation loss = 0.013337391428649426
Validation loss = 0.013832330703735352
Validation loss = 0.012660973705351353
Validation loss = 0.013244112022221088
Validation loss = 0.013204040937125683
Validation loss = 0.013721944764256477
Validation loss = 0.01298450492322445
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 733
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 718
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 721
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 724
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 726
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 722
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.58e+03 |
| Iteration     | 32       |
| MaximumReturn | 3.76e+03 |
| MinimumReturn | 3.46e+03 |
| TotalSamples  | 136000   |
----------------------------
