Logging to experiments/invertedPendulum/nov1/IA01_w350e3_seed2431
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7334864139556885
Validation loss = 0.4514870047569275
Validation loss = 0.39958468079566956
Validation loss = 0.3203929662704468
Validation loss = 0.3143361508846283
Validation loss = 0.28780192136764526
Validation loss = 0.2756994366645813
Validation loss = 0.2546445429325104
Validation loss = 0.24689653515815735
Validation loss = 0.2292143851518631
Validation loss = 0.2234257012605667
Validation loss = 0.2113441377878189
Validation loss = 0.19836874306201935
Validation loss = 0.21060381829738617
Validation loss = 0.19123125076293945
Validation loss = 0.17820769548416138
Validation loss = 0.17795602977275848
Validation loss = 0.16518208384513855
Validation loss = 0.1623767763376236
Validation loss = 0.16827598214149475
Validation loss = 0.16088147461414337
Validation loss = 0.1642947643995285
Validation loss = 0.15329253673553467
Validation loss = 0.1473521590232849
Validation loss = 0.12856920063495636
Validation loss = 0.13714735209941864
Validation loss = 0.1280284970998764
Validation loss = 0.11940323561429977
Validation loss = 0.11279584467411041
Validation loss = 0.11917416751384735
Validation loss = 0.11991497129201889
Validation loss = 0.1218905821442604
Validation loss = 0.11313705891370773
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7503184676170349
Validation loss = 0.38404881954193115
Validation loss = 0.3592195212841034
Validation loss = 0.3156532645225525
Validation loss = 0.30158910155296326
Validation loss = 0.2825673520565033
Validation loss = 0.255603551864624
Validation loss = 0.24650132656097412
Validation loss = 0.23572076857089996
Validation loss = 0.2214612364768982
Validation loss = 0.2090456485748291
Validation loss = 0.21134956181049347
Validation loss = 0.20549342036247253
Validation loss = 0.19109514355659485
Validation loss = 0.19022291898727417
Validation loss = 0.18279950320720673
Validation loss = 0.16634386777877808
Validation loss = 0.16887158155441284
Validation loss = 0.16545875370502472
Validation loss = 0.15022963285446167
Validation loss = 0.14169955253601074
Validation loss = 0.16489601135253906
Validation loss = 0.1407911479473114
Validation loss = 0.1406143456697464
Validation loss = 0.1352449506521225
Validation loss = 0.12237070500850677
Validation loss = 0.11411505192518234
Validation loss = 0.10869176685810089
Validation loss = 0.11927447468042374
Validation loss = 0.10946362465620041
Validation loss = 0.09603840112686157
Validation loss = 0.10814832150936127
Validation loss = 0.13218678534030914
Validation loss = 0.10768292844295502
Validation loss = 0.1141211986541748
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7538213729858398
Validation loss = 0.37205588817596436
Validation loss = 0.33616378903388977
Validation loss = 0.30116212368011475
Validation loss = 0.28821441531181335
Validation loss = 0.26616567373275757
Validation loss = 0.24426119029521942
Validation loss = 0.2333117574453354
Validation loss = 0.22913827002048492
Validation loss = 0.24043482542037964
Validation loss = 0.21362365782260895
Validation loss = 0.20503591001033783
Validation loss = 0.19349049031734467
Validation loss = 0.18968415260314941
Validation loss = 0.1946699172258377
Validation loss = 0.1778937578201294
Validation loss = 0.173984095454216
Validation loss = 0.17902275919914246
Validation loss = 0.1533777266740799
Validation loss = 0.14216065406799316
Validation loss = 0.13182809948921204
Validation loss = 0.144949808716774
Validation loss = 0.13332942128181458
Validation loss = 0.11818619817495346
Validation loss = 0.11473219841718674
Validation loss = 0.1157500296831131
Validation loss = 0.13237246870994568
Validation loss = 0.12873195111751556
Validation loss = 0.10556681454181671
Validation loss = 0.10421639680862427
Validation loss = 0.12171083688735962
Validation loss = 0.13144028186798096
Validation loss = 0.10814747959375381
Validation loss = 0.09623026102781296
Validation loss = 0.09110735356807709
Validation loss = 0.08598625659942627
Validation loss = 0.1225442960858345
Validation loss = 0.09905336052179337
Validation loss = 0.09174025803804398
Validation loss = 0.09092851728200912
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7493467330932617
Validation loss = 0.40642300248146057
Validation loss = 0.3425988256931305
Validation loss = 0.31311550736427307
Validation loss = 0.2968774139881134
Validation loss = 0.26256510615348816
Validation loss = 0.24822847545146942
Validation loss = 0.23262415826320648
Validation loss = 0.22060224413871765
Validation loss = 0.22887779772281647
Validation loss = 0.23310939967632294
Validation loss = 0.21653276681900024
Validation loss = 0.20078082382678986
Validation loss = 0.20441029965877533
Validation loss = 0.18851600587368011
Validation loss = 0.18989403545856476
Validation loss = 0.16398902237415314
Validation loss = 0.16794471442699432
Validation loss = 0.14997221529483795
Validation loss = 0.16404566168785095
Validation loss = 0.13844354450702667
Validation loss = 0.1363213062286377
Validation loss = 0.12859606742858887
Validation loss = 0.138399139046669
Validation loss = 0.12293726205825806
Validation loss = 0.11561133712530136
Validation loss = 0.12352089583873749
Validation loss = 0.11756201088428497
Validation loss = 0.10191971063613892
Validation loss = 0.11254147440195084
Validation loss = 0.10904752463102341
Validation loss = 0.09531811624765396
Validation loss = 0.10366535931825638
Validation loss = 0.10395993292331696
Validation loss = 0.08997328579425812
Validation loss = 0.09073103964328766
Validation loss = 0.09156107902526855
Validation loss = 0.10426085442304611
Validation loss = 0.12252024561166763
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7481126189231873
Validation loss = 0.3811863958835602
Validation loss = 0.33639928698539734
Validation loss = 0.3041114807128906
Validation loss = 0.3034340739250183
Validation loss = 0.266189843416214
Validation loss = 0.2514794170856476
Validation loss = 0.24403555691242218
Validation loss = 0.22528532147407532
Validation loss = 0.2268930822610855
Validation loss = 0.2120961844921112
Validation loss = 0.20850178599357605
Validation loss = 0.19221168756484985
Validation loss = 0.19151324033737183
Validation loss = 0.1769101619720459
Validation loss = 0.17451901733875275
Validation loss = 0.18063683807849884
Validation loss = 0.1719772070646286
Validation loss = 0.18261009454727173
Validation loss = 0.1478937864303589
Validation loss = 0.1483929604291916
Validation loss = 0.14724117517471313
Validation loss = 0.13438749313354492
Validation loss = 0.1261405646800995
Validation loss = 0.13479532301425934
Validation loss = 0.12775495648384094
Validation loss = 0.11872535943984985
Validation loss = 0.12258245795965195
Validation loss = 0.10655087977647781
Validation loss = 0.10593977570533752
Validation loss = 0.09433440119028091
Validation loss = 0.10553126782178879
Validation loss = 0.10094953328371048
Validation loss = 0.10083978623151779
Validation loss = 0.11174577474594116
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0484  |
| Iteration     | 0        |
| MaximumReturn | -0.0232  |
| MinimumReturn | -0.0778  |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2447035312652588
Validation loss = 0.12879003584384918
Validation loss = 0.10906732082366943
Validation loss = 0.09943101555109024
Validation loss = 0.09031753242015839
Validation loss = 0.08707629144191742
Validation loss = 0.08564316481351852
Validation loss = 0.07784116268157959
Validation loss = 0.07084214687347412
Validation loss = 0.0665789470076561
Validation loss = 0.06974612176418304
Validation loss = 0.05937562137842178
Validation loss = 0.05480681359767914
Validation loss = 0.05059570074081421
Validation loss = 0.05039850249886513
Validation loss = 0.05740126594901085
Validation loss = 0.04631182178854942
Validation loss = 0.04746737703680992
Validation loss = 0.047014135867357254
Validation loss = 0.049298983067274094
Validation loss = 0.04170329123735428
Validation loss = 0.04703710973262787
Validation loss = 0.04397083818912506
Validation loss = 0.045951902866363525
Validation loss = 0.04004508629441261
Validation loss = 0.03619806468486786
Validation loss = 0.03830885514616966
Validation loss = 0.037193942815065384
Validation loss = 0.03877183422446251
Validation loss = 0.04067738354206085
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2129894196987152
Validation loss = 0.12417390942573547
Validation loss = 0.10097061097621918
Validation loss = 0.09008485078811646
Validation loss = 0.08529391139745712
Validation loss = 0.07228833436965942
Validation loss = 0.0669463649392128
Validation loss = 0.061541084200143814
Validation loss = 0.05489566549658775
Validation loss = 0.0637906864285469
Validation loss = 0.05209066718816757
Validation loss = 0.04690760001540184
Validation loss = 0.0481659360229969
Validation loss = 0.048076413571834564
Validation loss = 0.04452122375369072
Validation loss = 0.046353332698345184
Validation loss = 0.05280658230185509
Validation loss = 0.05457209050655365
Validation loss = 0.05592471733689308
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.32100987434387207
Validation loss = 0.14221292734146118
Validation loss = 0.1206078976392746
Validation loss = 0.10538738965988159
Validation loss = 0.09559811651706696
Validation loss = 0.09213355928659439
Validation loss = 0.08784971386194229
Validation loss = 0.08566165715456009
Validation loss = 0.07427055388689041
Validation loss = 0.06581597775220871
Validation loss = 0.06659620255231857
Validation loss = 0.060243964195251465
Validation loss = 0.0536658950150013
Validation loss = 0.05636057257652283
Validation loss = 0.052134882658720016
Validation loss = 0.049656715244054794
Validation loss = 0.046900540590286255
Validation loss = 0.04613587260246277
Validation loss = 0.039256397634744644
Validation loss = 0.049026865512132645
Validation loss = 0.043521247804164886
Validation loss = 0.03985140472650528
Validation loss = 0.04268322139978409
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2207326591014862
Validation loss = 0.12733571231365204
Validation loss = 0.10843479633331299
Validation loss = 0.09546734392642975
Validation loss = 0.08507684618234634
Validation loss = 0.07506782561540604
Validation loss = 0.07414568960666656
Validation loss = 0.06228567659854889
Validation loss = 0.06416867673397064
Validation loss = 0.06890921294689178
Validation loss = 0.06018250808119774
Validation loss = 0.053605351597070694
Validation loss = 0.057289861142635345
Validation loss = 0.05049630254507065
Validation loss = 0.051032617688179016
Validation loss = 0.05004872754216194
Validation loss = 0.04013139754533768
Validation loss = 0.04978843033313751
Validation loss = 0.04137178137898445
Validation loss = 0.040886081755161285
Validation loss = 0.04405514523386955
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.23401476442813873
Validation loss = 0.13456332683563232
Validation loss = 0.10885592550039291
Validation loss = 0.09960513561964035
Validation loss = 0.08856913447380066
Validation loss = 0.08071362972259521
Validation loss = 0.06950891017913818
Validation loss = 0.0765366181731224
Validation loss = 0.06955272704362869
Validation loss = 0.06026766076683998
Validation loss = 0.05501206964254379
Validation loss = 0.057079486548900604
Validation loss = 0.04941607639193535
Validation loss = 0.05293403938412666
Validation loss = 0.049437541514635086
Validation loss = 0.04911282658576965
Validation loss = 0.0484369657933712
Validation loss = 0.044214099645614624
Validation loss = 0.04099395498633385
Validation loss = 0.04386821389198303
Validation loss = 0.03657376766204834
Validation loss = 0.04237779974937439
Validation loss = 0.051143500953912735
Validation loss = 0.042757753282785416
Validation loss = 0.041828006505966187
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00137 |
| Iteration     | 1        |
| MaximumReturn | -0.0011  |
| MinimumReturn | -0.00175 |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05659075081348419
Validation loss = 0.04851943626999855
Validation loss = 0.0348077192902565
Validation loss = 0.028085729107260704
Validation loss = 0.028364963829517365
Validation loss = 0.03936220332980156
Validation loss = 0.029951583594083786
Validation loss = 0.03164830803871155
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05079203099012375
Validation loss = 0.03886071965098381
Validation loss = 0.03627053648233414
Validation loss = 0.03378427401185036
Validation loss = 0.03330850973725319
Validation loss = 0.030605213716626167
Validation loss = 0.03350995108485222
Validation loss = 0.03950008004903793
Validation loss = 0.0397845022380352
Validation loss = 0.03811510652303696
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07750999927520752
Validation loss = 0.040844425559043884
Validation loss = 0.035599831491708755
Validation loss = 0.04234436899423599
Validation loss = 0.032327599823474884
Validation loss = 0.03350359573960304
Validation loss = 0.027375750243663788
Validation loss = 0.033961083739995956
Validation loss = 0.03521789237856865
Validation loss = 0.03362754359841347
Validation loss = 0.03075530007481575
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.052994295954704285
Validation loss = 0.03840657323598862
Validation loss = 0.03499351441860199
Validation loss = 0.030792787671089172
Validation loss = 0.03310352936387062
Validation loss = 0.028914334252476692
Validation loss = 0.03218549117445946
Validation loss = 0.02603728324174881
Validation loss = 0.03134401515126228
Validation loss = 0.026150187477469444
Validation loss = 0.024964462965726852
Validation loss = 0.02528146281838417
Validation loss = 0.02603914402425289
Validation loss = 0.030141577124595642
Validation loss = 0.027688894420862198
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06636855751276016
Validation loss = 0.03885023295879364
Validation loss = 0.03138704597949982
Validation loss = 0.030238496139645576
Validation loss = 0.03315482661128044
Validation loss = 0.027315817773342133
Validation loss = 0.025939473882317543
Validation loss = 0.029857629910111427
Validation loss = 0.02721182256937027
Validation loss = 0.027152759954333305
Validation loss = 0.026242421939969063
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000924 |
| Iteration     | 2         |
| MaximumReturn | -0.000655 |
| MinimumReturn | -0.00122  |
| TotalSamples  | 6664      |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04140230640769005
Validation loss = 0.027761192992329597
Validation loss = 0.026174059137701988
Validation loss = 0.02386598475277424
Validation loss = 0.03790230304002762
Validation loss = 0.03876425698399544
Validation loss = 0.02682059444487095
Validation loss = 0.033260345458984375
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.030809177085757256
Validation loss = 0.027378777042031288
Validation loss = 0.029848426580429077
Validation loss = 0.03233949467539787
Validation loss = 0.028796637430787086
Validation loss = 0.03424990177154541
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04472656920552254
Validation loss = 0.03237634897232056
Validation loss = 0.02687148004770279
Validation loss = 0.026675112545490265
Validation loss = 0.02117355726659298
Validation loss = 0.02519938163459301
Validation loss = 0.03269515559077263
Validation loss = 0.030889445915818214
Validation loss = 0.031211182475090027
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.042131196707487106
Validation loss = 0.039621904492378235
Validation loss = 0.03449491038918495
Validation loss = 0.02345891483128071
Validation loss = 0.02561976946890354
Validation loss = 0.02460774965584278
Validation loss = 0.032941002398729324
Validation loss = 0.03490791842341423
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03958271071314812
Validation loss = 0.028402084484696388
Validation loss = 0.035131387412548065
Validation loss = 0.024884551763534546
Validation loss = 0.02159985341131687
Validation loss = 0.02473871409893036
Validation loss = 0.02104905992746353
Validation loss = 0.023251989856362343
Validation loss = 0.028449930250644684
Validation loss = 0.0309495497494936
Validation loss = 0.025774629786610603
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00104  |
| Iteration     | 3         |
| MaximumReturn | -0.000715 |
| MinimumReturn | -0.0013   |
| TotalSamples  | 8330      |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05389422923326492
Validation loss = 0.027906812727451324
Validation loss = 0.022534361109137535
Validation loss = 0.02161385677754879
Validation loss = 0.02427634969353676
Validation loss = 0.028041744604706764
Validation loss = 0.02132149413228035
Validation loss = 0.0213567353785038
Validation loss = 0.020059343427419662
Validation loss = 0.02697026915848255
Validation loss = 0.025519894436001778
Validation loss = 0.02689911425113678
Validation loss = 0.020737264305353165
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.030974362045526505
Validation loss = 0.024288509041070938
Validation loss = 0.024018164724111557
Validation loss = 0.030543845146894455
Validation loss = 0.025319332256913185
Validation loss = 0.02540099434554577
Validation loss = 0.02247803285717964
Validation loss = 0.02578151226043701
Validation loss = 0.021290941163897514
Validation loss = 0.021846294403076172
Validation loss = 0.020394323393702507
Validation loss = 0.021915709599852562
Validation loss = 0.027317868545651436
Validation loss = 0.021854905411601067
Validation loss = 0.034466445446014404
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04009489342570305
Validation loss = 0.02785339020192623
Validation loss = 0.022278711199760437
Validation loss = 0.02109808474779129
Validation loss = 0.029765387997031212
Validation loss = 0.02433927170932293
Validation loss = 0.024239571765065193
Validation loss = 0.02184685878455639
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02194177359342575
Validation loss = 0.0300060473382473
Validation loss = 0.02426624856889248
Validation loss = 0.024619748815894127
Validation loss = 0.02515827864408493
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024968618527054787
Validation loss = 0.028157806023955345
Validation loss = 0.022785089910030365
Validation loss = 0.025254100561141968
Validation loss = 0.023106029257178307
Validation loss = 0.021860487759113312
Validation loss = 0.02494538016617298
Validation loss = 0.020967040210962296
Validation loss = 0.025772901251912117
Validation loss = 0.019573239609599113
Validation loss = 0.02242935076355934
Validation loss = 0.017395127564668655
Validation loss = 0.02011301927268505
Validation loss = 0.02023405209183693
Validation loss = 0.019286619499325752
Validation loss = 0.018698936328291893
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00207 |
| Iteration     | 4        |
| MaximumReturn | -0.00116 |
| MinimumReturn | -0.00291 |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02313965931534767
Validation loss = 0.020009836181998253
Validation loss = 0.019170161336660385
Validation loss = 0.021921178326010704
Validation loss = 0.01569836214184761
Validation loss = 0.016352422535419464
Validation loss = 0.019402211531996727
Validation loss = 0.020284749567508698
Validation loss = 0.015980441123247147
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04187122732400894
Validation loss = 0.02465125359594822
Validation loss = 0.021511761471629143
Validation loss = 0.021300196647644043
Validation loss = 0.022046511992812157
Validation loss = 0.024059996008872986
Validation loss = 0.0194165650755167
Validation loss = 0.018158523365855217
Validation loss = 0.020264972001314163
Validation loss = 0.01685268245637417
Validation loss = 0.016002865508198738
Validation loss = 0.015039792284369469
Validation loss = 0.022034216672182083
Validation loss = 0.017619827762246132
Validation loss = 0.017361562699079514
Validation loss = 0.019062191247940063
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0238309595733881
Validation loss = 0.022210102528333664
Validation loss = 0.022093042731285095
Validation loss = 0.016072120517492294
Validation loss = 0.021320445463061333
Validation loss = 0.024978773668408394
Validation loss = 0.018864909186959267
Validation loss = 0.025991493836045265
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022503037005662918
Validation loss = 0.020424842834472656
Validation loss = 0.01885303109884262
Validation loss = 0.017323732376098633
Validation loss = 0.017576538026332855
Validation loss = 0.016859078779816628
Validation loss = 0.024148423224687576
Validation loss = 0.019381826743483543
Validation loss = 0.016938865184783936
Validation loss = 0.01678347773849964
Validation loss = 0.01881377026438713
Validation loss = 0.022628000006079674
Validation loss = 0.022921467199921608
Validation loss = 0.01641993597149849
Validation loss = 0.017784733325242996
Validation loss = 0.01611936464905739
Validation loss = 0.022190410643815994
Validation loss = 0.019106606021523476
Validation loss = 0.024954458698630333
Validation loss = 0.01760576292872429
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018574191257357597
Validation loss = 0.020909441635012627
Validation loss = 0.016161013394594193
Validation loss = 0.017530057579278946
Validation loss = 0.01765880361199379
Validation loss = 0.02162533812224865
Validation loss = 0.01731763407588005
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00103 |
| Iteration     | 5        |
| MaximumReturn | -0.00071 |
| MinimumReturn | -0.00128 |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019629474729299545
Validation loss = 0.015555943362414837
Validation loss = 0.02689151093363762
Validation loss = 0.01813639886677265
Validation loss = 0.017214994877576828
Validation loss = 0.018654946237802505
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02170962654054165
Validation loss = 0.0238542091101408
Validation loss = 0.020754007622599602
Validation loss = 0.018652845174074173
Validation loss = 0.01954532042145729
Validation loss = 0.02396010234951973
Validation loss = 0.016633126884698868
Validation loss = 0.021028202027082443
Validation loss = 0.022467385977506638
Validation loss = 0.01775279827415943
Validation loss = 0.02804814837872982
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025308754295110703
Validation loss = 0.02189653180539608
Validation loss = 0.019926607608795166
Validation loss = 0.01745273917913437
Validation loss = 0.015306954272091389
Validation loss = 0.017187096178531647
Validation loss = 0.02158718928694725
Validation loss = 0.024156929925084114
Validation loss = 0.018320666626095772
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02379109337925911
Validation loss = 0.01754281297326088
Validation loss = 0.019875045865774155
Validation loss = 0.01881340704858303
Validation loss = 0.025602931156754494
Validation loss = 0.020228499546647072
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01965545117855072
Validation loss = 0.01695624366402626
Validation loss = 0.0261238906532526
Validation loss = 0.021638089790940285
Validation loss = 0.021532276645302773
Validation loss = 0.02062276192009449
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00135  |
| Iteration     | 6         |
| MaximumReturn | -0.000867 |
| MinimumReturn | -0.00226  |
| TotalSamples  | 13328     |
-----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04311911761760712
Validation loss = 0.023309139534831047
Validation loss = 0.01758645474910736
Validation loss = 0.02162995934486389
Validation loss = 0.017877239733934402
Validation loss = 0.013843239285051823
Validation loss = 0.0160058606415987
Validation loss = 0.01604844257235527
Validation loss = 0.013657748699188232
Validation loss = 0.02253906987607479
Validation loss = 0.019785145297646523
Validation loss = 0.01824493706226349
Validation loss = 0.016440855339169502
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025315439328551292
Validation loss = 0.014853755943477154
Validation loss = 0.021085543558001518
Validation loss = 0.016948800534009933
Validation loss = 0.020062927156686783
Validation loss = 0.018710508942604065
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020518779754638672
Validation loss = 0.01723845675587654
Validation loss = 0.018044212833046913
Validation loss = 0.021774007007479668
Validation loss = 0.016466354951262474
Validation loss = 0.020775364711880684
Validation loss = 0.015832817181944847
Validation loss = 0.016510123386979103
Validation loss = 0.01805851049721241
Validation loss = 0.021430745720863342
Validation loss = 0.017161402851343155
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023155471310019493
Validation loss = 0.016653943806886673
Validation loss = 0.017439179122447968
Validation loss = 0.01673477701842785
Validation loss = 0.01960144005715847
Validation loss = 0.015943380072712898
Validation loss = 0.026282548904418945
Validation loss = 0.018863309174776077
Validation loss = 0.02024470642209053
Validation loss = 0.016409676522016525
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017958790063858032
Validation loss = 0.019253330305218697
Validation loss = 0.017340077087283134
Validation loss = 0.016186391934752464
Validation loss = 0.017817102372646332
Validation loss = 0.01777459681034088
Validation loss = 0.016513507813215256
Validation loss = 0.017829565331339836
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00132  |
| Iteration     | 7         |
| MaximumReturn | -0.000864 |
| MinimumReturn | -0.00205  |
| TotalSamples  | 14994     |
-----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017473051324486732
Validation loss = 0.02137606218457222
Validation loss = 0.021131670102477074
Validation loss = 0.02009299397468567
Validation loss = 0.01926807127892971
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024748340249061584
Validation loss = 0.020672712475061417
Validation loss = 0.01558054331690073
Validation loss = 0.018752580508589745
Validation loss = 0.017606262117624283
Validation loss = 0.018076036125421524
Validation loss = 0.016023460775613785
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014841859228909016
Validation loss = 0.013127180747687817
Validation loss = 0.02687840536236763
Validation loss = 0.018468301743268967
Validation loss = 0.020169148221611977
Validation loss = 0.017089780420064926
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017365258187055588
Validation loss = 0.018098043277859688
Validation loss = 0.01830129884183407
Validation loss = 0.016542278230190277
Validation loss = 0.016288505867123604
Validation loss = 0.018140172585844994
Validation loss = 0.023089203983545303
Validation loss = 0.015400765463709831
Validation loss = 0.013738601468503475
Validation loss = 0.012590872123837471
Validation loss = 0.016653956845402718
Validation loss = 0.01665857993066311
Validation loss = 0.013662788085639477
Validation loss = 0.013965299353003502
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01719282753765583
Validation loss = 0.0195376705378294
Validation loss = 0.02022700011730194
Validation loss = 0.018700284883379936
Validation loss = 0.020228412002325058
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0015   |
| Iteration     | 8         |
| MaximumReturn | -0.000765 |
| MinimumReturn | -0.00343  |
| TotalSamples  | 16660     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01527412235736847
Validation loss = 0.01768648624420166
Validation loss = 0.013687207363545895
Validation loss = 0.02289384976029396
Validation loss = 0.022764593362808228
Validation loss = 0.01638667844235897
Validation loss = 0.018255513161420822
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019139111042022705
Validation loss = 0.017664991319179535
Validation loss = 0.01782464049756527
Validation loss = 0.01682160422205925
Validation loss = 0.016105998307466507
Validation loss = 0.01930573210120201
Validation loss = 0.014889666810631752
Validation loss = 0.015663422644138336
Validation loss = 0.015345579944550991
Validation loss = 0.014816561713814735
Validation loss = 0.016186406835913658
Validation loss = 0.015467032790184021
Validation loss = 0.018035728484392166
Validation loss = 0.01758819818496704
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02876230701804161
Validation loss = 0.01681712083518505
Validation loss = 0.016739584505558014
Validation loss = 0.017178591340780258
Validation loss = 0.016580870375037193
Validation loss = 0.015140701085329056
Validation loss = 0.01651456020772457
Validation loss = 0.017428286373615265
Validation loss = 0.016624202951788902
Validation loss = 0.019000457599759102
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016849692910909653
Validation loss = 0.03140166401863098
Validation loss = 0.018808677792549133
Validation loss = 0.017724405974149704
Validation loss = 0.015390031971037388
Validation loss = 0.01718716137111187
Validation loss = 0.01745787262916565
Validation loss = 0.014301426708698273
Validation loss = 0.01702379249036312
Validation loss = 0.01649295538663864
Validation loss = 0.018358949571847916
Validation loss = 0.013729594647884369
Validation loss = 0.015402128919959068
Validation loss = 0.014640764333307743
Validation loss = 0.014867081306874752
Validation loss = 0.013386107049882412
Validation loss = 0.016972530633211136
Validation loss = 0.017319848760962486
Validation loss = 0.014751838520169258
Validation loss = 0.017992176115512848
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018058152869343758
Validation loss = 0.01857077144086361
Validation loss = 0.01797349937260151
Validation loss = 0.02089228294789791
Validation loss = 0.01604335755109787
Validation loss = 0.016326425597071648
Validation loss = 0.01657617650926113
Validation loss = 0.01734267920255661
Validation loss = 0.01628716289997101
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00134  |
| Iteration     | 9         |
| MaximumReturn | -0.000821 |
| MinimumReturn | -0.00239  |
| TotalSamples  | 18326     |
-----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020562607795000076
Validation loss = 0.02051667496562004
Validation loss = 0.015527727082371712
Validation loss = 0.016723861917853355
Validation loss = 0.016904562711715698
Validation loss = 0.01829400844871998
Validation loss = 0.0158443171530962
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0203984584659338
Validation loss = 0.018700959160923958
Validation loss = 0.014833982102572918
Validation loss = 0.017132267355918884
Validation loss = 0.019228974357247353
Validation loss = 0.019501425325870514
Validation loss = 0.016272833570837975
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0197786632925272
Validation loss = 0.019002245739102364
Validation loss = 0.018318135291337967
Validation loss = 0.014254927635192871
Validation loss = 0.01519631315022707
Validation loss = 0.01297638937830925
Validation loss = 0.014379780739545822
Validation loss = 0.01939745806157589
Validation loss = 0.018789267167448997
Validation loss = 0.017957346513867378
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01654677279293537
Validation loss = 0.014667805284261703
Validation loss = 0.012385303154587746
Validation loss = 0.017747439444065094
Validation loss = 0.013868258334696293
Validation loss = 0.01333066076040268
Validation loss = 0.013273458927869797
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021660609170794487
Validation loss = 0.021188898012042046
Validation loss = 0.017823997884988785
Validation loss = 0.016205258667469025
Validation loss = 0.014409897848963737
Validation loss = 0.014630536548793316
Validation loss = 0.014354757033288479
Validation loss = 0.022026345133781433
Validation loss = 0.017111320048570633
Validation loss = 0.017333388328552246
Validation loss = 0.01768895424902439
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00181 |
| Iteration     | 10       |
| MaximumReturn | -0.00102 |
| MinimumReturn | -0.00311 |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014235367998480797
Validation loss = 0.020650852471590042
Validation loss = 0.014038478955626488
Validation loss = 0.016851026564836502
Validation loss = 0.014328849501907825
Validation loss = 0.023267332464456558
Validation loss = 0.013656087219715118
Validation loss = 0.015803486108779907
Validation loss = 0.013732333667576313
Validation loss = 0.017666153609752655
Validation loss = 0.01528525073081255
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017084237188100815
Validation loss = 0.01787373051047325
Validation loss = 0.018544159829616547
Validation loss = 0.016106002032756805
Validation loss = 0.016729110851883888
Validation loss = 0.014708496630191803
Validation loss = 0.015245738439261913
Validation loss = 0.01495637558400631
Validation loss = 0.012842826545238495
Validation loss = 0.014392459765076637
Validation loss = 0.015178804285824299
Validation loss = 0.013682225719094276
Validation loss = 0.014868268743157387
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014234615489840508
Validation loss = 0.02023911476135254
Validation loss = 0.01715998351573944
Validation loss = 0.0151632996276021
Validation loss = 0.013046758249402046
Validation loss = 0.017670050263404846
Validation loss = 0.01679190993309021
Validation loss = 0.01411149837076664
Validation loss = 0.01342833787202835
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011705870740115643
Validation loss = 0.014514018781483173
Validation loss = 0.014862501993775368
Validation loss = 0.013425866141915321
Validation loss = 0.017367523163557053
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01480359397828579
Validation loss = 0.013016825541853905
Validation loss = 0.01679648831486702
Validation loss = 0.018853003159165382
Validation loss = 0.022684816271066666
Validation loss = 0.015174562111496925
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00154  |
| Iteration     | 11        |
| MaximumReturn | -0.000781 |
| MinimumReturn | -0.00267  |
| TotalSamples  | 21658     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015232881531119347
Validation loss = 0.019312921911478043
Validation loss = 0.01470609474927187
Validation loss = 0.01475506741553545
Validation loss = 0.018787717446684837
Validation loss = 0.01694982871413231
Validation loss = 0.01987888291478157
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01693207584321499
Validation loss = 0.016133617609739304
Validation loss = 0.015740573406219482
Validation loss = 0.015622615814208984
Validation loss = 0.01371986884623766
Validation loss = 0.02401650883257389
Validation loss = 0.015767835080623627
Validation loss = 0.017320144921541214
Validation loss = 0.018369924277067184
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02110954001545906
Validation loss = 0.020309433341026306
Validation loss = 0.015161258168518543
Validation loss = 0.013245204463601112
Validation loss = 0.012880844064056873
Validation loss = 0.024648025631904602
Validation loss = 0.014917157590389252
Validation loss = 0.01911197602748871
Validation loss = 0.013697120361030102
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01812567189335823
Validation loss = 0.01496528834104538
Validation loss = 0.01664755865931511
Validation loss = 0.015458268113434315
Validation loss = 0.01439672987908125
Validation loss = 0.016715891659259796
Validation loss = 0.013960743322968483
Validation loss = 0.018150005489587784
Validation loss = 0.016003986820578575
Validation loss = 0.014569021761417389
Validation loss = 0.014150816015899181
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015975849702954292
Validation loss = 0.024270102381706238
Validation loss = 0.018558980897068977
Validation loss = 0.015271514654159546
Validation loss = 0.016440996900200844
Validation loss = 0.014441387727856636
Validation loss = 0.016390826553106308
Validation loss = 0.01848630979657173
Validation loss = 0.01638631522655487
Validation loss = 0.01768846996128559
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00157  |
| Iteration     | 12        |
| MaximumReturn | -0.000779 |
| MinimumReturn | -0.00272  |
| TotalSamples  | 23324     |
-----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015721842646598816
Validation loss = 0.013669844716787338
Validation loss = 0.01471515279263258
Validation loss = 0.017037300392985344
Validation loss = 0.021634360775351524
Validation loss = 0.014296215958893299
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01580321416258812
Validation loss = 0.017541611567139626
Validation loss = 0.020096518099308014
Validation loss = 0.016069231554865837
Validation loss = 0.014958524145185947
Validation loss = 0.021478096023201942
Validation loss = 0.016379551962018013
Validation loss = 0.01898723654448986
Validation loss = 0.021280258893966675
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01612098328769207
Validation loss = 0.01869284175336361
Validation loss = 0.016312185674905777
Validation loss = 0.014940843917429447
Validation loss = 0.01461928989738226
Validation loss = 0.015054073184728622
Validation loss = 0.01887732557952404
Validation loss = 0.01672237366437912
Validation loss = 0.015203829854726791
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013501741923391819
Validation loss = 0.017301497980952263
Validation loss = 0.013098777271807194
Validation loss = 0.012609542347490788
Validation loss = 0.017187219113111496
Validation loss = 0.014734885655343533
Validation loss = 0.01615322008728981
Validation loss = 0.014458772726356983
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01918376050889492
Validation loss = 0.01808263547718525
Validation loss = 0.016597693786025047
Validation loss = 0.017856266349554062
Validation loss = 0.017126144841313362
Validation loss = 0.017215771600604057
Validation loss = 0.01686449535191059
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00111  |
| Iteration     | 13        |
| MaximumReturn | -0.000771 |
| MinimumReturn | -0.00188  |
| TotalSamples  | 24990     |
-----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014882785268127918
Validation loss = 0.01426432654261589
Validation loss = 0.017832143232226372
Validation loss = 0.017576035112142563
Validation loss = 0.01846027933061123
Validation loss = 0.014727205969393253
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01764162816107273
Validation loss = 0.01744955964386463
Validation loss = 0.01852213405072689
Validation loss = 0.02157326228916645
Validation loss = 0.016488468274474144
Validation loss = 0.015045680105686188
Validation loss = 0.016829855740070343
Validation loss = 0.02016998827457428
Validation loss = 0.015786657109856606
Validation loss = 0.014450249262154102
Validation loss = 0.014995542354881763
Validation loss = 0.014859597198665142
Validation loss = 0.013925996609032154
Validation loss = 0.015165664255619049
Validation loss = 0.01699415035545826
Validation loss = 0.014123993925750256
Validation loss = 0.016344578936696053
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016573471948504448
Validation loss = 0.016665169969201088
Validation loss = 0.014429372735321522
Validation loss = 0.013778064399957657
Validation loss = 0.017206495627760887
Validation loss = 0.016474159434437752
Validation loss = 0.016743827611207962
Validation loss = 0.01707942970097065
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01521239336580038
Validation loss = 0.013774819672107697
Validation loss = 0.013849078677594662
Validation loss = 0.017020145431160927
Validation loss = 0.015805024653673172
Validation loss = 0.017761381343007088
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01761825568974018
Validation loss = 0.01647048629820347
Validation loss = 0.01637118309736252
Validation loss = 0.016479337587952614
Validation loss = 0.014739442616701126
Validation loss = 0.014508302323520184
Validation loss = 0.019381143152713776
Validation loss = 0.016943609341979027
Validation loss = 0.015459869988262653
Validation loss = 0.016525205224752426
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00147  |
| Iteration     | 14        |
| MaximumReturn | -0.000887 |
| MinimumReturn | -0.00291  |
| TotalSamples  | 26656     |
-----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014700396917760372
Validation loss = 0.020076964050531387
Validation loss = 0.017375105991959572
Validation loss = 0.01695229858160019
Validation loss = 0.017306020483374596
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014835746958851814
Validation loss = 0.015700368210673332
Validation loss = 0.017334362491965294
Validation loss = 0.013642719946801662
Validation loss = 0.014270540326833725
Validation loss = 0.016318485140800476
Validation loss = 0.013897443190217018
Validation loss = 0.015009009279310703
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016538437455892563
Validation loss = 0.017200186848640442
Validation loss = 0.01578821800649166
Validation loss = 0.014560122974216938
Validation loss = 0.013750827871263027
Validation loss = 0.016267480328679085
Validation loss = 0.02382551319897175
Validation loss = 0.014532714150846004
Validation loss = 0.017448831349611282
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016757028177380562
Validation loss = 0.01692393235862255
Validation loss = 0.01576279290020466
Validation loss = 0.01624305546283722
Validation loss = 0.015933658927679062
Validation loss = 0.019461793825030327
Validation loss = 0.017590196803212166
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02005731873214245
Validation loss = 0.019163206219673157
Validation loss = 0.016721850261092186
Validation loss = 0.0180356465280056
Validation loss = 0.02129434421658516
Validation loss = 0.020516423508524895
Validation loss = 0.018154578283429146
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00111 |
| Iteration     | 15       |
| MaximumReturn | -0.00067 |
| MinimumReturn | -0.00194 |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015705803409218788
Validation loss = 0.015357524156570435
Validation loss = 0.019185330718755722
Validation loss = 0.016248641535639763
Validation loss = 0.014664726331830025
Validation loss = 0.014698588289320469
Validation loss = 0.014175706543028355
Validation loss = 0.016117675229907036
Validation loss = 0.016053244471549988
Validation loss = 0.014286883175373077
Validation loss = 0.014433748088777065
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014791660942137241
Validation loss = 0.017443645745515823
Validation loss = 0.017882954329252243
Validation loss = 0.014517026953399181
Validation loss = 0.019281435757875443
Validation loss = 0.0155814653262496
Validation loss = 0.0188006404787302
Validation loss = 0.016156025230884552
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015641679987311363
Validation loss = 0.021772857755422592
Validation loss = 0.016419827938079834
Validation loss = 0.014268845319747925
Validation loss = 0.014145093970000744
Validation loss = 0.01601552404463291
Validation loss = 0.014884300529956818
Validation loss = 0.01808679662644863
Validation loss = 0.01463505532592535
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019739018753170967
Validation loss = 0.014206299558281898
Validation loss = 0.01346615981310606
Validation loss = 0.013729630038142204
Validation loss = 0.013036360032856464
Validation loss = 0.013390432111918926
Validation loss = 0.014668759889900684
Validation loss = 0.014083996415138245
Validation loss = 0.014937393367290497
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017186583951115608
Validation loss = 0.01547257136553526
Validation loss = 0.016443898901343346
Validation loss = 0.016449924558401108
Validation loss = 0.01614139787852764
Validation loss = 0.016327138990163803
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 16        |
| MaximumReturn | -0.000736 |
| MinimumReturn | -0.00152  |
| TotalSamples  | 29988     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014941608533263206
Validation loss = 0.013749828562140465
Validation loss = 0.0178376492112875
Validation loss = 0.015967385843396187
Validation loss = 0.017444318160414696
Validation loss = 0.01772809959948063
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01895863562822342
Validation loss = 0.01472966093569994
Validation loss = 0.015819793567061424
Validation loss = 0.015148298814892769
Validation loss = 0.0167985949665308
Validation loss = 0.01477815117686987
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01444915495812893
Validation loss = 0.015643998980522156
Validation loss = 0.019092822447419167
Validation loss = 0.01382102444767952
Validation loss = 0.01461584772914648
Validation loss = 0.017050419002771378
Validation loss = 0.016331886872649193
Validation loss = 0.0160385649651289
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019783470779657364
Validation loss = 0.01779535412788391
Validation loss = 0.014989091083407402
Validation loss = 0.01388541515916586
Validation loss = 0.014302386902272701
Validation loss = 0.019208285957574844
Validation loss = 0.013778913766145706
Validation loss = 0.015018796548247337
Validation loss = 0.012680369429290295
Validation loss = 0.01351089309900999
Validation loss = 0.013002399355173111
Validation loss = 0.014070590026676655
Validation loss = 0.013372914865612984
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01591494120657444
Validation loss = 0.021886685863137245
Validation loss = 0.014695036225020885
Validation loss = 0.014080408029258251
Validation loss = 0.015483151189982891
Validation loss = 0.015222851186990738
Validation loss = 0.01823696307837963
Validation loss = 0.013950806111097336
Validation loss = 0.014595086686313152
Validation loss = 0.016211988404393196
Validation loss = 0.01596822217106819
Validation loss = 0.01700170896947384
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00119  |
| Iteration     | 17        |
| MaximumReturn | -0.000795 |
| MinimumReturn | -0.00204  |
| TotalSamples  | 31654     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018397727981209755
Validation loss = 0.014302253723144531
Validation loss = 0.016122812405228615
Validation loss = 0.015030418522655964
Validation loss = 0.01576193980872631
Validation loss = 0.015193140134215355
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014303876087069511
Validation loss = 0.017908189445734024
Validation loss = 0.01775159314274788
Validation loss = 0.014947755262255669
Validation loss = 0.01590939797461033
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01552283763885498
Validation loss = 0.016715092584490776
Validation loss = 0.016678908839821815
Validation loss = 0.017293406650424004
Validation loss = 0.01389454398304224
Validation loss = 0.02054455317556858
Validation loss = 0.01599748060107231
Validation loss = 0.01592842862010002
Validation loss = 0.014425387606024742
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0159059576690197
Validation loss = 0.014974690042436123
Validation loss = 0.014180934987962246
Validation loss = 0.01390103343874216
Validation loss = 0.014590738341212273
Validation loss = 0.014373681508004665
Validation loss = 0.02410837635397911
Validation loss = 0.01829311065375805
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01583622395992279
Validation loss = 0.018899282440543175
Validation loss = 0.018002670258283615
Validation loss = 0.019271494820713997
Validation loss = 0.018006417900323868
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00114  |
| Iteration     | 18        |
| MaximumReturn | -0.000666 |
| MinimumReturn | -0.00176  |
| TotalSamples  | 33320     |
-----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017099525779485703
Validation loss = 0.016623640432953835
Validation loss = 0.017390448600053787
Validation loss = 0.014787822030484676
Validation loss = 0.015588724985718727
Validation loss = 0.014037790708243847
Validation loss = 0.018679041415452957
Validation loss = 0.022737473249435425
Validation loss = 0.013157161884009838
Validation loss = 0.01389253232628107
Validation loss = 0.015339864417910576
Validation loss = 0.012973412871360779
Validation loss = 0.016976242884993553
Validation loss = 0.01606045290827751
Validation loss = 0.015895236283540726
Validation loss = 0.01584456115961075
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016335949301719666
Validation loss = 0.016012461856007576
Validation loss = 0.014777095057070255
Validation loss = 0.023540109395980835
Validation loss = 0.018954714760184288
Validation loss = 0.016385572031140327
Validation loss = 0.016631586477160454
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02027231827378273
Validation loss = 0.015194566920399666
Validation loss = 0.014129620976746082
Validation loss = 0.01605803892016411
Validation loss = 0.018532397225499153
Validation loss = 0.01663115620613098
Validation loss = 0.01625276543200016
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018488314002752304
Validation loss = 0.015514849685132504
Validation loss = 0.01303087454289198
Validation loss = 0.015743635594844818
Validation loss = 0.01690586283802986
Validation loss = 0.014404351823031902
Validation loss = 0.013090908527374268
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018277162685990334
Validation loss = 0.015881488099694252
Validation loss = 0.017294220626354218
Validation loss = 0.016266096383333206
Validation loss = 0.014936361461877823
Validation loss = 0.015600807964801788
Validation loss = 0.01638444885611534
Validation loss = 0.015790283679962158
Validation loss = 0.015588468872010708
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0012   |
| Iteration     | 19        |
| MaximumReturn | -0.000728 |
| MinimumReturn | -0.00214  |
| TotalSamples  | 34986     |
-----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014982243068516254
Validation loss = 0.01595129258930683
Validation loss = 0.015967193990945816
Validation loss = 0.016596442088484764
Validation loss = 0.01547605637460947
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02105710096657276
Validation loss = 0.016913874074816704
Validation loss = 0.014168805442750454
Validation loss = 0.017112432047724724
Validation loss = 0.0157796461135149
Validation loss = 0.015712272375822067
Validation loss = 0.015197478234767914
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014382817782461643
Validation loss = 0.015378391370177269
Validation loss = 0.016677675768733025
Validation loss = 0.016366172581911087
Validation loss = 0.016229800879955292
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01619866117835045
Validation loss = 0.015930132940411568
Validation loss = 0.015070891007781029
Validation loss = 0.015103068202733994
Validation loss = 0.014880185946822166
Validation loss = 0.015702005475759506
Validation loss = 0.013451236300170422
Validation loss = 0.014353550970554352
Validation loss = 0.014191051945090294
Validation loss = 0.021932873874902725
Validation loss = 0.014127038419246674
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018817638978362083
Validation loss = 0.014403735287487507
Validation loss = 0.016354355961084366
Validation loss = 0.016050608828663826
Validation loss = 0.015171557664871216
Validation loss = 0.015309109352529049
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00118  |
| Iteration     | 20        |
| MaximumReturn | -0.000694 |
| MinimumReturn | -0.00212  |
| TotalSamples  | 36652     |
-----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016100069507956505
Validation loss = 0.014416668564081192
Validation loss = 0.015514860861003399
Validation loss = 0.016032535582780838
Validation loss = 0.0165412500500679
Validation loss = 0.01432676613330841
Validation loss = 0.015241637825965881
Validation loss = 0.01565307006239891
Validation loss = 0.018157854676246643
Validation loss = 0.01462616492062807
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017094383016228676
Validation loss = 0.015071926638484001
Validation loss = 0.014856587164103985
Validation loss = 0.014895222149789333
Validation loss = 0.014444585889577866
Validation loss = 0.014882074669003487
Validation loss = 0.01576823741197586
Validation loss = 0.01574021764099598
Validation loss = 0.01598750799894333
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017056329175829887
Validation loss = 0.01578478142619133
Validation loss = 0.0175225418061018
Validation loss = 0.016652749851346016
Validation loss = 0.01716151274740696
Validation loss = 0.015874512493610382
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014542771503329277
Validation loss = 0.02010858803987503
Validation loss = 0.01582004874944687
Validation loss = 0.013330768793821335
Validation loss = 0.01484359335154295
Validation loss = 0.012948360294103622
Validation loss = 0.014893503859639168
Validation loss = 0.01438865065574646
Validation loss = 0.01555357500910759
Validation loss = 0.014897625893354416
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017140449956059456
Validation loss = 0.017919734120368958
Validation loss = 0.015734300017356873
Validation loss = 0.017999567091464996
Validation loss = 0.015728076919913292
Validation loss = 0.016635365784168243
Validation loss = 0.0192368533462286
Validation loss = 0.018960218876600266
Validation loss = 0.0156647190451622
Validation loss = 0.018260136246681213
Validation loss = 0.01619645394384861
Validation loss = 0.015537033788859844
Validation loss = 0.018068231642246246
Validation loss = 0.017471708357334137
Validation loss = 0.018554098904132843
Validation loss = 0.015314970165491104
Validation loss = 0.015909813344478607
Validation loss = 0.016629626974463463
Validation loss = 0.015687301754951477
Validation loss = 0.015003602020442486
Validation loss = 0.016553912311792374
Validation loss = 0.01858683116734028
Validation loss = 0.017492225393652916
Validation loss = 0.016175055876374245
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0011   |
| Iteration     | 21        |
| MaximumReturn | -0.000802 |
| MinimumReturn | -0.00153  |
| TotalSamples  | 38318     |
-----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016720151528716087
Validation loss = 0.015149322338402271
Validation loss = 0.017863992601633072
Validation loss = 0.016993258148431778
Validation loss = 0.016297444701194763
Validation loss = 0.01938009448349476
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022341372445225716
Validation loss = 0.021209795027971268
Validation loss = 0.01803760603070259
Validation loss = 0.015322430990636349
Validation loss = 0.02192648872733116
Validation loss = 0.01799398846924305
Validation loss = 0.015656372532248497
Validation loss = 0.018214276060461998
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014531408436596394
Validation loss = 0.014733429066836834
Validation loss = 0.014405768364667892
Validation loss = 0.017087943851947784
Validation loss = 0.0167342871427536
Validation loss = 0.015504504553973675
Validation loss = 0.014307684265077114
Validation loss = 0.015822388231754303
Validation loss = 0.019984321668744087
Validation loss = 0.01781476102769375
Validation loss = 0.014670729637145996
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01578090712428093
Validation loss = 0.015175212174654007
Validation loss = 0.015162589028477669
Validation loss = 0.01395532675087452
Validation loss = 0.014704483561217785
Validation loss = 0.0137016661465168
Validation loss = 0.014616737142205238
Validation loss = 0.022736074402928352
Validation loss = 0.01578807272017002
Validation loss = 0.014432466588914394
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01590755581855774
Validation loss = 0.020217198878526688
Validation loss = 0.016403870657086372
Validation loss = 0.015895521268248558
Validation loss = 0.017643464729189873
Validation loss = 0.019704386591911316
Validation loss = 0.01547995861619711
Validation loss = 0.015491753816604614
Validation loss = 0.01462594885379076
Validation loss = 0.014546595513820648
Validation loss = 0.018707212060689926
Validation loss = 0.014534544199705124
Validation loss = 0.01808459870517254
Validation loss = 0.01453261449933052
Validation loss = 0.01562773808836937
Validation loss = 0.02014710009098053
Validation loss = 0.01494854036718607
Validation loss = 0.014867644757032394
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00105  |
| Iteration     | 22        |
| MaximumReturn | -0.000822 |
| MinimumReturn | -0.00147  |
| TotalSamples  | 39984     |
-----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016292579472064972
Validation loss = 0.01680978760123253
Validation loss = 0.01477873045951128
Validation loss = 0.014687496237456799
Validation loss = 0.014607766643166542
Validation loss = 0.0160207599401474
Validation loss = 0.017803458496928215
Validation loss = 0.01594213955104351
Validation loss = 0.015543247573077679
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015820089727640152
Validation loss = 0.01720171794295311
Validation loss = 0.022950327023863792
Validation loss = 0.015462091192603111
Validation loss = 0.014721943065524101
Validation loss = 0.015606708824634552
Validation loss = 0.014496171846985817
Validation loss = 0.01986512541770935
Validation loss = 0.01710352674126625
Validation loss = 0.01617530919611454
Validation loss = 0.015102429315447807
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016622353345155716
Validation loss = 0.01606985740363598
Validation loss = 0.01568726636469364
Validation loss = 0.015325494110584259
Validation loss = 0.015534093603491783
Validation loss = 0.014982404187321663
Validation loss = 0.01608097180724144
Validation loss = 0.016255706548690796
Validation loss = 0.016627401113510132
Validation loss = 0.01826036162674427
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014341725036501884
Validation loss = 0.01550764124840498
Validation loss = 0.01320474874228239
Validation loss = 0.013774551451206207
Validation loss = 0.01730998419225216
Validation loss = 0.013443881645798683
Validation loss = 0.015891075134277344
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014713441953063011
Validation loss = 0.016130659729242325
Validation loss = 0.01574847474694252
Validation loss = 0.014591957442462444
Validation loss = 0.017079217359423637
Validation loss = 0.015354211442172527
Validation loss = 0.01736827753484249
Validation loss = 0.014271277002990246
Validation loss = 0.014905218966305256
Validation loss = 0.01460342388600111
Validation loss = 0.014166993089020252
Validation loss = 0.014173324219882488
Validation loss = 0.014111382886767387
Validation loss = 0.016954215243458748
Validation loss = 0.015200553461909294
Validation loss = 0.016013549640774727
Validation loss = 0.016513243317604065
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0012   |
| Iteration     | 23        |
| MaximumReturn | -0.000759 |
| MinimumReturn | -0.00182  |
| TotalSamples  | 41650     |
-----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016865147277712822
Validation loss = 0.016502581536769867
Validation loss = 0.015435847453773022
Validation loss = 0.015572492964565754
Validation loss = 0.01457494217902422
Validation loss = 0.01874467357993126
Validation loss = 0.01624031737446785
Validation loss = 0.018111437559127808
Validation loss = 0.01639397442340851
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01505301333963871
Validation loss = 0.016365107148885727
Validation loss = 0.01733854040503502
Validation loss = 0.015194376930594444
Validation loss = 0.017787747085094452
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016480347141623497
Validation loss = 0.017130855470895767
Validation loss = 0.015444768592715263
Validation loss = 0.018771210685372353
Validation loss = 0.015771202743053436
Validation loss = 0.01606147363781929
Validation loss = 0.015408438630402088
Validation loss = 0.01755743846297264
Validation loss = 0.018580328673124313
Validation loss = 0.016446001827716827
Validation loss = 0.016053933650255203
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013582736253738403
Validation loss = 0.014833028428256512
Validation loss = 0.015684593468904495
Validation loss = 0.01863141544163227
Validation loss = 0.015611219219863415
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01824800856411457
Validation loss = 0.01860773004591465
Validation loss = 0.015388834290206432
Validation loss = 0.014422851614654064
Validation loss = 0.01618148759007454
Validation loss = 0.01554905902594328
Validation loss = 0.016633866354823112
Validation loss = 0.01514588762074709
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00138  |
| Iteration     | 24        |
| MaximumReturn | -0.000853 |
| MinimumReturn | -0.00271  |
| TotalSamples  | 43316     |
-----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016758671030402184
Validation loss = 0.01603069342672825
Validation loss = 0.01894942857325077
Validation loss = 0.018101198598742485
Validation loss = 0.018382444977760315
Validation loss = 0.015272704884409904
Validation loss = 0.01641579158604145
Validation loss = 0.014582466334104538
Validation loss = 0.018278315663337708
Validation loss = 0.014082984067499638
Validation loss = 0.01650981418788433
Validation loss = 0.01421923004090786
Validation loss = 0.015118434093892574
Validation loss = 0.01708853431046009
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016731426119804382
Validation loss = 0.015556370839476585
Validation loss = 0.015952106565237045
Validation loss = 0.01580987684428692
Validation loss = 0.015929775312542915
Validation loss = 0.014528260566294193
Validation loss = 0.015115110203623772
Validation loss = 0.01482325978577137
Validation loss = 0.022127242758870125
Validation loss = 0.017060235142707825
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016026897355914116
Validation loss = 0.016164902597665787
Validation loss = 0.016086861491203308
Validation loss = 0.018776951357722282
Validation loss = 0.01862175017595291
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013927597552537918
Validation loss = 0.015582208521664143
Validation loss = 0.014177198521792889
Validation loss = 0.020034203305840492
Validation loss = 0.013676274567842484
Validation loss = 0.0149649353697896
Validation loss = 0.013938204385340214
Validation loss = 0.013580628670752048
Validation loss = 0.01749364845454693
Validation loss = 0.014189314097166061
Validation loss = 0.016070052981376648
Validation loss = 0.019578581675887108
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014866052195429802
Validation loss = 0.017184797674417496
Validation loss = 0.015193234197795391
Validation loss = 0.01622087135910988
Validation loss = 0.014748936519026756
Validation loss = 0.01493114698678255
Validation loss = 0.01589823141694069
Validation loss = 0.015919553115963936
Validation loss = 0.02390911430120468
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00117  |
| Iteration     | 25        |
| MaximumReturn | -0.000875 |
| MinimumReturn | -0.00193  |
| TotalSamples  | 44982     |
-----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018000643700361252
Validation loss = 0.017194392159581184
Validation loss = 0.015403300523757935
Validation loss = 0.015427990816533566
Validation loss = 0.017790380865335464
Validation loss = 0.01622478850185871
Validation loss = 0.013604923151433468
Validation loss = 0.0140383904799819
Validation loss = 0.017271123826503754
Validation loss = 0.014250148087739944
Validation loss = 0.022082693874835968
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01793161779642105
Validation loss = 0.01600392535328865
Validation loss = 0.01546027883887291
Validation loss = 0.016277017071843147
Validation loss = 0.016756929457187653
Validation loss = 0.01693238504230976
Validation loss = 0.014202925376594067
Validation loss = 0.015184194780886173
Validation loss = 0.01525069773197174
Validation loss = 0.015171616338193417
Validation loss = 0.022894589230418205
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020576676353812218
Validation loss = 0.016245998442173004
Validation loss = 0.01616356521844864
Validation loss = 0.017767932265996933
Validation loss = 0.016727788373827934
Validation loss = 0.01705346070230007
Validation loss = 0.016529995948076248
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02384975738823414
Validation loss = 0.01509858388453722
Validation loss = 0.01567893847823143
Validation loss = 0.014370418153703213
Validation loss = 0.014727670699357986
Validation loss = 0.01543804444372654
Validation loss = 0.015970243141055107
Validation loss = 0.016834763810038567
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0182256568223238
Validation loss = 0.017190389335155487
Validation loss = 0.015196160413324833
Validation loss = 0.016298621892929077
Validation loss = 0.016163406893610954
Validation loss = 0.01729605533182621
Validation loss = 0.0152529776096344
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00108  |
| Iteration     | 26        |
| MaximumReturn | -0.000798 |
| MinimumReturn | -0.0017   |
| TotalSamples  | 46648     |
-----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017401644960045815
Validation loss = 0.016666145995259285
Validation loss = 0.014683270826935768
Validation loss = 0.01704423688352108
Validation loss = 0.0163327194750309
Validation loss = 0.016014564782381058
Validation loss = 0.014721170999109745
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01711026206612587
Validation loss = 0.016024809330701828
Validation loss = 0.01776917278766632
Validation loss = 0.016043219715356827
Validation loss = 0.017971603199839592
Validation loss = 0.015144544653594494
Validation loss = 0.016049403697252274
Validation loss = 0.01495503168553114
Validation loss = 0.015771832317113876
Validation loss = 0.014704645611345768
Validation loss = 0.015514438971877098
Validation loss = 0.015049248933792114
Validation loss = 0.01926407217979431
Validation loss = 0.016416549682617188
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015510639175772667
Validation loss = 0.017118869349360466
Validation loss = 0.020095061510801315
Validation loss = 0.01736246421933174
Validation loss = 0.017028843984007835
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015985438600182533
Validation loss = 0.01967143826186657
Validation loss = 0.01754908449947834
Validation loss = 0.017526816576719284
Validation loss = 0.018895594403147697
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015018570236861706
Validation loss = 0.015072142705321312
Validation loss = 0.01563916727900505
Validation loss = 0.019097428768873215
Validation loss = 0.01572607271373272
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00103 |
| Iteration     | 27       |
| MaximumReturn | -0.00061 |
| MinimumReturn | -0.00164 |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016075534746050835
Validation loss = 0.015541471540927887
Validation loss = 0.014678485691547394
Validation loss = 0.019552985206246376
Validation loss = 0.017406178638339043
Validation loss = 0.01610308699309826
Validation loss = 0.015090982429683208
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015970980748534203
Validation loss = 0.016505852341651917
Validation loss = 0.01718517392873764
Validation loss = 0.015433178283274174
Validation loss = 0.016054684296250343
Validation loss = 0.014157327823340893
Validation loss = 0.01599334552884102
Validation loss = 0.02134966105222702
Validation loss = 0.01537911593914032
Validation loss = 0.015575163066387177
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015386976301670074
Validation loss = 0.018521394580602646
Validation loss = 0.015223760157823563
Validation loss = 0.02205629087984562
Validation loss = 0.01526226382702589
Validation loss = 0.014943999238312244
Validation loss = 0.016280239447951317
Validation loss = 0.01698247157037258
Validation loss = 0.01499438751488924
Validation loss = 0.015276151709258556
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01743416115641594
Validation loss = 0.015210435725748539
Validation loss = 0.016519101336598396
Validation loss = 0.01764346845448017
Validation loss = 0.01736546866595745
Validation loss = 0.013534917496144772
Validation loss = 0.01622440479695797
Validation loss = 0.017166055738925934
Validation loss = 0.017481790855526924
Validation loss = 0.014856871217489243
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015483672730624676
Validation loss = 0.01609751768410206
Validation loss = 0.015655450522899628
Validation loss = 0.015517513267695904
Validation loss = 0.016349708661437035
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0012   |
| Iteration     | 28        |
| MaximumReturn | -0.000873 |
| MinimumReturn | -0.00176  |
| TotalSamples  | 49980     |
-----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01627182774245739
Validation loss = 0.015123286284506321
Validation loss = 0.015429113991558552
Validation loss = 0.018345896154642105
Validation loss = 0.016351625323295593
Validation loss = 0.01753848046064377
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01667451113462448
Validation loss = 0.015813127160072327
Validation loss = 0.015136628411710262
Validation loss = 0.01598023995757103
Validation loss = 0.018838126212358475
Validation loss = 0.01794133521616459
Validation loss = 0.015375643037259579
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016657954081892967
Validation loss = 0.019888535141944885
Validation loss = 0.0158858522772789
Validation loss = 0.017745904624462128
Validation loss = 0.01584313064813614
Validation loss = 0.015521351248025894
Validation loss = 0.016240045428276062
Validation loss = 0.017335042357444763
Validation loss = 0.01637384667992592
Validation loss = 0.01733533665537834
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013771643862128258
Validation loss = 0.014964344911277294
Validation loss = 0.01514429785311222
Validation loss = 0.014925260096788406
Validation loss = 0.017030121758580208
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016357891261577606
Validation loss = 0.01786145195364952
Validation loss = 0.018439749255776405
Validation loss = 0.01716175489127636
Validation loss = 0.016543101519346237
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00128  |
| Iteration     | 29        |
| MaximumReturn | -0.000806 |
| MinimumReturn | -0.00218  |
| TotalSamples  | 51646     |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016612179577350616
Validation loss = 0.01969807595014572
Validation loss = 0.01659972406923771
Validation loss = 0.01754896156489849
Validation loss = 0.015193252824246883
Validation loss = 0.01639736257493496
Validation loss = 0.023269787430763245
Validation loss = 0.016416922211647034
Validation loss = 0.015977416187524796
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01541003305464983
Validation loss = 0.019081847742199898
Validation loss = 0.015582628548145294
Validation loss = 0.014965970069169998
Validation loss = 0.017841331660747528
Validation loss = 0.01563086360692978
Validation loss = 0.016034889966249466
Validation loss = 0.014808928593993187
Validation loss = 0.016676809638738632
Validation loss = 0.018785692751407623
Validation loss = 0.016049841418862343
Validation loss = 0.017155317589640617
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016691841185092926
Validation loss = 0.017395149916410446
Validation loss = 0.01895774155855179
Validation loss = 0.01790245808660984
Validation loss = 0.017205217853188515
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016444647684693336
Validation loss = 0.01705194264650345
Validation loss = 0.016676954925060272
Validation loss = 0.016485372558236122
Validation loss = 0.017989739775657654
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016557035967707634
Validation loss = 0.017215510830283165
Validation loss = 0.015431327745318413
Validation loss = 0.01820075325667858
Validation loss = 0.015952223911881447
Validation loss = 0.016547435894608498
Validation loss = 0.01659427210688591
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00106 |
| Iteration     | 30       |
| MaximumReturn | -0.00071 |
| MinimumReturn | -0.00153 |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017174139618873596
Validation loss = 0.017033644020557404
Validation loss = 0.01591097004711628
Validation loss = 0.0183638334274292
Validation loss = 0.016537301242351532
Validation loss = 0.019585149362683296
Validation loss = 0.017492543905973434
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016225509345531464
Validation loss = 0.015847578644752502
Validation loss = 0.016381816938519478
Validation loss = 0.015563740395009518
Validation loss = 0.0169764906167984
Validation loss = 0.01482460368424654
Validation loss = 0.014728765934705734
Validation loss = 0.015535213984549046
Validation loss = 0.01586015895009041
Validation loss = 0.015073055401444435
Validation loss = 0.014662349596619606
Validation loss = 0.015850555151700974
Validation loss = 0.01490761712193489
Validation loss = 0.016473017632961273
Validation loss = 0.015259236097335815
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016880568116903305
Validation loss = 0.01624150201678276
Validation loss = 0.016446683555841446
Validation loss = 0.016517523676156998
Validation loss = 0.016836222261190414
Validation loss = 0.016666758805513382
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018937768414616585
Validation loss = 0.015377689152956009
Validation loss = 0.01731451041996479
Validation loss = 0.0150606082752347
Validation loss = 0.015663502737879753
Validation loss = 0.01612413302063942
Validation loss = 0.01718522049486637
Validation loss = 0.017354337498545647
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01620500534772873
Validation loss = 0.01590956375002861
Validation loss = 0.019547024741768837
Validation loss = 0.01652194932103157
Validation loss = 0.016459934413433075
Validation loss = 0.016378212720155716
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00104  |
| Iteration     | 31        |
| MaximumReturn | -0.000658 |
| MinimumReturn | -0.00153  |
| TotalSamples  | 54978     |
-----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01767067052423954
Validation loss = 0.016124555841088295
Validation loss = 0.01993870921432972
Validation loss = 0.01830373890697956
Validation loss = 0.017098261043429375
Validation loss = 0.01796368509531021
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015092520043253899
Validation loss = 0.01760261133313179
Validation loss = 0.015650127083063126
Validation loss = 0.01609419286251068
Validation loss = 0.016199681907892227
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01757790707051754
Validation loss = 0.01829502359032631
Validation loss = 0.01764013059437275
Validation loss = 0.01565222628414631
Validation loss = 0.019548466429114342
Validation loss = 0.016613425686955452
Validation loss = 0.01732712984085083
Validation loss = 0.015376472845673561
Validation loss = 0.016257628798484802
Validation loss = 0.015598591417074203
Validation loss = 0.015625005587935448
Validation loss = 0.01737706921994686
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016168633475899696
Validation loss = 0.015014649368822575
Validation loss = 0.018741965293884277
Validation loss = 0.01596599631011486
Validation loss = 0.015331572853028774
Validation loss = 0.01749405823647976
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016404610127210617
Validation loss = 0.016581328585743904
Validation loss = 0.017812425270676613
Validation loss = 0.020262110978364944
Validation loss = 0.01788702793419361
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00113  |
| Iteration     | 32        |
| MaximumReturn | -0.000756 |
| MinimumReturn | -0.00163  |
| TotalSamples  | 56644     |
-----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01678539626300335
Validation loss = 0.0187989454716444
Validation loss = 0.016485681757330894
Validation loss = 0.017549194395542145
Validation loss = 0.01655077375471592
Validation loss = 0.018666502088308334
Validation loss = 0.024359429255127907
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016236012801527977
Validation loss = 0.01679919846355915
Validation loss = 0.0183310117572546
Validation loss = 0.015406702645123005
Validation loss = 0.014694745652377605
Validation loss = 0.015213553793728352
Validation loss = 0.015623963437974453
Validation loss = 0.01506317313760519
Validation loss = 0.014941596426069736
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018813980743288994
Validation loss = 0.018909616395831108
Validation loss = 0.015987519174814224
Validation loss = 0.01608450338244438
Validation loss = 0.017888130620121956
Validation loss = 0.016486074775457382
Validation loss = 0.015689987689256668
Validation loss = 0.016987858340144157
Validation loss = 0.017491843551397324
Validation loss = 0.01681850478053093
Validation loss = 0.016366014257073402
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017295992001891136
Validation loss = 0.01913880743086338
Validation loss = 0.016925964504480362
Validation loss = 0.018139520660042763
Validation loss = 0.016565904021263123
Validation loss = 0.017118971794843674
Validation loss = 0.01589244231581688
Validation loss = 0.01680630072951317
Validation loss = 0.020104901865124702
Validation loss = 0.0180721003562212
Validation loss = 0.01683867536485195
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016608359292149544
Validation loss = 0.015784593299031258
Validation loss = 0.01705711893737316
Validation loss = 0.01715846359729767
Validation loss = 0.016773436218500137
Validation loss = 0.015357278287410736
Validation loss = 0.021883543580770493
Validation loss = 0.016811123117804527
Validation loss = 0.01629061810672283
Validation loss = 0.016415635123848915
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00119  |
| Iteration     | 33        |
| MaximumReturn | -0.000872 |
| MinimumReturn | -0.00179  |
| TotalSamples  | 58310     |
-----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02055557258427143
Validation loss = 0.017343854531645775
Validation loss = 0.017591111361980438
Validation loss = 0.017587482929229736
Validation loss = 0.015935974195599556
Validation loss = 0.01686595380306244
Validation loss = 0.016343705356121063
Validation loss = 0.01649446040391922
Validation loss = 0.020847275853157043
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015155047178268433
Validation loss = 0.01661287620663643
Validation loss = 0.014791319146752357
Validation loss = 0.015300261788070202
Validation loss = 0.01658150926232338
Validation loss = 0.01908750645816326
Validation loss = 0.01717020943760872
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01599760912358761
Validation loss = 0.01561810728162527
Validation loss = 0.016695789992809296
Validation loss = 0.015924818813800812
Validation loss = 0.01629907451570034
Validation loss = 0.017620917409658432
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016036974266171455
Validation loss = 0.016222475096583366
Validation loss = 0.015831932425498962
Validation loss = 0.01559806801378727
Validation loss = 0.019363578408956528
Validation loss = 0.015800222754478455
Validation loss = 0.016952084377408028
Validation loss = 0.01916062831878662
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018043966963887215
Validation loss = 0.019005147740244865
Validation loss = 0.01721862703561783
Validation loss = 0.015924958512187004
Validation loss = 0.01541169360280037
Validation loss = 0.017537161707878113
Validation loss = 0.017862344160676003
Validation loss = 0.01800832338631153
Validation loss = 0.017119480296969414
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00111  |
| Iteration     | 34        |
| MaximumReturn | -0.000796 |
| MinimumReturn | -0.00185  |
| TotalSamples  | 59976     |
-----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017128687351942062
Validation loss = 0.015833571553230286
Validation loss = 0.015227759256958961
Validation loss = 0.017872996628284454
Validation loss = 0.018542807549238205
Validation loss = 0.015159999951720238
Validation loss = 0.015239391475915909
Validation loss = 0.01745533011853695
Validation loss = 0.016032809391617775
Validation loss = 0.015723824501037598
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017700912430882454
Validation loss = 0.017490824684500694
Validation loss = 0.014936892315745354
Validation loss = 0.015829186886548996
Validation loss = 0.016701750457286835
Validation loss = 0.015216920524835587
Validation loss = 0.0174260251224041
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018213029950857162
Validation loss = 0.017750373110175133
Validation loss = 0.017331155017018318
Validation loss = 0.016250623390078545
Validation loss = 0.017782757058739662
Validation loss = 0.016400380060076714
Validation loss = 0.018359549343585968
Validation loss = 0.016140878200531006
Validation loss = 0.017187882214784622
Validation loss = 0.01814233884215355
Validation loss = 0.015905486419796944
Validation loss = 0.017160382121801376
Validation loss = 0.019055096432566643
Validation loss = 0.016833007335662842
Validation loss = 0.016501212492585182
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019076470285654068
Validation loss = 0.015896929427981377
Validation loss = 0.016229186207056046
Validation loss = 0.01628822088241577
Validation loss = 0.017547735944390297
Validation loss = 0.015532598830759525
Validation loss = 0.018209120258688927
Validation loss = 0.017525605857372284
Validation loss = 0.016130071133375168
Validation loss = 0.016528939828276634
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016533169895410538
Validation loss = 0.0163675919175148
Validation loss = 0.01752198301255703
Validation loss = 0.01753249205648899
Validation loss = 0.018804753199219704
Validation loss = 0.016415664926171303
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 35        |
| MaximumReturn | -0.000641 |
| MinimumReturn | -0.00131  |
| TotalSamples  | 61642     |
-----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017258310690522194
Validation loss = 0.015799621120095253
Validation loss = 0.016253797337412834
Validation loss = 0.018573958426713943
Validation loss = 0.017363479360938072
Validation loss = 0.01667795516550541
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01811825856566429
Validation loss = 0.015363610349595547
Validation loss = 0.01565905101597309
Validation loss = 0.01838410273194313
Validation loss = 0.01694449968636036
Validation loss = 0.015478258952498436
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0200375784188509
Validation loss = 0.01745828241109848
Validation loss = 0.017606625333428383
Validation loss = 0.01665041781961918
Validation loss = 0.01947784796357155
Validation loss = 0.01782248355448246
Validation loss = 0.016022350639104843
Validation loss = 0.02079830691218376
Validation loss = 0.01900949515402317
Validation loss = 0.016540879383683205
Validation loss = 0.016033506020903587
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015980040654540062
Validation loss = 0.018313612788915634
Validation loss = 0.017277274280786514
Validation loss = 0.015581821091473103
Validation loss = 0.017810974270105362
Validation loss = 0.015906332060694695
Validation loss = 0.016153555363416672
Validation loss = 0.01487773284316063
Validation loss = 0.015894120559096336
Validation loss = 0.015235677361488342
Validation loss = 0.015887612476944923
Validation loss = 0.015656761825084686
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017041679471731186
Validation loss = 0.017228974029421806
Validation loss = 0.01795077696442604
Validation loss = 0.01654386706650257
Validation loss = 0.017663005739450455
Validation loss = 0.016948778182268143
Validation loss = 0.016562502831220627
Validation loss = 0.01992596872150898
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00105  |
| Iteration     | 36        |
| MaximumReturn | -0.000785 |
| MinimumReturn | -0.00134  |
| TotalSamples  | 63308     |
-----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016071071848273277
Validation loss = 0.017523063346743584
Validation loss = 0.01630249060690403
Validation loss = 0.018560517579317093
Validation loss = 0.0149719612672925
Validation loss = 0.017077947035431862
Validation loss = 0.016388705000281334
Validation loss = 0.016128960996866226
Validation loss = 0.016208602115511894
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016401363536715508
Validation loss = 0.015584305860102177
Validation loss = 0.016051335260272026
Validation loss = 0.016175327822566032
Validation loss = 0.015481161884963512
Validation loss = 0.017044242471456528
Validation loss = 0.01628449745476246
Validation loss = 0.018806973472237587
Validation loss = 0.015737049281597137
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016923701390624046
Validation loss = 0.02186387963593006
Validation loss = 0.015811946243047714
Validation loss = 0.017067711800336838
Validation loss = 0.01645713672041893
Validation loss = 0.017426852136850357
Validation loss = 0.01627306640148163
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01661110296845436
Validation loss = 0.018052207306027412
Validation loss = 0.01857306994497776
Validation loss = 0.015576337464153767
Validation loss = 0.01616939716041088
Validation loss = 0.015270231291651726
Validation loss = 0.01745637319982052
Validation loss = 0.019199810922145844
Validation loss = 0.016214555129408836
Validation loss = 0.01540950033813715
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016946325078606606
Validation loss = 0.016551781445741653
Validation loss = 0.016852179542183876
Validation loss = 0.01761659048497677
Validation loss = 0.01722905784845352
Validation loss = 0.015971196815371513
Validation loss = 0.016911091282963753
Validation loss = 0.016709450632333755
Validation loss = 0.020302554592490196
Validation loss = 0.016640620306134224
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00125  |
| Iteration     | 37        |
| MaximumReturn | -0.000814 |
| MinimumReturn | -0.00237  |
| TotalSamples  | 64974     |
-----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02372021973133087
Validation loss = 0.01808145083487034
Validation loss = 0.017609411850571632
Validation loss = 0.016244113445281982
Validation loss = 0.01702287420630455
Validation loss = 0.017537541687488556
Validation loss = 0.01542668230831623
Validation loss = 0.016466259956359863
Validation loss = 0.0184769406914711
Validation loss = 0.016371984034776688
Validation loss = 0.017361219972372055
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016972672194242477
Validation loss = 0.01568622514605522
Validation loss = 0.016843941062688828
Validation loss = 0.016039390116930008
Validation loss = 0.01579570397734642
Validation loss = 0.015783075243234634
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017184624448418617
Validation loss = 0.018677616491913795
Validation loss = 0.017623020336031914
Validation loss = 0.01557505689561367
Validation loss = 0.02030106633901596
Validation loss = 0.01766139268875122
Validation loss = 0.016861669719219208
Validation loss = 0.01694061979651451
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015745986253023148
Validation loss = 0.015492956154048443
Validation loss = 0.015771353617310524
Validation loss = 0.015683090314269066
Validation loss = 0.01882089488208294
Validation loss = 0.016226522624492645
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01819571852684021
Validation loss = 0.017521344125270844
Validation loss = 0.017454182729125023
Validation loss = 0.019538169726729393
Validation loss = 0.01736162230372429
Validation loss = 0.01672387309372425
Validation loss = 0.01674935221672058
Validation loss = 0.016623109579086304
Validation loss = 0.01730024442076683
Validation loss = 0.018038026988506317
Validation loss = 0.017307540401816368
Validation loss = 0.018866069614887238
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.001    |
| Iteration     | 38        |
| MaximumReturn | -0.000643 |
| MinimumReturn | -0.00136  |
| TotalSamples  | 66640     |
-----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016143610700964928
Validation loss = 0.016452332958579063
Validation loss = 0.015626246109604836
Validation loss = 0.018984559923410416
Validation loss = 0.017206627875566483
Validation loss = 0.015966717153787613
Validation loss = 0.015738146379590034
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01570843532681465
Validation loss = 0.017849484458565712
Validation loss = 0.017195124179124832
Validation loss = 0.015873482450842857
Validation loss = 0.018514791503548622
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01718294247984886
Validation loss = 0.016591038554906845
Validation loss = 0.01604989357292652
Validation loss = 0.01785709150135517
Validation loss = 0.018555164337158203
Validation loss = 0.01642460562288761
Validation loss = 0.020472083240747452
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017146749421954155
Validation loss = 0.01708613894879818
Validation loss = 0.015680335462093353
Validation loss = 0.017848806455731392
Validation loss = 0.016489302739501
Validation loss = 0.01610892452299595
Validation loss = 0.017679886892437935
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017255915328860283
Validation loss = 0.016839539632201195
Validation loss = 0.01595570147037506
Validation loss = 0.01729767955839634
Validation loss = 0.017584828659892082
Validation loss = 0.01895316317677498
Validation loss = 0.01770145073533058
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00122 |
| Iteration     | 39       |
| MaximumReturn | -0.0009  |
| MinimumReturn | -0.00201 |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015047837980091572
Validation loss = 0.016180574893951416
Validation loss = 0.015507064759731293
Validation loss = 0.016325175762176514
Validation loss = 0.02140291966497898
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016257548704743385
Validation loss = 0.01735800690948963
Validation loss = 0.0161521527916193
Validation loss = 0.01655535399913788
Validation loss = 0.015500430017709732
Validation loss = 0.01734522543847561
Validation loss = 0.016704659909009933
Validation loss = 0.016466811299324036
Validation loss = 0.016830027103424072
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018242906779050827
Validation loss = 0.01654113084077835
Validation loss = 0.016969606280326843
Validation loss = 0.01698824018239975
Validation loss = 0.016907358542084694
Validation loss = 0.0174053106456995
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02094339206814766
Validation loss = 0.01830330118536949
Validation loss = 0.016823889687657356
Validation loss = 0.019560644403100014
Validation loss = 0.01759599894285202
Validation loss = 0.016625452786684036
Validation loss = 0.016493692994117737
Validation loss = 0.01644984818994999
Validation loss = 0.01848268322646618
Validation loss = 0.016462918370962143
Validation loss = 0.016510922461748123
Validation loss = 0.016518326476216316
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016547145321965218
Validation loss = 0.01737990230321884
Validation loss = 0.016950339078903198
Validation loss = 0.018200570717453957
Validation loss = 0.017824813723564148
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00125 |
| Iteration     | 40       |
| MaximumReturn | -0.00082 |
| MinimumReturn | -0.00181 |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02286471612751484
Validation loss = 0.01565532013773918
Validation loss = 0.016653822734951973
Validation loss = 0.017164647579193115
Validation loss = 0.018890835344791412
Validation loss = 0.016093775629997253
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016216333955526352
Validation loss = 0.015581085346639156
Validation loss = 0.015436741523444653
Validation loss = 0.015615013428032398
Validation loss = 0.014900263398885727
Validation loss = 0.015078045427799225
Validation loss = 0.015672922134399414
Validation loss = 0.017066381871700287
Validation loss = 0.016692737117409706
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016827693209052086
Validation loss = 0.018228640779852867
Validation loss = 0.019089972600340843
Validation loss = 0.017439905554056168
Validation loss = 0.016363946720957756
Validation loss = 0.016270967200398445
Validation loss = 0.01674460433423519
Validation loss = 0.01795608177781105
Validation loss = 0.016475001350045204
Validation loss = 0.016824871301651
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0157171543687582
Validation loss = 0.015457376837730408
Validation loss = 0.01610962301492691
Validation loss = 0.015877434983849525
Validation loss = 0.021767139434814453
Validation loss = 0.018630478531122208
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02015441097319126
Validation loss = 0.017609987407922745
Validation loss = 0.01724022440612316
Validation loss = 0.018780894577503204
Validation loss = 0.017153728753328323
Validation loss = 0.017582494765520096
Validation loss = 0.016862541437149048
Validation loss = 0.01895902119576931
Validation loss = 0.01894226111471653
Validation loss = 0.0178914163261652
Validation loss = 0.016742993146181107
Validation loss = 0.01874535344541073
Validation loss = 0.0163949616253376
Validation loss = 0.01853950507938862
Validation loss = 0.01739703118801117
Validation loss = 0.016773194074630737
Validation loss = 0.020053919404745102
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00113 |
| Iteration     | 41       |
| MaximumReturn | -0.0008  |
| MinimumReturn | -0.00165 |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018143266439437866
Validation loss = 0.01698661781847477
Validation loss = 0.01755298115313053
Validation loss = 0.0161901842802763
Validation loss = 0.019074387848377228
Validation loss = 0.01672535017132759
Validation loss = 0.016449224203824997
Validation loss = 0.016160817816853523
Validation loss = 0.016121583059430122
Validation loss = 0.01766096241772175
Validation loss = 0.016678432002663612
Validation loss = 0.016297858208417892
Validation loss = 0.015759997069835663
Validation loss = 0.015499399043619633
Validation loss = 0.017930831760168076
Validation loss = 0.01631353795528412
Validation loss = 0.015438513830304146
Validation loss = 0.01612481288611889
Validation loss = 0.01632540300488472
Validation loss = 0.017498822882771492
Validation loss = 0.016158396378159523
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01725911535322666
Validation loss = 0.015815671533346176
Validation loss = 0.015445269644260406
Validation loss = 0.017593039199709892
Validation loss = 0.018331248313188553
Validation loss = 0.016131030395627022
Validation loss = 0.01656179502606392
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020329277962446213
Validation loss = 0.019646456465125084
Validation loss = 0.016722161322832108
Validation loss = 0.016681935638189316
Validation loss = 0.020048946142196655
Validation loss = 0.017774460837244987
Validation loss = 0.017360735684633255
Validation loss = 0.016710814088582993
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017924189567565918
Validation loss = 0.016831090673804283
Validation loss = 0.01730193756520748
Validation loss = 0.017405956983566284
Validation loss = 0.017430972307920456
Validation loss = 0.017446311190724373
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018025940284132957
Validation loss = 0.01806303858757019
Validation loss = 0.016256077215075493
Validation loss = 0.017744312062859535
Validation loss = 0.01744394190609455
Validation loss = 0.018899621441960335
Validation loss = 0.018426841124892235
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00108  |
| Iteration     | 42        |
| MaximumReturn | -0.000718 |
| MinimumReturn | -0.00165  |
| TotalSamples  | 73304     |
-----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01619698479771614
Validation loss = 0.015518824569880962
Validation loss = 0.01743176206946373
Validation loss = 0.017512381076812744
Validation loss = 0.016612904146313667
Validation loss = 0.01743978075683117
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01649116352200508
Validation loss = 0.016529226675629616
Validation loss = 0.015944829210639
Validation loss = 0.021779652684926987
Validation loss = 0.018176494166254997
Validation loss = 0.01606222242116928
Validation loss = 0.0166610237210989
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01706400141119957
Validation loss = 0.018116544932127
Validation loss = 0.017079345881938934
Validation loss = 0.018813667818903923
Validation loss = 0.02290460839867592
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01709551364183426
Validation loss = 0.017111197113990784
Validation loss = 0.017946934327483177
Validation loss = 0.017772626131772995
Validation loss = 0.016297584399580956
Validation loss = 0.01905573159456253
Validation loss = 0.01657002419233322
Validation loss = 0.016615040600299835
Validation loss = 0.017963316291570663
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018228625878691673
Validation loss = 0.01715836673974991
Validation loss = 0.01706172153353691
Validation loss = 0.018295060843229294
Validation loss = 0.017174692824482918
Validation loss = 0.017442908138036728
Validation loss = 0.016712045297026634
Validation loss = 0.0178462453186512
Validation loss = 0.017426902428269386
Validation loss = 0.019053325057029724
Validation loss = 0.019054673612117767
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0011   |
| Iteration     | 43        |
| MaximumReturn | -0.000779 |
| MinimumReturn | -0.00195  |
| TotalSamples  | 74970     |
-----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016526227816939354
Validation loss = 0.01619124598801136
Validation loss = 0.018588131293654442
Validation loss = 0.017832916229963303
Validation loss = 0.01729617267847061
Validation loss = 0.018839966505765915
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016254551708698273
Validation loss = 0.016618207097053528
Validation loss = 0.01621118001639843
Validation loss = 0.015367113053798676
Validation loss = 0.017514975741505623
Validation loss = 0.015450874343514442
Validation loss = 0.016616391018033028
Validation loss = 0.021296264603734016
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02962091751396656
Validation loss = 0.017301475629210472
Validation loss = 0.017658375203609467
Validation loss = 0.018506888300180435
Validation loss = 0.016841968521475792
Validation loss = 0.018336720764636993
Validation loss = 0.016792410984635353
Validation loss = 0.018417082726955414
Validation loss = 0.016958147287368774
Validation loss = 0.01750987023115158
Validation loss = 0.016879716888070107
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016192175447940826
Validation loss = 0.015608904883265495
Validation loss = 0.01560548972338438
Validation loss = 0.015796266496181488
Validation loss = 0.018202640116214752
Validation loss = 0.01867780089378357
Validation loss = 0.01607337035238743
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01725519448518753
Validation loss = 0.016854392364621162
Validation loss = 0.01733282580971718
Validation loss = 0.018289664760231972
Validation loss = 0.01670975051820278
Validation loss = 0.01858813315629959
Validation loss = 0.017430661246180534
Validation loss = 0.01705910451710224
Validation loss = 0.01791347935795784
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00107  |
| Iteration     | 44        |
| MaximumReturn | -0.000748 |
| MinimumReturn | -0.00168  |
| TotalSamples  | 76636     |
-----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017109934240579605
Validation loss = 0.016395919024944305
Validation loss = 0.016575120389461517
Validation loss = 0.018748411908745766
Validation loss = 0.017346328124403954
Validation loss = 0.017781931906938553
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015467576682567596
Validation loss = 0.01659640111029148
Validation loss = 0.022156592458486557
Validation loss = 0.01631881669163704
Validation loss = 0.01878819428384304
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017985576763749123
Validation loss = 0.017249777913093567
Validation loss = 0.016910670325160027
Validation loss = 0.018103625625371933
Validation loss = 0.017320197075605392
Validation loss = 0.016472885385155678
Validation loss = 0.01715727150440216
Validation loss = 0.01680006831884384
Validation loss = 0.01671808585524559
Validation loss = 0.01696271263062954
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017210306599736214
Validation loss = 0.01677034981548786
Validation loss = 0.016245555132627487
Validation loss = 0.017017725855112076
Validation loss = 0.019608207046985626
Validation loss = 0.019521644338965416
Validation loss = 0.01896885596215725
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017224477604031563
Validation loss = 0.017799563705921173
Validation loss = 0.021151460707187653
Validation loss = 0.01628298871219158
Validation loss = 0.018673833459615707
Validation loss = 0.018677806481719017
Validation loss = 0.016680553555488586
Validation loss = 0.017208825796842575
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00108 |
| Iteration     | 45       |
| MaximumReturn | -0.00073 |
| MinimumReturn | -0.00172 |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016839059069752693
Validation loss = 0.020006868988275528
Validation loss = 0.015888474881649017
Validation loss = 0.01598406210541725
Validation loss = 0.016063040122389793
Validation loss = 0.01669524423778057
Validation loss = 0.01653480902314186
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017409617081284523
Validation loss = 0.018277645111083984
Validation loss = 0.016785727813839912
Validation loss = 0.015640387311577797
Validation loss = 0.016742827370762825
Validation loss = 0.015922334045171738
Validation loss = 0.016981983557343483
Validation loss = 0.016682082787156105
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017793869599699974
Validation loss = 0.016498636454343796
Validation loss = 0.017890125513076782
Validation loss = 0.01700502075254917
Validation loss = 0.01677294448018074
Validation loss = 0.018851131200790405
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01638193428516388
Validation loss = 0.01775761879980564
Validation loss = 0.016135891899466515
Validation loss = 0.017117995768785477
Validation loss = 0.01659248396754265
Validation loss = 0.01615709438920021
Validation loss = 0.018276549875736237
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017630983144044876
Validation loss = 0.018267754465341568
Validation loss = 0.01767640933394432
Validation loss = 0.02053133025765419
Validation loss = 0.017986658960580826
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000983 |
| Iteration     | 46        |
| MaximumReturn | -0.000623 |
| MinimumReturn | -0.00147  |
| TotalSamples  | 79968     |
-----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01829034835100174
Validation loss = 0.015950920060276985
Validation loss = 0.018429560586810112
Validation loss = 0.017533784732222557
Validation loss = 0.01826895959675312
Validation loss = 0.016171608120203018
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01607980765402317
Validation loss = 0.015455600805580616
Validation loss = 0.015706222504377365
Validation loss = 0.018465479835867882
Validation loss = 0.016041293740272522
Validation loss = 0.015828397125005722
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016654713079333305
Validation loss = 0.017653927206993103
Validation loss = 0.018297353759407997
Validation loss = 0.01759282872080803
Validation loss = 0.016819743439555168
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01753733679652214
Validation loss = 0.020472930744290352
Validation loss = 0.017334293574094772
Validation loss = 0.01980612240731716
Validation loss = 0.017656724900007248
Validation loss = 0.016193576157093048
Validation loss = 0.01623079739511013
Validation loss = 0.017995838075876236
Validation loss = 0.016740236431360245
Validation loss = 0.016454610973596573
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016555069014430046
Validation loss = 0.01701183058321476
Validation loss = 0.019145483151078224
Validation loss = 0.017512323334813118
Validation loss = 0.01733991876244545
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00106 |
| Iteration     | 47       |
| MaximumReturn | -0.00079 |
| MinimumReturn | -0.00157 |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017562583088874817
Validation loss = 0.019738860428333282
Validation loss = 0.016323106363415718
Validation loss = 0.01649458333849907
Validation loss = 0.017663443461060524
Validation loss = 0.017087716609239578
Validation loss = 0.01632758602499962
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016684211790561676
Validation loss = 0.016838062554597855
Validation loss = 0.015831971541047096
Validation loss = 0.01691853441298008
Validation loss = 0.01605655811727047
Validation loss = 0.01591038890182972
Validation loss = 0.01574062928557396
Validation loss = 0.018493790179491043
Validation loss = 0.016835574060678482
Validation loss = 0.01657579280436039
Validation loss = 0.016003329306840897
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019704407081007957
Validation loss = 0.016743404790759087
Validation loss = 0.018865030258893967
Validation loss = 0.016793664544820786
Validation loss = 0.01709427312016487
Validation loss = 0.018509771674871445
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016994595527648926
Validation loss = 0.01618039608001709
Validation loss = 0.01742861606180668
Validation loss = 0.01647827960550785
Validation loss = 0.016174817457795143
Validation loss = 0.017365483567118645
Validation loss = 0.018277762457728386
Validation loss = 0.016857951879501343
Validation loss = 0.017334219068288803
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01737593300640583
Validation loss = 0.018869400024414062
Validation loss = 0.019281256943941116
Validation loss = 0.017846131697297096
Validation loss = 0.02051587775349617
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00108  |
| Iteration     | 48        |
| MaximumReturn | -0.000734 |
| MinimumReturn | -0.00153  |
| TotalSamples  | 83300     |
-----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018102280795574188
Validation loss = 0.01668178103864193
Validation loss = 0.018678301945328712
Validation loss = 0.016389645636081696
Validation loss = 0.016624413430690765
Validation loss = 0.017511390149593353
Validation loss = 0.016969768330454826
Validation loss = 0.016375144943594933
Validation loss = 0.020130665972828865
Validation loss = 0.016542091965675354
Validation loss = 0.016883403062820435
Validation loss = 0.016661297529935837
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015367290005087852
Validation loss = 0.016283394768834114
Validation loss = 0.016891883686184883
Validation loss = 0.01650809310376644
Validation loss = 0.018082384020090103
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01828901097178459
Validation loss = 0.016828132793307304
Validation loss = 0.017280522733926773
Validation loss = 0.017223244532942772
Validation loss = 0.01748635061085224
Validation loss = 0.018328813835978508
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01810801401734352
Validation loss = 0.015958184376358986
Validation loss = 0.01682489737868309
Validation loss = 0.01611470989882946
Validation loss = 0.016982311382889748
Validation loss = 0.018542496487498283
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01795177347958088
Validation loss = 0.016896961256861687
Validation loss = 0.018128057941794395
Validation loss = 0.01734047196805477
Validation loss = 0.01789533719420433
Validation loss = 0.016787784174084663
Validation loss = 0.01804073341190815
Validation loss = 0.016859428957104683
Validation loss = 0.01723669283092022
Validation loss = 0.017936212942004204
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00101  |
| Iteration     | 49        |
| MaximumReturn | -0.000629 |
| MinimumReturn | -0.00142  |
| TotalSamples  | 84966     |
-----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01628831960260868
Validation loss = 0.016174016520380974
Validation loss = 0.018496427685022354
Validation loss = 0.016171252354979515
Validation loss = 0.017650818452239037
Validation loss = 0.017727438360452652
Validation loss = 0.016641048714518547
Validation loss = 0.01766587793827057
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017110969871282578
Validation loss = 0.017922500148415565
Validation loss = 0.016218142583966255
Validation loss = 0.016006335616111755
Validation loss = 0.016607385128736496
Validation loss = 0.019808892160654068
Validation loss = 0.01621379889547825
Validation loss = 0.016362985596060753
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017338106408715248
Validation loss = 0.017591046169400215
Validation loss = 0.01877712272107601
Validation loss = 0.018118850886821747
Validation loss = 0.017292222008109093
Validation loss = 0.016504215076565742
Validation loss = 0.017790338024497032
Validation loss = 0.017878428101539612
Validation loss = 0.017407063394784927
Validation loss = 0.018418746069073677
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01573340967297554
Validation loss = 0.01604372449219227
Validation loss = 0.01748686097562313
Validation loss = 0.016629040241241455
Validation loss = 0.017662839964032173
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018384501338005066
Validation loss = 0.019518371671438217
Validation loss = 0.0177309587597847
Validation loss = 0.018549686297774315
Validation loss = 0.017717653885483742
Validation loss = 0.018731143325567245
Validation loss = 0.018268445506691933
Validation loss = 0.01920863799750805
Validation loss = 0.01989428699016571
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 50        |
| MaximumReturn | -0.000723 |
| MinimumReturn | -0.00155  |
| TotalSamples  | 86632     |
-----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016927944496273994
Validation loss = 0.016413426026701927
Validation loss = 0.016185559332370758
Validation loss = 0.01656404882669449
Validation loss = 0.018017413094639778
Validation loss = 0.017453745007514954
Validation loss = 0.015717295929789543
Validation loss = 0.01730259135365486
Validation loss = 0.016611373052001
Validation loss = 0.018983732908964157
Validation loss = 0.017968960106372833
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016052456572651863
Validation loss = 0.017712801694869995
Validation loss = 0.01922861859202385
Validation loss = 0.01685049757361412
Validation loss = 0.016327237710356712
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018163204193115234
Validation loss = 0.017260326072573662
Validation loss = 0.01788211427628994
Validation loss = 0.017305836081504822
Validation loss = 0.016967086121439934
Validation loss = 0.017758876085281372
Validation loss = 0.018496161326766014
Validation loss = 0.019056197255849838
Validation loss = 0.017194950953125954
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017846431583166122
Validation loss = 0.017327923327684402
Validation loss = 0.016143454238772392
Validation loss = 0.017921118065714836
Validation loss = 0.02727660723030567
Validation loss = 0.018792295828461647
Validation loss = 0.01647033542394638
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01753847673535347
Validation loss = 0.01775282621383667
Validation loss = 0.01796131767332554
Validation loss = 0.019607864320278168
Validation loss = 0.017686670646071434
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00103  |
| Iteration     | 51        |
| MaximumReturn | -0.000743 |
| MinimumReturn | -0.0015   |
| TotalSamples  | 88298     |
-----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01711323671042919
Validation loss = 0.01917925290763378
Validation loss = 0.016069011762738228
Validation loss = 0.016324369236826897
Validation loss = 0.017335988581180573
Validation loss = 0.01757904700934887
Validation loss = 0.01723453216254711
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01633298769593239
Validation loss = 0.018093878403306007
Validation loss = 0.01648082584142685
Validation loss = 0.015838252380490303
Validation loss = 0.017387667670845985
Validation loss = 0.01691153086721897
Validation loss = 0.016019461676478386
Validation loss = 0.017044000327587128
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019250303506851196
Validation loss = 0.017679007723927498
Validation loss = 0.018419397994875908
Validation loss = 0.017760828137397766
Validation loss = 0.018923604860901833
Validation loss = 0.01838776282966137
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017035437747836113
Validation loss = 0.01744174212217331
Validation loss = 0.020479969680309296
Validation loss = 0.016615770757198334
Validation loss = 0.01680181920528412
Validation loss = 0.017340708523988724
Validation loss = 0.018196873366832733
Validation loss = 0.017626721411943436
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017449595034122467
Validation loss = 0.017872508615255356
Validation loss = 0.018899239599704742
Validation loss = 0.017401374876499176
Validation loss = 0.01692349836230278
Validation loss = 0.017926253378391266
Validation loss = 0.01969342678785324
Validation loss = 0.01744491048157215
Validation loss = 0.016949307173490524
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00103  |
| Iteration     | 52        |
| MaximumReturn | -0.000752 |
| MinimumReturn | -0.00152  |
| TotalSamples  | 89964     |
-----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01730312593281269
Validation loss = 0.01626555249094963
Validation loss = 0.01831275410950184
Validation loss = 0.01644098572432995
Validation loss = 0.017355306074023247
Validation loss = 0.015783311799168587
Validation loss = 0.01630459539592266
Validation loss = 0.017581799998879433
Validation loss = 0.018212735652923584
Validation loss = 0.01622745208442211
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01678527146577835
Validation loss = 0.016761306673288345
Validation loss = 0.01661255955696106
Validation loss = 0.019845832139253616
Validation loss = 0.016821863129734993
Validation loss = 0.01774114929139614
Validation loss = 0.01871553063392639
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019944321364164352
Validation loss = 0.017700335010886192
Validation loss = 0.01935066096484661
Validation loss = 0.017242157831788063
Validation loss = 0.018296917900443077
Validation loss = 0.017362596467137337
Validation loss = 0.018324263393878937
Validation loss = 0.018987266346812248
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017620643600821495
Validation loss = 0.01773671805858612
Validation loss = 0.019150258973240852
Validation loss = 0.01799527183175087
Validation loss = 0.020558826625347137
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019060544669628143
Validation loss = 0.01776699535548687
Validation loss = 0.019179021939635277
Validation loss = 0.017304010689258575
Validation loss = 0.017323296517133713
Validation loss = 0.01789071038365364
Validation loss = 0.01757924072444439
Validation loss = 0.018783781677484512
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00114  |
| Iteration     | 53        |
| MaximumReturn | -0.000768 |
| MinimumReturn | -0.00153  |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016296902671456337
Validation loss = 0.01693328656256199
Validation loss = 0.016850562766194344
Validation loss = 0.01612655259668827
Validation loss = 0.017841855064034462
Validation loss = 0.01629941165447235
Validation loss = 0.018422428518533707
Validation loss = 0.0164202693849802
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017062507569789886
Validation loss = 0.01688600704073906
Validation loss = 0.016500644385814667
Validation loss = 0.016287021338939667
Validation loss = 0.017985552549362183
Validation loss = 0.017540259286761284
Validation loss = 0.016330070793628693
Validation loss = 0.017339935526251793
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018264608457684517
Validation loss = 0.018569661304354668
Validation loss = 0.017947111278772354
Validation loss = 0.017803631722927094
Validation loss = 0.01862083189189434
Validation loss = 0.0196489617228508
Validation loss = 0.017595844343304634
Validation loss = 0.01754448376595974
Validation loss = 0.01769399829208851
Validation loss = 0.017218900844454765
Validation loss = 0.017302371561527252
Validation loss = 0.018297936767339706
Validation loss = 0.020010700449347496
Validation loss = 0.01751965470612049
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018261048942804337
Validation loss = 0.016709797084331512
Validation loss = 0.0165999922901392
Validation loss = 0.016871007159352303
Validation loss = 0.018938744440674782
Validation loss = 0.016716262325644493
Validation loss = 0.01653374917805195
Validation loss = 0.016326453536748886
Validation loss = 0.017123740166425705
Validation loss = 0.017295127734541893
Validation loss = 0.01883588172495365
Validation loss = 0.016391925513744354
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019798826426267624
Validation loss = 0.01919127069413662
Validation loss = 0.01784728653728962
Validation loss = 0.01779594086110592
Validation loss = 0.018097586929798126
Validation loss = 0.018228428438305855
Validation loss = 0.01800210028886795
Validation loss = 0.017600540071725845
Validation loss = 0.021092239767313004
Validation loss = 0.01900217868387699
Validation loss = 0.017411425709724426
Validation loss = 0.017527040094137192
Validation loss = 0.018314776942133904
Validation loss = 0.017673654481768608
Validation loss = 0.017196033149957657
Validation loss = 0.018058080226182938
Validation loss = 0.01781197264790535
Validation loss = 0.017794813960790634
Validation loss = 0.017610745504498482
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.001    |
| Iteration     | 54        |
| MaximumReturn | -0.000646 |
| MinimumReturn | -0.00133  |
| TotalSamples  | 93296     |
-----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01640775240957737
Validation loss = 0.016831345856189728
Validation loss = 0.01634528674185276
Validation loss = 0.016247976571321487
Validation loss = 0.01836245134472847
Validation loss = 0.017773056402802467
Validation loss = 0.0169223565608263
Validation loss = 0.016956986859440804
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01720273308455944
Validation loss = 0.017219819128513336
Validation loss = 0.018914325162768364
Validation loss = 0.016690215095877647
Validation loss = 0.017722144722938538
Validation loss = 0.016536889597773552
Validation loss = 0.0171128511428833
Validation loss = 0.017840731889009476
Validation loss = 0.017087435349822044
Validation loss = 0.017536133527755737
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01793723925948143
Validation loss = 0.01741446927189827
Validation loss = 0.01720205694437027
Validation loss = 0.018150707706809044
Validation loss = 0.01776609942317009
Validation loss = 0.017161373049020767
Validation loss = 0.017680680379271507
Validation loss = 0.019670849665999413
Validation loss = 0.017638415098190308
Validation loss = 0.0181853286921978
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016099100932478905
Validation loss = 0.01709134876728058
Validation loss = 0.01893550716340542
Validation loss = 0.01694672554731369
Validation loss = 0.017827734351158142
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017816461622714996
Validation loss = 0.01837901398539543
Validation loss = 0.017552290111780167
Validation loss = 0.01729804091155529
Validation loss = 0.018910672515630722
Validation loss = 0.017978262156248093
Validation loss = 0.018043937161564827
Validation loss = 0.02062504179775715
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00102  |
| Iteration     | 55        |
| MaximumReturn | -0.000645 |
| MinimumReturn | -0.00142  |
| TotalSamples  | 94962     |
-----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01757587306201458
Validation loss = 0.01672436110675335
Validation loss = 0.0171671062707901
Validation loss = 0.016748465597629547
Validation loss = 0.019078485667705536
Validation loss = 0.01808028854429722
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016643354669213295
Validation loss = 0.01648990623652935
Validation loss = 0.02008511871099472
Validation loss = 0.016538085415959358
Validation loss = 0.01798490434885025
Validation loss = 0.016522806137800217
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01840435154736042
Validation loss = 0.017160940915346146
Validation loss = 0.017340850085020065
Validation loss = 0.017398279160261154
Validation loss = 0.01923554390668869
Validation loss = 0.01738569885492325
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01786218211054802
Validation loss = 0.017375638708472252
Validation loss = 0.019256670027971268
Validation loss = 0.01753772236406803
Validation loss = 0.01653098687529564
Validation loss = 0.018035007640719414
Validation loss = 0.01818242110311985
Validation loss = 0.019258275628089905
Validation loss = 0.017158273607492447
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017717871814966202
Validation loss = 0.017560428008437157
Validation loss = 0.022199876606464386
Validation loss = 0.018347468227148056
Validation loss = 0.018206916749477386
Validation loss = 0.018467344343662262
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000967 |
| Iteration     | 56        |
| MaximumReturn | -0.000735 |
| MinimumReturn | -0.00119  |
| TotalSamples  | 96628     |
-----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01716599054634571
Validation loss = 0.016676098108291626
Validation loss = 0.017825858667492867
Validation loss = 0.017167354002594948
Validation loss = 0.0176311694085598
Validation loss = 0.01807095855474472
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017610477283596992
Validation loss = 0.017622191458940506
Validation loss = 0.016548054292798042
Validation loss = 0.018034491688013077
Validation loss = 0.017082316800951958
Validation loss = 0.01870523951947689
Validation loss = 0.016833966597914696
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01728089340031147
Validation loss = 0.017883189022541046
Validation loss = 0.0181320458650589
Validation loss = 0.01757628656923771
Validation loss = 0.017753850668668747
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016901841387152672
Validation loss = 0.01801275834441185
Validation loss = 0.01862478069961071
Validation loss = 0.016812756657600403
Validation loss = 0.017618922516703606
Validation loss = 0.01986030302941799
Validation loss = 0.01757965423166752
Validation loss = 0.017099881544709206
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017520835623145103
Validation loss = 0.01775606907904148
Validation loss = 0.01796942949295044
Validation loss = 0.01792125403881073
Validation loss = 0.018110478296875954
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00113  |
| Iteration     | 57        |
| MaximumReturn | -0.000783 |
| MinimumReturn | -0.00162  |
| TotalSamples  | 98294     |
-----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016927948221564293
Validation loss = 0.017587214708328247
Validation loss = 0.018212828785181046
Validation loss = 0.017448261380195618
Validation loss = 0.017446348443627357
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016552172601222992
Validation loss = 0.017065681517124176
Validation loss = 0.016734911128878593
Validation loss = 0.01994410715997219
Validation loss = 0.016965458169579506
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018104711547493935
Validation loss = 0.017283199355006218
Validation loss = 0.017128434032201767
Validation loss = 0.01747487299144268
Validation loss = 0.0184065829962492
Validation loss = 0.018067074939608574
Validation loss = 0.017101945355534554
Validation loss = 0.018136296421289444
Validation loss = 0.0177377387881279
Validation loss = 0.017295964062213898
Validation loss = 0.01806837134063244
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01746974140405655
Validation loss = 0.01761520840227604
Validation loss = 0.019879130646586418
Validation loss = 0.017950402572751045
Validation loss = 0.017333943396806717
Validation loss = 0.01698053814470768
Validation loss = 0.017913220450282097
Validation loss = 0.01734895445406437
Validation loss = 0.017160966992378235
Validation loss = 0.01755993254482746
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018064875155687332
Validation loss = 0.018353432416915894
Validation loss = 0.017910784110426903
Validation loss = 0.018109343945980072
Validation loss = 0.01759476214647293
Validation loss = 0.01740935817360878
Validation loss = 0.01862507499754429
Validation loss = 0.018006350845098495
Validation loss = 0.018227431923151016
Validation loss = 0.017464958131313324
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00108  |
| Iteration     | 58        |
| MaximumReturn | -0.000535 |
| MinimumReturn | -0.00179  |
| TotalSamples  | 99960     |
-----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01690060645341873
Validation loss = 0.017182275652885437
Validation loss = 0.016773032024502754
Validation loss = 0.01703481748700142
Validation loss = 0.0178399495780468
Validation loss = 0.01688399165868759
Validation loss = 0.01708674058318138
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018367469310760498
Validation loss = 0.017982084304094315
Validation loss = 0.017051996663212776
Validation loss = 0.017014876008033752
Validation loss = 0.02425493486225605
Validation loss = 0.01699366420507431
Validation loss = 0.01654469221830368
Validation loss = 0.01640157587826252
Validation loss = 0.016910873353481293
Validation loss = 0.01696334406733513
Validation loss = 0.01685270667076111
Validation loss = 0.019233960658311844
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017758863046765327
Validation loss = 0.02228686586022377
Validation loss = 0.017528509721159935
Validation loss = 0.01844841055572033
Validation loss = 0.016732128337025642
Validation loss = 0.018292011693120003
Validation loss = 0.01738579571247101
Validation loss = 0.018172776326537132
Validation loss = 0.017550211399793625
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017331840470433235
Validation loss = 0.017443068325519562
Validation loss = 0.017375949770212173
Validation loss = 0.018201660364866257
Validation loss = 0.018204329535365105
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018448812887072563
Validation loss = 0.01810309663414955
Validation loss = 0.01769443415105343
Validation loss = 0.017536085098981857
Validation loss = 0.018033351749181747
Validation loss = 0.01921595260500908
Validation loss = 0.023635921999812126
Validation loss = 0.018390435725450516
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00118  |
| Iteration     | 59        |
| MaximumReturn | -0.000787 |
| MinimumReturn | -0.00209  |
| TotalSamples  | 101626    |
-----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01773514784872532
Validation loss = 0.017124902456998825
Validation loss = 0.016355959698557854
Validation loss = 0.01758214458823204
Validation loss = 0.01735801249742508
Validation loss = 0.017652088776230812
Validation loss = 0.0191541388630867
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02111429162323475
Validation loss = 0.017245030030608177
Validation loss = 0.017550434917211533
Validation loss = 0.018613522872328758
Validation loss = 0.017327837646007538
Validation loss = 0.018200671300292015
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01731910929083824
Validation loss = 0.018105395138263702
Validation loss = 0.020730208605527878
Validation loss = 0.020226988941431046
Validation loss = 0.018926605582237244
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016826098784804344
Validation loss = 0.01795821078121662
Validation loss = 0.017343081533908844
Validation loss = 0.018377898260951042
Validation loss = 0.0236973837018013
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018939904868602753
Validation loss = 0.017377059906721115
Validation loss = 0.01837783306837082
Validation loss = 0.018875354900956154
Validation loss = 0.018922582268714905
Validation loss = 0.017520077526569366
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00108 |
| Iteration     | 60       |
| MaximumReturn | -0.00073 |
| MinimumReturn | -0.00149 |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01689087226986885
Validation loss = 0.017149677500128746
Validation loss = 0.018396858125925064
Validation loss = 0.01682371273636818
Validation loss = 0.01818797178566456
Validation loss = 0.017265411093831062
Validation loss = 0.016971928998827934
Validation loss = 0.020015358924865723
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01812245510518551
Validation loss = 0.01767159067094326
Validation loss = 0.017422718927264214
Validation loss = 0.01864544488489628
Validation loss = 0.016911214217543602
Validation loss = 0.017650550231337547
Validation loss = 0.01963413693010807
Validation loss = 0.017432989552617073
Validation loss = 0.022156625986099243
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019797196611762047
Validation loss = 0.01877000369131565
Validation loss = 0.018498707562685013
Validation loss = 0.018600760027766228
Validation loss = 0.0187674630433321
Validation loss = 0.017659155651926994
Validation loss = 0.017716776579618454
Validation loss = 0.01835026405751705
Validation loss = 0.019311729818582535
Validation loss = 0.018642902374267578
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018866688013076782
Validation loss = 0.01691470481455326
Validation loss = 0.01774071715772152
Validation loss = 0.017264822497963905
Validation loss = 0.01692778989672661
Validation loss = 0.0167545136064291
Validation loss = 0.017698263749480247
Validation loss = 0.01723325625061989
Validation loss = 0.01724076457321644
Validation loss = 0.01763242483139038
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01799270510673523
Validation loss = 0.017870968207716942
Validation loss = 0.018507877364754677
Validation loss = 0.018366457894444466
Validation loss = 0.01811962015926838
Validation loss = 0.017953431233763695
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00126  |
| Iteration     | 61        |
| MaximumReturn | -0.000735 |
| MinimumReturn | -0.00191  |
| TotalSamples  | 104958    |
-----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018137039616703987
Validation loss = 0.017277495935559273
Validation loss = 0.017552006989717484
Validation loss = 0.017269808799028397
Validation loss = 0.021672390401363373
Validation loss = 0.017860470339655876
Validation loss = 0.016869954764842987
Validation loss = 0.018688715994358063
Validation loss = 0.018240880221128464
Validation loss = 0.017647339031100273
Validation loss = 0.01734740100800991
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01822718046605587
Validation loss = 0.018681276589632034
Validation loss = 0.017903797328472137
Validation loss = 0.01818467304110527
Validation loss = 0.017459291964769363
Validation loss = 0.02124672196805477
Validation loss = 0.017404530197381973
Validation loss = 0.017933115363121033
Validation loss = 0.017927449196577072
Validation loss = 0.01725131645798683
Validation loss = 0.017387164756655693
Validation loss = 0.018197666853666306
Validation loss = 0.017329903319478035
Validation loss = 0.01869899593293667
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01733686774969101
Validation loss = 0.018109971657395363
Validation loss = 0.017666921019554138
Validation loss = 0.01792067661881447
Validation loss = 0.019591612741351128
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0176831167191267
Validation loss = 0.018492750823497772
Validation loss = 0.02001473680138588
Validation loss = 0.017167650163173676
Validation loss = 0.018070969730615616
Validation loss = 0.01743110828101635
Validation loss = 0.01689818687736988
Validation loss = 0.017652185633778572
Validation loss = 0.017819788306951523
Validation loss = 0.018133223056793213
Validation loss = 0.017898332327604294
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01903132162988186
Validation loss = 0.017849287018179893
Validation loss = 0.01941184140741825
Validation loss = 0.017752327024936676
Validation loss = 0.017973972484469414
Validation loss = 0.02016647905111313
Validation loss = 0.017487449571490288
Validation loss = 0.018711509183049202
Validation loss = 0.01817634515464306
Validation loss = 0.020776022225618362
Validation loss = 0.018601467832922935
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00112 |
| Iteration     | 62       |
| MaximumReturn | -0.00066 |
| MinimumReturn | -0.00164 |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01820526458323002
Validation loss = 0.017332926392555237
Validation loss = 0.01734357699751854
Validation loss = 0.018548615276813507
Validation loss = 0.018433239310979843
Validation loss = 0.017191052436828613
Validation loss = 0.017182178795337677
Validation loss = 0.01727118529379368
Validation loss = 0.016921980306506157
Validation loss = 0.01789933070540428
Validation loss = 0.017306048423051834
Validation loss = 0.017392940819263458
Validation loss = 0.01740405336022377
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01822502166032791
Validation loss = 0.018025176599621773
Validation loss = 0.01772933453321457
Validation loss = 0.01749245636165142
Validation loss = 0.01799919828772545
Validation loss = 0.016709644347429276
Validation loss = 0.01901676133275032
Validation loss = 0.01749441586434841
Validation loss = 0.017297741025686264
Validation loss = 0.018671110272407532
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020146502181887627
Validation loss = 0.01813100464642048
Validation loss = 0.018370987847447395
Validation loss = 0.01721232943236828
Validation loss = 0.018596097826957703
Validation loss = 0.017733093351125717
Validation loss = 0.01856820657849312
Validation loss = 0.019255857914686203
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018634051084518433
Validation loss = 0.018820688128471375
Validation loss = 0.01852898858487606
Validation loss = 0.01742621697485447
Validation loss = 0.019191168248653412
Validation loss = 0.017561407759785652
Validation loss = 0.017538070678710938
Validation loss = 0.018146883696317673
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017919769510626793
Validation loss = 0.01751658320426941
Validation loss = 0.018487051129341125
Validation loss = 0.018841896206140518
Validation loss = 0.018309762701392174
Validation loss = 0.018275165930390358
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00114  |
| Iteration     | 63        |
| MaximumReturn | -0.000861 |
| MinimumReturn | -0.00188  |
| TotalSamples  | 108290    |
-----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01901164837181568
Validation loss = 0.01828184351325035
Validation loss = 0.017186442390084267
Validation loss = 0.017072768881917
Validation loss = 0.01620573364198208
Validation loss = 0.02480851300060749
Validation loss = 0.017504312098026276
Validation loss = 0.016816865652799606
Validation loss = 0.01986156776547432
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01979544572532177
Validation loss = 0.017436325550079346
Validation loss = 0.017749765887856483
Validation loss = 0.017867879942059517
Validation loss = 0.01750602014362812
Validation loss = 0.01750640571117401
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017948416993021965
Validation loss = 0.01829829066991806
Validation loss = 0.018085386604070663
Validation loss = 0.017690924927592278
Validation loss = 0.018428985029459
Validation loss = 0.01798243075609207
Validation loss = 0.018284671008586884
Validation loss = 0.017941655591130257
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021881088614463806
Validation loss = 0.023195067420601845
Validation loss = 0.017350753769278526
Validation loss = 0.017096539959311485
Validation loss = 0.017269592732191086
Validation loss = 0.01814560405910015
Validation loss = 0.01727348193526268
Validation loss = 0.017847197130322456
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018426647409796715
Validation loss = 0.018256783485412598
Validation loss = 0.019453980028629303
Validation loss = 0.018228353932499886
Validation loss = 0.0185453649610281
Validation loss = 0.017383333295583725
Validation loss = 0.02070596069097519
Validation loss = 0.017969250679016113
Validation loss = 0.020670073106884956
Validation loss = 0.017902877181768417
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00116  |
| Iteration     | 64        |
| MaximumReturn | -0.000749 |
| MinimumReturn | -0.00197  |
| TotalSamples  | 109956    |
-----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0182244461029768
Validation loss = 0.018576275557279587
Validation loss = 0.01727486215531826
Validation loss = 0.016872333362698555
Validation loss = 0.019148951396346092
Validation loss = 0.017184175550937653
Validation loss = 0.016971005126833916
Validation loss = 0.017239123582839966
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018309609964489937
Validation loss = 0.017475683242082596
Validation loss = 0.017978673800826073
Validation loss = 0.017616016790270805
Validation loss = 0.017424561083316803
Validation loss = 0.017319703474640846
Validation loss = 0.016857296228408813
Validation loss = 0.017518453299999237
Validation loss = 0.018950674682855606
Validation loss = 0.017797274515032768
Validation loss = 0.017916088923811913
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01753615215420723
Validation loss = 0.018692538142204285
Validation loss = 0.01786850206553936
Validation loss = 0.020214159041643143
Validation loss = 0.01842530071735382
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01906915195286274
Validation loss = 0.017559636384248734
Validation loss = 0.017390763387084007
Validation loss = 0.019714118912816048
Validation loss = 0.017547452822327614
Validation loss = 0.017076998949050903
Validation loss = 0.017887085676193237
Validation loss = 0.017067110165953636
Validation loss = 0.01767086237668991
Validation loss = 0.020053355023264885
Validation loss = 0.017502300441265106
Validation loss = 0.01783398538827896
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018442686647176743
Validation loss = 0.01829639822244644
Validation loss = 0.018146855756640434
Validation loss = 0.018581204116344452
Validation loss = 0.018879851326346397
Validation loss = 0.01833428256213665
Validation loss = 0.01821763440966606
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00107  |
| Iteration     | 65        |
| MaximumReturn | -0.000629 |
| MinimumReturn | -0.00149  |
| TotalSamples  | 111622    |
-----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01740751601755619
Validation loss = 0.01791256293654442
Validation loss = 0.018127670511603355
Validation loss = 0.01890432834625244
Validation loss = 0.017686471343040466
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017786797136068344
Validation loss = 0.017493413761258125
Validation loss = 0.01843741536140442
Validation loss = 0.0177681352943182
Validation loss = 0.01761692576110363
Validation loss = 0.017426040023565292
Validation loss = 0.018070971593260765
Validation loss = 0.018480395898222923
Validation loss = 0.017567148432135582
Validation loss = 0.018570002168416977
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017631778493523598
Validation loss = 0.017822975292801857
Validation loss = 0.018227469176054
Validation loss = 0.018441466614603996
Validation loss = 0.02037365734577179
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017079852521419525
Validation loss = 0.019945010542869568
Validation loss = 0.0190186258405447
Validation loss = 0.01722513511776924
Validation loss = 0.017217297106981277
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018502363935112953
Validation loss = 0.017846278846263885
Validation loss = 0.018478766083717346
Validation loss = 0.019897008314728737
Validation loss = 0.01841474138200283
Validation loss = 0.018233351409435272
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0011   |
| Iteration     | 66        |
| MaximumReturn | -0.000784 |
| MinimumReturn | -0.00162  |
| TotalSamples  | 113288    |
-----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017744531854987144
Validation loss = 0.01786855235695839
Validation loss = 0.020416831597685814
Validation loss = 0.016758708283305168
Validation loss = 0.017577147111296654
Validation loss = 0.01859026961028576
Validation loss = 0.01743139699101448
Validation loss = 0.017272859811782837
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017440859228372574
Validation loss = 0.01823439635336399
Validation loss = 0.018035054206848145
Validation loss = 0.017600243911147118
Validation loss = 0.018729809671640396
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02042478695511818
Validation loss = 0.018015563488006592
Validation loss = 0.018337616696953773
Validation loss = 0.018356960266828537
Validation loss = 0.019143754616379738
Validation loss = 0.01893797144293785
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018608633428812027
Validation loss = 0.018404057249426842
Validation loss = 0.025424184277653694
Validation loss = 0.017484266310930252
Validation loss = 0.017643200233578682
Validation loss = 0.018076160922646523
Validation loss = 0.018812015652656555
Validation loss = 0.018262075260281563
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018465092405676842
Validation loss = 0.0193394273519516
Validation loss = 0.018603762611746788
Validation loss = 0.021491875872015953
Validation loss = 0.018557297065854073
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000999 |
| Iteration     | 67        |
| MaximumReturn | -0.000763 |
| MinimumReturn | -0.00125  |
| TotalSamples  | 114954    |
-----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01886006072163582
Validation loss = 0.016610173508524895
Validation loss = 0.01673751324415207
Validation loss = 0.01914374716579914
Validation loss = 0.01758442260324955
Validation loss = 0.017482109367847443
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017173735424876213
Validation loss = 0.018310926854610443
Validation loss = 0.01769377291202545
Validation loss = 0.017560375854372978
Validation loss = 0.017363447695970535
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017890067771077156
Validation loss = 0.01790180243551731
Validation loss = 0.01793714612722397
Validation loss = 0.020435884594917297
Validation loss = 0.017800213769078255
Validation loss = 0.01755685545504093
Validation loss = 0.019135426729917526
Validation loss = 0.019281238317489624
Validation loss = 0.020042233169078827
Validation loss = 0.0183747299015522
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017656074836850166
Validation loss = 0.018233176320791245
Validation loss = 0.01838120073080063
Validation loss = 0.01708591729402542
Validation loss = 0.0181641336530447
Validation loss = 0.017642075195908546
Validation loss = 0.01924881897866726
Validation loss = 0.017454171553254128
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01844264380633831
Validation loss = 0.018562421202659607
Validation loss = 0.019391996785998344
Validation loss = 0.01940903253853321
Validation loss = 0.018598239868879318
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00118  |
| Iteration     | 68        |
| MaximumReturn | -0.000696 |
| MinimumReturn | -0.00199  |
| TotalSamples  | 116620    |
-----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01734950765967369
Validation loss = 0.018480835482478142
Validation loss = 0.01746171899139881
Validation loss = 0.016873303800821304
Validation loss = 0.017232421785593033
Validation loss = 0.017637090757489204
Validation loss = 0.01745515502989292
Validation loss = 0.017931640148162842
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01803945004940033
Validation loss = 0.01794319599866867
Validation loss = 0.01949497126042843
Validation loss = 0.0187804214656353
Validation loss = 0.018770603463053703
Validation loss = 0.018017588183283806
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017972899600863457
Validation loss = 0.017806926742196083
Validation loss = 0.017430663108825684
Validation loss = 0.019389623776078224
Validation loss = 0.022357186302542686
Validation loss = 0.018692709505558014
Validation loss = 0.01808161847293377
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01749761588871479
Validation loss = 0.017516985535621643
Validation loss = 0.02132180519402027
Validation loss = 0.018749523907899857
Validation loss = 0.01865593157708645
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019097518175840378
Validation loss = 0.018582113087177277
Validation loss = 0.01981489360332489
Validation loss = 0.018571536988019943
Validation loss = 0.018247967585921288
Validation loss = 0.01995227299630642
Validation loss = 0.018992437049746513
Validation loss = 0.01881720870733261
Validation loss = 0.01976429671049118
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00104  |
| Iteration     | 69        |
| MaximumReturn | -0.000635 |
| MinimumReturn | -0.00144  |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019534343853592873
Validation loss = 0.01729930378496647
Validation loss = 0.018449241295456886
Validation loss = 0.018319718539714813
Validation loss = 0.01809409074485302
Validation loss = 0.018555406481027603
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017934083938598633
Validation loss = 0.018264807760715485
Validation loss = 0.01872493512928486
Validation loss = 0.01830124296247959
Validation loss = 0.018313497304916382
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01801646128296852
Validation loss = 0.01807723194360733
Validation loss = 0.017929717898368835
Validation loss = 0.01844189688563347
Validation loss = 0.01827111840248108
Validation loss = 0.017887869849801064
Validation loss = 0.019108321517705917
Validation loss = 0.017312506213784218
Validation loss = 0.019561218097805977
Validation loss = 0.01769709587097168
Validation loss = 0.018617156893014908
Validation loss = 0.019167091697454453
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018377242609858513
Validation loss = 0.018065469339489937
Validation loss = 0.020327948033809662
Validation loss = 0.018776366487145424
Validation loss = 0.018509048968553543
Validation loss = 0.01852862350642681
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019999990239739418
Validation loss = 0.019143449142575264
Validation loss = 0.020399583503603935
Validation loss = 0.018800660967826843
Validation loss = 0.018931850790977478
Validation loss = 0.019331054762005806
Validation loss = 0.01872614398598671
Validation loss = 0.01800321228802204
Validation loss = 0.01891566812992096
Validation loss = 0.018556253984570503
Validation loss = 0.019436005502939224
Validation loss = 0.020959390327334404
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00112  |
| Iteration     | 70        |
| MaximumReturn | -0.000875 |
| MinimumReturn | -0.00159  |
| TotalSamples  | 119952    |
-----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017429055646061897
Validation loss = 0.018209079280495644
Validation loss = 0.01777997612953186
Validation loss = 0.017692383378744125
Validation loss = 0.01718214713037014
Validation loss = 0.018510732799768448
Validation loss = 0.018141653388738632
Validation loss = 0.017217746004462242
Validation loss = 0.01773839071393013
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02167201228439808
Validation loss = 0.01814025267958641
Validation loss = 0.018157361075282097
Validation loss = 0.017774002626538277
Validation loss = 0.0185739453881979
Validation loss = 0.01775907725095749
Validation loss = 0.017592377960681915
Validation loss = 0.018509553745388985
Validation loss = 0.017373165115714073
Validation loss = 0.017638228833675385
Validation loss = 0.01862475834786892
Validation loss = 0.01847412995994091
Validation loss = 0.01780153624713421
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017861831933259964
Validation loss = 0.01750645786523819
Validation loss = 0.018456626683473587
Validation loss = 0.019234629347920418
Validation loss = 0.018123511224985123
Validation loss = 0.018520932644605637
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01997283846139908
Validation loss = 0.01782449334859848
Validation loss = 0.01796032302081585
Validation loss = 0.019529463723301888
Validation loss = 0.01938261277973652
Validation loss = 0.018440475687384605
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01948828622698784
Validation loss = 0.018529137596488
Validation loss = 0.018471455201506615
Validation loss = 0.018990492448210716
Validation loss = 0.01869749277830124
Validation loss = 0.021119428798556328
Validation loss = 0.020127560943365097
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00106  |
| Iteration     | 71        |
| MaximumReturn | -0.000709 |
| MinimumReturn | -0.00147  |
| TotalSamples  | 121618    |
-----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018833119422197342
Validation loss = 0.01905187778174877
Validation loss = 0.017833678051829338
Validation loss = 0.019137699156999588
Validation loss = 0.0191293153911829
Validation loss = 0.017431026324629784
Validation loss = 0.018482785671949387
Validation loss = 0.01797090657055378
Validation loss = 0.017324335873126984
Validation loss = 0.01800825633108616
Validation loss = 0.018151992931962013
Validation loss = 0.018230659887194633
Validation loss = 0.018035242334008217
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019474131986498833
Validation loss = 0.02229720540344715
Validation loss = 0.01732625812292099
Validation loss = 0.01771891675889492
Validation loss = 0.01791127398610115
Validation loss = 0.018627868965268135
Validation loss = 0.017534540966153145
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01956414245069027
Validation loss = 0.018297506496310234
Validation loss = 0.02096669003367424
Validation loss = 0.01816316321492195
Validation loss = 0.018191298469901085
Validation loss = 0.018832603469491005
Validation loss = 0.01828097179532051
Validation loss = 0.018019985407590866
Validation loss = 0.018498748540878296
Validation loss = 0.018171392381191254
Validation loss = 0.01818622462451458
Validation loss = 0.018576420843601227
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02067491039633751
Validation loss = 0.018662454560399055
Validation loss = 0.017737161368131638
Validation loss = 0.020200256258249283
Validation loss = 0.018848421052098274
Validation loss = 0.020965030416846275
Validation loss = 0.019101649522781372
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018503356724977493
Validation loss = 0.019592873752117157
Validation loss = 0.018968384712934494
Validation loss = 0.019080154597759247
Validation loss = 0.019287561997771263
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0011   |
| Iteration     | 72        |
| MaximumReturn | -0.000793 |
| MinimumReturn | -0.00138  |
| TotalSamples  | 123284    |
-----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017911039292812347
Validation loss = 0.01830320805311203
Validation loss = 0.017340047284960747
Validation loss = 0.0179066751152277
Validation loss = 0.01941005513072014
Validation loss = 0.022234665229916573
Validation loss = 0.01960904523730278
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017911426723003387
Validation loss = 0.019008636474609375
Validation loss = 0.018086863681674004
Validation loss = 0.018372876569628716
Validation loss = 0.017586713656783104
Validation loss = 0.018244430422782898
Validation loss = 0.018634958192706108
Validation loss = 0.01795022562146187
Validation loss = 0.01843952015042305
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01817098632454872
Validation loss = 0.020992888137698174
Validation loss = 0.018017366528511047
Validation loss = 0.01817256398499012
Validation loss = 0.019732603803277016
Validation loss = 0.018016008660197258
Validation loss = 0.018057089298963547
Validation loss = 0.017782658338546753
Validation loss = 0.018126793205738068
Validation loss = 0.019182005897164345
Validation loss = 0.01793050207197666
Validation loss = 0.018251337110996246
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018987607210874557
Validation loss = 0.0188782699406147
Validation loss = 0.018990475684404373
Validation loss = 0.01870587468147278
Validation loss = 0.018603021278977394
Validation loss = 0.020118078216910362
Validation loss = 0.019047971814870834
Validation loss = 0.019733039662241936
Validation loss = 0.017453787848353386
Validation loss = 0.018139030784368515
Validation loss = 0.01894288882613182
Validation loss = 0.019065143540501595
Validation loss = 0.02043299376964569
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019129913300275803
Validation loss = 0.019408468157052994
Validation loss = 0.019167687743902206
Validation loss = 0.018557140603661537
Validation loss = 0.02002519555389881
Validation loss = 0.018592873588204384
Validation loss = 0.018773557618260384
Validation loss = 0.01955852098762989
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00114 |
| Iteration     | 73       |
| MaximumReturn | -0.00077 |
| MinimumReturn | -0.00188 |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01938755251467228
Validation loss = 0.02005627006292343
Validation loss = 0.0178380087018013
Validation loss = 0.018858803436160088
Validation loss = 0.018052630126476288
Validation loss = 0.018621761351823807
Validation loss = 0.01833515800535679
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017894748598337173
Validation loss = 0.01775968261063099
Validation loss = 0.01768701709806919
Validation loss = 0.018810756504535675
Validation loss = 0.017513100057840347
Validation loss = 0.01844959892332554
Validation loss = 0.021033525466918945
Validation loss = 0.01963028497993946
Validation loss = 0.017998212948441505
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018515978008508682
Validation loss = 0.01837850734591484
Validation loss = 0.01856260746717453
Validation loss = 0.02176911011338234
Validation loss = 0.01773337461054325
Validation loss = 0.018283149227499962
Validation loss = 0.020105089992284775
Validation loss = 0.018082521855831146
Validation loss = 0.018172865733504295
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01970328763127327
Validation loss = 0.025987515226006508
Validation loss = 0.017995085567235947
Validation loss = 0.01855192519724369
Validation loss = 0.018956590443849564
Validation loss = 0.01875426433980465
Validation loss = 0.018740788102149963
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01884574070572853
Validation loss = 0.020096907392144203
Validation loss = 0.01948997564613819
Validation loss = 0.018369989469647408
Validation loss = 0.02168388105928898
Validation loss = 0.021545957773923874
Validation loss = 0.018790915608406067
Validation loss = 0.018784044310450554
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00104  |
| Iteration     | 74        |
| MaximumReturn | -0.000725 |
| MinimumReturn | -0.00142  |
| TotalSamples  | 126616    |
-----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019277315586805344
Validation loss = 0.01772906444966793
Validation loss = 0.01783021353185177
Validation loss = 0.017812829464673996
Validation loss = 0.018898066133260727
Validation loss = 0.0179434884339571
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017890650779008865
Validation loss = 0.019100217148661613
Validation loss = 0.018376924097537994
Validation loss = 0.01797717623412609
Validation loss = 0.018501874059438705
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018081340938806534
Validation loss = 0.018998470157384872
Validation loss = 0.017429467290639877
Validation loss = 0.018015356734395027
Validation loss = 0.018739715218544006
Validation loss = 0.020687442272901535
Validation loss = 0.017700649797916412
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018206847831606865
Validation loss = 0.01856916770339012
Validation loss = 0.0186869278550148
Validation loss = 0.01831262744963169
Validation loss = 0.01884857937693596
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018561581149697304
Validation loss = 0.01921675354242325
Validation loss = 0.019209537655115128
Validation loss = 0.02012529969215393
Validation loss = 0.01874220184981823
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00111  |
| Iteration     | 75        |
| MaximumReturn | -0.000809 |
| MinimumReturn | -0.00203  |
| TotalSamples  | 128282    |
-----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018513115122914314
Validation loss = 0.01973101869225502
Validation loss = 0.01816575415432453
Validation loss = 0.018610142171382904
Validation loss = 0.01766565814614296
Validation loss = 0.018251214176416397
Validation loss = 0.018941650167107582
Validation loss = 0.01800202578306198
Validation loss = 0.019070297479629517
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020970290526747704
Validation loss = 0.01913907565176487
Validation loss = 0.01862785778939724
Validation loss = 0.018415240570902824
Validation loss = 0.01789843663573265
Validation loss = 0.020777219906449318
Validation loss = 0.018425649031996727
Validation loss = 0.01799311488866806
Validation loss = 0.019850771874189377
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01812065578997135
Validation loss = 0.018281154334545135
Validation loss = 0.01915835030376911
Validation loss = 0.017872720956802368
Validation loss = 0.01831229403614998
Validation loss = 0.01827995665371418
Validation loss = 0.018249453976750374
Validation loss = 0.01866167038679123
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01920430175960064
Validation loss = 0.02255595475435257
Validation loss = 0.0191025510430336
Validation loss = 0.018701255321502686
Validation loss = 0.018321558833122253
Validation loss = 0.019714975729584694
Validation loss = 0.018358709290623665
Validation loss = 0.01859758421778679
Validation loss = 0.020126836374402046
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018872657790780067
Validation loss = 0.019665459170937538
Validation loss = 0.018527228385210037
Validation loss = 0.01930391415953636
Validation loss = 0.018820593133568764
Validation loss = 0.018799612298607826
Validation loss = 0.018853066489100456
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0011   |
| Iteration     | 76        |
| MaximumReturn | -0.000817 |
| MinimumReturn | -0.00167  |
| TotalSamples  | 129948    |
-----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020335838198661804
Validation loss = 0.01831904985010624
Validation loss = 0.01830330677330494
Validation loss = 0.017855584621429443
Validation loss = 0.017908785492181778
Validation loss = 0.018159328028559685
Validation loss = 0.017733190208673477
Validation loss = 0.018383333459496498
Validation loss = 0.018197711557149887
Validation loss = 0.018637865781784058
Validation loss = 0.017896044999361038
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017622504383325577
Validation loss = 0.01766156405210495
Validation loss = 0.01830284669995308
Validation loss = 0.02012096717953682
Validation loss = 0.018290380015969276
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01929594948887825
Validation loss = 0.02154400758445263
Validation loss = 0.01893887110054493
Validation loss = 0.01833542063832283
Validation loss = 0.019212577491998672
Validation loss = 0.018703635782003403
Validation loss = 0.01856476627290249
Validation loss = 0.018639877438545227
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021678119897842407
Validation loss = 0.018625684082508087
Validation loss = 0.018747733905911446
Validation loss = 0.019001809880137444
Validation loss = 0.019993461668491364
Validation loss = 0.023737002164125443
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019569840282201767
Validation loss = 0.018724551424384117
Validation loss = 0.02132450044155121
Validation loss = 0.01935616135597229
Validation loss = 0.0197262205183506
Validation loss = 0.020016269758343697
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00107  |
| Iteration     | 77        |
| MaximumReturn | -0.000781 |
| MinimumReturn | -0.00152  |
| TotalSamples  | 131614    |
-----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01810133270919323
Validation loss = 0.018132252618670464
Validation loss = 0.018709499388933182
Validation loss = 0.018879296258091927
Validation loss = 0.019107578322291374
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018475325778126717
Validation loss = 0.018910564482212067
Validation loss = 0.017954468727111816
Validation loss = 0.018665149807929993
Validation loss = 0.01882585696876049
Validation loss = 0.01778021641075611
Validation loss = 0.017981605604290962
Validation loss = 0.018463212996721268
Validation loss = 0.01919710636138916
Validation loss = 0.01835959404706955
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019134489819407463
Validation loss = 0.023113001137971878
Validation loss = 0.01894305646419525
Validation loss = 0.01820891909301281
Validation loss = 0.01844651810824871
Validation loss = 0.01949033886194229
Validation loss = 0.01877828873693943
Validation loss = 0.018551411107182503
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02014351449906826
Validation loss = 0.018404645845294
Validation loss = 0.021952547132968903
Validation loss = 0.020245643332600594
Validation loss = 0.019045982509851456
Validation loss = 0.01930110901594162
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01912994123995304
Validation loss = 0.019539766013622284
Validation loss = 0.019891483709216118
Validation loss = 0.01891408860683441
Validation loss = 0.019719047471880913
Validation loss = 0.01911243237555027
Validation loss = 0.020427202805876732
Validation loss = 0.019256575033068657
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00102  |
| Iteration     | 78        |
| MaximumReturn | -0.000671 |
| MinimumReturn | -0.00158  |
| TotalSamples  | 133280    |
-----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018327679485082626
Validation loss = 0.018097102642059326
Validation loss = 0.017890997231006622
Validation loss = 0.018430670723319054
Validation loss = 0.017985887825489044
Validation loss = 0.018628554418683052
Validation loss = 0.01835457608103752
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01828957535326481
Validation loss = 0.018241554498672485
Validation loss = 0.018973458558321
Validation loss = 0.01807055063545704
Validation loss = 0.018189292401075363
Validation loss = 0.018755951896309853
Validation loss = 0.01891159825026989
Validation loss = 0.018122397363185883
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018272792920470238
Validation loss = 0.019171062856912613
Validation loss = 0.01833614893257618
Validation loss = 0.01908685639500618
Validation loss = 0.01890912652015686
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019104797393083572
Validation loss = 0.0215265192091465
Validation loss = 0.018690746277570724
Validation loss = 0.02062642201781273
Validation loss = 0.019416164606809616
Validation loss = 0.019193734973669052
Validation loss = 0.019391028210520744
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020360011607408524
Validation loss = 0.020891329273581505
Validation loss = 0.019612008705735207
Validation loss = 0.020044222474098206
Validation loss = 0.02002422884106636
Validation loss = 0.018820226192474365
Validation loss = 0.019455812871456146
Validation loss = 0.01961219124495983
Validation loss = 0.01870458573102951
Validation loss = 0.021186640486121178
Validation loss = 0.018837762996554375
Validation loss = 0.019268527626991272
Validation loss = 0.020959001034498215
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00109  |
| Iteration     | 79        |
| MaximumReturn | -0.000733 |
| MinimumReturn | -0.00192  |
| TotalSamples  | 134946    |
-----------------------------
