Logging to experiments/invertedPendulum/nov1/w350e3_seed2231
Printing configuration...
{'env_name': 'invertedPendulum',
 'random_seeds': [3214, 2431, 2531, 2231],
 'save_variables': False,
 'model_save_dir': '/tmp/invertedPendulum_models/',
 'restore_variables': False,
 'start_onpol_iter': 0,
 'onpol_iters': 80,
 'num_path_random': 25,
 'num_path_onpol': 25,
 'env_horizon': 100,
 'max_train_data': 200000,
 'max_val_data': 100000,
 'discard_ratio': 0.0,
 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20},
              'model': 'nn',
              'ensemble': True,
              'ensemble_model_count': 5,
              'enable_particle_ensemble': True,
              'particles': 5,
              'obs_var': 1.0,
              'intrinsic_reward_coeff': 1.0,
              'ita': 1.0,
              'mode': 'random',
              'val': True,
              'n_layers': 4,
              'hidden_size': 1000,
              'activation': 'relu',
              'batch_size': 1000,
              'learning_rate': 0.001,
              'reg_coeff': 0.0,
              'epochs': 200,
              'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}},
 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False},
 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]},
 'algo': 'trpo'}
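For readers inspecting this run: the configuration is dumped as a Python dict literal, so it can be parsed back directly. A minimal sketch (not the repo's own loader), assuming the dump above is saved to a hypothetical config_dump.txt:

    import ast

    # Parse the printed dict literal back into a Python object for inspection.
    with open('config_dump.txt') as f:   # hypothetical file holding the dict above
        config = ast.literal_eval(f.read())

    # Keys that explain the structure of the log below:
    print(config['num_path_random'])                   # 25 random rollouts per batch
    print(config['env_horizon'])                       # 100 timesteps per path
    print(config['dynamics']['ensemble_model_count'])  # 5 dynamics models
    print(config['trpo']['iterations'])                # 20 TRPO iterations per outer itr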
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating random rollouts.
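The 25 paths of 100 steps each follow from num_path_random and env_horizon in the config, and total_timesteps is printed before each path is collected (hence "Path 0 | total_timesteps 0"). A minimal sketch of the rollout loop that would emit these lines (assumed structure, not the repo's code; StubEnv is a placeholder for the invertedPendulum environment):

    import numpy as np

    class StubEnv:                      # placeholder environment
        def reset(self):
            return np.zeros(4)
        def step(self, action):
            return np.zeros(4), 0.0, False, {}

    env, horizon, num_paths = StubEnv(), 100, 25
    paths, total_timesteps = [], 0
    for i in range(num_paths):
        print(f'Path {i} | total_timesteps {total_timesteps}.')
        obs, path = env.reset(), {'obs': [], 'act': [], 'next_obs': []}
        for _ in range(horizon):
            act = np.random.uniform(-1.0, 1.0, size=1)   # random policy
            next_obs, _, done, _ = env.step(act)
            path['obs'].append(obs)
            path['act'].append(act)
            path['next_obs'].append(next_obs)
            obs = next_obs
            if done:
                break
        total_timesteps += len(path['obs'])
        paths.append(path)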
Creating normalization for training data.
Done creating normalization for training data.
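The normalization step most plausibly standardizes the collected transitions before the dynamics model is fit. A minimal sketch under that assumption (helper names are illustrative, not the repo's API):

    import numpy as np

    def compute_normalization(obs, acts, next_obs, eps=1e-8):
        # Statistics over observations, actions, and state deltas.
        deltas = next_obs - obs
        stats = {}
        for name, x in [('obs', obs), ('act', acts), ('delta', deltas)]:
            stats[name] = (x.mean(axis=0), x.std(axis=0) + eps)
        return stats

    def normalize(x, mean, std):
        return (x - mean) / std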
Particle ensemble enabled? True
An ensemble of 5 dynamics models (<class 'model.dynamics.NNDynamicsModel'>) initialized.
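Per the config, each ensemble member is a 4-layer, 1000-unit ReLU network. A minimal sketch of one member in PyTorch (for illustration only; the actual NNDynamicsModel may use a different framework, and obs_dim = 4, act_dim = 1 are assumed dimensions for the inverted pendulum):

    import torch.nn as nn

    def make_dynamics_net(obs_dim, act_dim, hidden=1000, n_layers=4):
        # Maps a normalized (obs, action) pair to a predicted normalized state delta.
        layers, in_dim = [], obs_dim + act_dim
        for _ in range(n_layers):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, obs_dim))
        return nn.Sequential(*layers)

    # ensemble_model_count = 5 independently initialized members
    ensemble = [make_dynamics_net(obs_dim=4, act_dim=1) for _ in range(5)]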
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
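Pre-training is configured for 0 iterations here, so it is effectively skipped. When enabled in 'intrinsic_reward' mode, a plausible reading is that the policy is rewarded by ensemble disagreement; a minimal sketch under that assumption (intrinsic_reward_coeff = 1.0 in the config scales it):

    import numpy as np

    def intrinsic_reward(ensemble_predict, obs, act, coeff=1.0):
        # Variance across ensemble members' next-state predictions rewards
        # visiting poorly modeled states. `ensemble_predict` is a list of
        # per-member prediction callables (illustrative interface).
        preds = np.stack([predict(obs, act) for predict in ensemble_predict])
        return coeff * preds.var(axis=0).mean()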
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7297577857971191
Validation loss = 0.3553062677383423
Validation loss = 0.33722352981567383
Validation loss = 0.30134907364845276
Validation loss = 0.2910212278366089
Validation loss = 0.26769089698791504
Validation loss = 0.26789090037345886
Validation loss = 0.24874365329742432
Validation loss = 0.23962987959384918
Validation loss = 0.23256461322307587
Validation loss = 0.21855507791042328
Validation loss = 0.2270888388156891
Validation loss = 0.22873316705226898
Validation loss = 0.19830094277858734
Validation loss = 0.2137814164161682
Validation loss = 0.2058703452348709
Validation loss = 0.24524399638175964
Validation loss = 0.22256897389888763
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7409034371376038
Validation loss = 0.35755857825279236
Validation loss = 0.3655088543891907
Validation loss = 0.3134356439113617
Validation loss = 0.28723177313804626
Validation loss = 0.2864062190055847
Validation loss = 0.28667744994163513
Validation loss = 0.2587026059627533
Validation loss = 0.24116772413253784
Validation loss = 0.238031804561615
Validation loss = 0.23234052956104279
Validation loss = 0.2139902412891388
Validation loss = 0.20800253748893738
Validation loss = 0.2117512971162796
Validation loss = 0.20920905470848083
Validation loss = 0.19815976917743683
Validation loss = 0.1872320920228958
Validation loss = 0.19193662703037262
Validation loss = 0.18810638785362244
Validation loss = 0.187391996383667
Validation loss = 0.16194383800029755
Validation loss = 0.17234238982200623
Validation loss = 0.16883690655231476
Validation loss = 0.1557285636663437
Validation loss = 0.16596373915672302
Validation loss = 0.152040496468544
Validation loss = 0.15622085332870483
Validation loss = 0.1526309847831726
Validation loss = 0.14867626130580902
Validation loss = 0.15088292956352234
Validation loss = 0.1591273993253708
Validation loss = 0.16020528972148895
Validation loss = 0.1590256541967392
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7470341324806213
Validation loss = 0.34823498129844666
Validation loss = 0.34190452098846436
Validation loss = 0.3196512460708618
Validation loss = 0.2917415499687195
Validation loss = 0.2745944559574127
Validation loss = 0.2596278786659241
Validation loss = 0.24890950322151184
Validation loss = 0.2320241630077362
Validation loss = 0.2337377667427063
Validation loss = 0.23450598120689392
Validation loss = 0.20108215510845184
Validation loss = 0.2318911850452423
Validation loss = 0.19084130227565765
Validation loss = 0.20548032224178314
Validation loss = 0.1932767927646637
Validation loss = 0.18962311744689941
Validation loss = 0.16552990674972534
Validation loss = 0.18191008269786835
Validation loss = 0.17470592260360718
Validation loss = 0.19504179060459137
Validation loss = 0.17092937231063843
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7309219241142273
Validation loss = 0.37618035078048706
Validation loss = 0.3465835750102997
Validation loss = 0.30601683259010315
Validation loss = 0.3012954890727997
Validation loss = 0.27816978096961975
Validation loss = 0.26798033714294434
Validation loss = 0.25204116106033325
Validation loss = 0.26073116064071655
Validation loss = 0.22530977427959442
Validation loss = 0.22986091673374176
Validation loss = 0.2029287964105606
Validation loss = 0.207039937376976
Validation loss = 0.2307669222354889
Validation loss = 0.22375191748142242
Validation loss = 0.2030954211950302
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7416689991950989
Validation loss = 0.3390973210334778
Validation loss = 0.34026309847831726
Validation loss = 0.3073101043701172
Validation loss = 0.2918955385684967
Validation loss = 0.27678629755973816
Validation loss = 0.26106730103492737
Validation loss = 0.2588766813278198
Validation loss = 0.23747800290584564
Validation loss = 0.22230786085128784
Validation loss = 0.2403876781463623
Validation loss = 0.24282336235046387
Validation loss = 0.23925741016864777
Validation loss = 0.22881373763084412
Done fitting dynamics.
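Although epochs = 200 in the config, each ensemble member above stops after a different number of validation reports, which suggests validation-based early stopping. A minimal sketch of the per-model fitting loop under that assumption (names and the patience value are illustrative, continuing the PyTorch sketch above):

    import copy
    import torch

    def fit_model(model, train_loader, val_x, val_y, epochs=200, patience=5, lr=1e-3):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        best, best_state, bad = float('inf'), None, 0
        for epoch in range(epochs):
            for x, y in train_loader:                  # batch_size = 1000
                loss = ((model(x) - y) ** 2).mean()    # MSE on normalized deltas
                opt.zero_grad()
                loss.backward()
                opt.step()
            with torch.no_grad():
                val = ((model(val_x) - val_y) ** 2).mean().item()
            print(f'Validation loss = {val}')
            if val < best:                             # keep best weights so far
                best, best_state, bad = val, copy.deepcopy(model.state_dict()), 0
            else:
                bad += 1
                if bad >= patience:                    # early stop on plateau
                    break
        model.load_state_dict(best_state)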
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
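The policy is trained entirely on model-generated rollouts for 20 TRPO iterations. The config's gamma = 0.99 and gae = 0.95 refer to generalized advantage estimation; a minimal sketch of that estimator (standard GAE, not code from this repo):

    import numpy as np

    def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
        # rewards, values: per-timestep arrays for one model-generated path.
        values = np.append(values, 0.0)        # bootstrap terminal value = 0
        adv, last = np.zeros(len(rewards)), 0.0
        for t in reversed(range(len(rewards))):
            delta = rewards[t] + gamma * values[t + 1] - values[t]
            adv[t] = last = delta + gamma * lam * last
        return adv

    adv = gae_advantages(np.random.randn(100), np.zeros(100))   # horizon = 100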
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.8    |
| Iteration     | 0        |
| MaximumReturn | -0.0337  |
| MinimumReturn | -72.8    |
| TotalSamples  | 3332     |
----------------------------
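A minimal sketch of how this summary table could be produced from per-path returns (illustrative; the actual logger belongs to the repo):

    import numpy as np

    def log_stats(itr, path_returns, total_samples):
        rows = [('AverageReturn', f'{np.mean(path_returns):.3g}'),
                ('Iteration', str(itr)),
                ('MaximumReturn', f'{np.max(path_returns):.3g}'),
                ('MinimumReturn', f'{np.min(path_returns):.3g}'),
                ('TotalSamples', str(total_samples))]
        print('-' * 28)
        for k, v in rows:
            print(f'| {k:<13} | {v:<8} |')
        print('-' * 28)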
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.22565551102161407
Validation loss = 0.16469663381576538
Validation loss = 0.15659864246845245
Validation loss = 0.14622215926647186
Validation loss = 0.14797455072402954
Validation loss = 0.13082462549209595
Validation loss = 0.13746659457683563
Validation loss = 0.13167276978492737
Validation loss = 0.12008169293403625
Validation loss = 0.11595989763736725
Validation loss = 0.10410501807928085
Validation loss = 0.12586304545402527
Validation loss = 0.1072390079498291
Validation loss = 0.10648638755083084
Validation loss = 0.10163967311382294
Validation loss = 0.10176201164722443
Validation loss = 0.11918600648641586
Validation loss = 0.08227526396512985
Validation loss = 0.1051168218255043
Validation loss = 0.09739571064710617
Validation loss = 0.08620549738407135
Validation loss = 0.09643160551786423
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3117992579936981
Validation loss = 0.19029568135738373
Validation loss = 0.16247287392616272
Validation loss = 0.14939484000205994
Validation loss = 0.1422407031059265
Validation loss = 0.13878583908081055
Validation loss = 0.15010137856006622
Validation loss = 0.12699370086193085
Validation loss = 0.11935265362262726
Validation loss = 0.12016750872135162
Validation loss = 0.1270630955696106
Validation loss = 0.11080001294612885
Validation loss = 0.11832518130540848
Validation loss = 0.11641447246074677
Validation loss = 0.10142789781093597
Validation loss = 0.1078062579035759
Validation loss = 0.09168148785829544
Validation loss = 0.08903434872627258
Validation loss = 0.08911339938640594
Validation loss = 0.08214433491230011
Validation loss = 0.08291882276535034
Validation loss = 0.09181346744298935
Validation loss = 0.09173540025949478
Validation loss = 0.0824945718050003
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2943066358566284
Validation loss = 0.1764720380306244
Validation loss = 0.1429593861103058
Validation loss = 0.134272038936615
Validation loss = 0.12686726450920105
Validation loss = 0.1223691999912262
Validation loss = 0.1344335675239563
Validation loss = 0.1153702661395073
Validation loss = 0.11660632491111755
Validation loss = 0.11176115274429321
Validation loss = 0.11347493529319763
Validation loss = 0.10910981893539429
Validation loss = 0.11868882924318314
Validation loss = 0.09442760795354843
Validation loss = 0.09597926586866379
Validation loss = 0.09899315237998962
Validation loss = 0.10307108610868454
Validation loss = 0.095116525888443
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2139875739812851
Validation loss = 0.16368520259857178
Validation loss = 0.1433917135000229
Validation loss = 0.14941954612731934
Validation loss = 0.1528807431459427
Validation loss = 0.15504172444343567
Validation loss = 0.12207920104265213
Validation loss = 0.14982450008392334
Validation loss = 0.130543053150177
Validation loss = 0.1173098236322403
Validation loss = 0.1214836910367012
Validation loss = 0.10744790732860565
Validation loss = 0.12304384261369705
Validation loss = 0.11397115886211395
Validation loss = 0.10693135112524033
Validation loss = 0.1064646914601326
Validation loss = 0.1031758114695549
Validation loss = 0.09701602160930634
Validation loss = 0.09316672384738922
Validation loss = 0.08655280619859695
Validation loss = 0.09929046034812927
Validation loss = 0.15681281685829163
Validation loss = 0.0979740172624588
Validation loss = 0.08970567584037781
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.24557095766067505
Validation loss = 0.1724756509065628
Validation loss = 0.15563465654850006
Validation loss = 0.15586432814598083
Validation loss = 0.14686492085456848
Validation loss = 0.13479012250900269
Validation loss = 0.1379925161600113
Validation loss = 0.13946595788002014
Validation loss = 0.1263888031244278
Validation loss = 0.13891062140464783
Validation loss = 0.12790821492671967
Validation loss = 0.13849002122879028
Validation loss = 0.13796421885490417
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.759   |
| Iteration     | 1        |
| MaximumReturn | -0.0228  |
| MinimumReturn | -12.8    |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.18233639001846313
Validation loss = 0.11640636622905731
Validation loss = 0.09286931157112122
Validation loss = 0.08329784870147705
Validation loss = 0.08100278675556183
Validation loss = 0.07343793660402298
Validation loss = 0.08146405220031738
Validation loss = 0.0794391930103302
Validation loss = 0.07759259641170502
Validation loss = 0.08223672956228256
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18944531679153442
Validation loss = 0.11029207706451416
Validation loss = 0.09591056406497955
Validation loss = 0.09224861860275269
Validation loss = 0.08055911213159561
Validation loss = 0.08215521275997162
Validation loss = 0.08327875286340714
Validation loss = 0.06796898692846298
Validation loss = 0.06818443536758423
Validation loss = 0.06652107834815979
Validation loss = 0.07006094604730606
Validation loss = 0.07781504839658737
Validation loss = 0.07698628306388855
Validation loss = 0.06836904585361481
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.18198376893997192
Validation loss = 0.11636082828044891
Validation loss = 0.0974583700299263
Validation loss = 0.10050347447395325
Validation loss = 0.08306540548801422
Validation loss = 0.08432920277118683
Validation loss = 0.08753422647714615
Validation loss = 0.07813195139169693
Validation loss = 0.07953842729330063
Validation loss = 0.0703616589307785
Validation loss = 0.08472739160060883
Validation loss = 0.06998921930789948
Validation loss = 0.07796887308359146
Validation loss = 0.07425769418478012
Validation loss = 0.07911619544029236
Validation loss = 0.07048717141151428
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1842462122440338
Validation loss = 0.10567142069339752
Validation loss = 0.08991990983486176
Validation loss = 0.0840248316526413
Validation loss = 0.07938208431005478
Validation loss = 0.07225023955106735
Validation loss = 0.07377646863460541
Validation loss = 0.07703536003828049
Validation loss = 0.07914219796657562
Validation loss = 0.07362395524978638
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.15346980094909668
Validation loss = 0.10326516628265381
Validation loss = 0.09527934342622757
Validation loss = 0.08658386766910553
Validation loss = 0.10151772201061249
Validation loss = 0.0779152661561966
Validation loss = 0.08296555280685425
Validation loss = 0.08193947374820709
Validation loss = 0.08066295832395554
Validation loss = 0.08901836723089218
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.011363636363636364
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011235955056179775
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.011111111111111112
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01098901098901099
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010869565217391304
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010752688172043012
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010638297872340425
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010526315789473684
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010416666666666666
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010309278350515464
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01020408163265306
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.010101010101010102
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.03
Done generating on-policy rollouts.
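The running average is cumulative over every path generated since the start of the run, not per batch: the single affinization at Path 12 above, over 88 total paths so far (25 random + 2 x 25 earlier on-policy + 13 in this batch), gives 1/88 = 0.011363..., and the 2 events at Path 24 bring the total to 3 over 100 paths, i.e. 0.03. A minimal sketch of the statistic (inferred from the numbers above):

    total_affinizations, total_paths = 0, 0

    def record_path(n_affinizations):
        # Running mean over all paths ever generated in this run.
        global total_affinizations, total_paths
        total_affinizations += n_affinizations
        total_paths += 1
        print(f'number of affinization with epsilon = 3 is {n_affinizations}')
        print(f'average number of affinization = {total_affinizations / total_paths}')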
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.24    |
| Iteration     | 2        |
| MaximumReturn | -0.0661  |
| MinimumReturn | -27.8    |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08645708113908768
Validation loss = 0.06987034529447556
Validation loss = 0.06726596504449844
Validation loss = 0.06881304830312729
Validation loss = 0.06997652351856232
Validation loss = 0.07012905925512314
Validation loss = 0.06257781386375427
Validation loss = 0.06512371450662613
Validation loss = 0.06712917238473892
Validation loss = 0.058669429272413254
Validation loss = 0.059981439262628555
Validation loss = 0.06389457732439041
Validation loss = 0.06433556228876114
Validation loss = 0.06050233915448189
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08519826084375381
Validation loss = 0.06652636080980301
Validation loss = 0.06219087168574333
Validation loss = 0.05425022915005684
Validation loss = 0.05884777009487152
Validation loss = 0.0588761530816555
Validation loss = 0.06808203458786011
Validation loss = 0.05834914371371269
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09110689908266068
Validation loss = 0.0799274668097496
Validation loss = 0.06391391903162003
Validation loss = 0.06446357816457748
Validation loss = 0.062008995562791824
Validation loss = 0.05959286168217659
Validation loss = 0.05778945982456207
Validation loss = 0.06275733560323715
Validation loss = 0.06178176403045654
Validation loss = 0.05990668013691902
Validation loss = 0.07038614898920059
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07494872808456421
Validation loss = 0.062033575028181076
Validation loss = 0.06596213579177856
Validation loss = 0.05857723578810692
Validation loss = 0.06514402478933334
Validation loss = 0.06097917631268501
Validation loss = 0.06365777552127838
Validation loss = 0.07269861549139023
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09140186756849289
Validation loss = 0.08369643241167068
Validation loss = 0.07160069793462753
Validation loss = 0.06836114078760147
Validation loss = 0.07238393276929855
Validation loss = 0.07178064435720444
Validation loss = 0.07061557471752167
Validation loss = 0.07257594913244247
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0297029702970297
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.029411764705882353
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02912621359223301
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.028846153846153848
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02857142857142857
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02830188679245283
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.028037383177570093
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.027777777777777776
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.027522935779816515
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02727272727272727
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02702702702702703
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.026785714285714284
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02654867256637168
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02631578947368421
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02608695652173913
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02586206896551724
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02564102564102564
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.025423728813559324
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.025210084033613446
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.025
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024793388429752067
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02459016393442623
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024390243902439025
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024193548387096774
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0179  |
| Iteration     | 3        |
| MaximumReturn | -0.0118  |
| MinimumReturn | -0.0256  |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07255839556455612
Validation loss = 0.07729373872280121
Validation loss = 0.07775729894638062
Validation loss = 0.07010248303413391
Validation loss = 0.06653924286365509
Validation loss = 0.06894789636135101
Validation loss = 0.06931512802839279
Validation loss = 0.06566484272480011
Validation loss = 0.06567566096782684
Validation loss = 0.06437532603740692
Validation loss = 0.0695757120847702
Validation loss = 0.07463555783033371
Validation loss = 0.07635535299777985
Validation loss = 0.06354963779449463
Validation loss = 0.06388053297996521
Validation loss = 0.06311778724193573
Validation loss = 0.062421225011348724
Validation loss = 0.060772091150283813
Validation loss = 0.06103132665157318
Validation loss = 0.06895030289888382
Validation loss = 0.06564313918352127
Validation loss = 0.06209174543619156
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06260514259338379
Validation loss = 0.06208590418100357
Validation loss = 0.06383632868528366
Validation loss = 0.06959282606840134
Validation loss = 0.05973412096500397
Validation loss = 0.05845439061522484
Validation loss = 0.06087252125144005
Validation loss = 0.05731319636106491
Validation loss = 0.06013078987598419
Validation loss = 0.06789085268974304
Validation loss = 0.06205818057060242
Validation loss = 0.06046932190656662
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06974959373474121
Validation loss = 0.07156345248222351
Validation loss = 0.07023527473211288
Validation loss = 0.06843029707670212
Validation loss = 0.06631135195493698
Validation loss = 0.07057849317789078
Validation loss = 0.06576096266508102
Validation loss = 0.06786675751209259
Validation loss = 0.07668508589267731
Validation loss = 0.0660250261425972
Validation loss = 0.06946313381195068
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0684993639588356
Validation loss = 0.0676853284239769
Validation loss = 0.06450974196195602
Validation loss = 0.07104016095399857
Validation loss = 0.08031594753265381
Validation loss = 0.06927891820669174
Validation loss = 0.06721174716949463
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0729491263628006
Validation loss = 0.0723796933889389
Validation loss = 0.07008592039346695
Validation loss = 0.07424072176218033
Validation loss = 0.06958577036857605
Validation loss = 0.06603814661502838
Validation loss = 0.07800982892513275
Validation loss = 0.06990207731723785
Validation loss = 0.07024960964918137
Validation loss = 0.06437408179044724
Validation loss = 0.07750692963600159
Validation loss = 0.06832126528024673
Validation loss = 0.06882525980472565
Validation loss = 0.07220496237277985
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023809523809523808
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023622047244094488
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0234375
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023255813953488372
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023076923076923078
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022900763358778626
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022727272727272728
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022556390977443608
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022388059701492536
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022222222222222223
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022058823529411766
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.021897810218978103
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.021739130434782608
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02158273381294964
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02142857142857143
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02127659574468085
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02112676056338028
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02097902097902098
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020833333333333332
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020689655172413793
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02054794520547945
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02040816326530612
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02027027027027027
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.020134228187919462
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00428 |
| Iteration     | 4        |
| MaximumReturn | -0.00303 |
| MinimumReturn | -0.00538 |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05670700594782829
Validation loss = 0.05477680638432503
Validation loss = 0.05814151093363762
Validation loss = 0.05600907281041145
Validation loss = 0.05725310370326042
Validation loss = 0.057174719870090485
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0659068301320076
Validation loss = 0.05590425059199333
Validation loss = 0.061122387647628784
Validation loss = 0.056361157447099686
Validation loss = 0.048396967351436615
Validation loss = 0.052972663193941116
Validation loss = 0.05341382697224617
Validation loss = 0.05357098579406738
Validation loss = 0.048815853893756866
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06427953392267227
Validation loss = 0.060345448553562164
Validation loss = 0.05796869471669197
Validation loss = 0.059232693165540695
Validation loss = 0.06543513387441635
Validation loss = 0.058185916393995285
Validation loss = 0.060068726539611816
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05940815061330795
Validation loss = 0.06036344915628433
Validation loss = 0.055806033313274384
Validation loss = 0.06207694485783577
Validation loss = 0.0684904083609581
Validation loss = 0.05702042579650879
Validation loss = 0.05712991952896118
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05824791640043259
Validation loss = 0.06069788336753845
Validation loss = 0.06336577236652374
Validation loss = 0.06228896230459213
Validation loss = 0.0618191659450531
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019867549668874173
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019736842105263157
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0196078431372549
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01948051948051948
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01935483870967742
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019230769230769232
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01910828025477707
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0189873417721519
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018867924528301886
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01875
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018633540372670808
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018518518518518517
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018404907975460124
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018292682926829267
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01818181818181818
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018072289156626505
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017964071856287425
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017857142857142856
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01775147928994083
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01764705882352941
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017543859649122806
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01744186046511628
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017341040462427744
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017241379310344827
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017142857142857144
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0212  |
| Iteration     | 5        |
| MaximumReturn | -0.0135  |
| MinimumReturn | -0.0411  |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05511146038770676
Validation loss = 0.04749319702386856
Validation loss = 0.05887351557612419
Validation loss = 0.052628450095653534
Validation loss = 0.051144469529390335
Validation loss = 0.05014413595199585
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.055852364748716354
Validation loss = 0.04549859091639519
Validation loss = 0.048317402601242065
Validation loss = 0.052360646426677704
Validation loss = 0.04671873897314072
Validation loss = 0.048236504197120667
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06355514377355576
Validation loss = 0.0566624216735363
Validation loss = 0.05107604339718819
Validation loss = 0.04963906854391098
Validation loss = 0.04995785281062126
Validation loss = 0.0522739514708519
Validation loss = 0.05090496689081192
Validation loss = 0.051354534924030304
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05828012153506279
Validation loss = 0.05215025693178177
Validation loss = 0.05644650384783745
Validation loss = 0.054533522576093674
Validation loss = 0.05221487954258919
Validation loss = 0.05468006059527397
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05396146699786186
Validation loss = 0.052344340831041336
Validation loss = 0.05381008982658386
Validation loss = 0.05161849409341812
Validation loss = 0.05140671133995056
Validation loss = 0.05217108875513077
Validation loss = 0.05253916233778
Validation loss = 0.04914924129843712
Validation loss = 0.0543072447180748
Validation loss = 0.056609343737363815
Validation loss = 0.051809556782245636
Validation loss = 0.05577123910188675
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.017045454545454544
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01694915254237288
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016853932584269662
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01675977653631285
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016666666666666666
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016574585635359115
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016483516483516484
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01639344262295082
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016304347826086956
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016216216216216217
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016129032258064516
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016042780748663103
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015957446808510637
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015873015873015872
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015789473684210527
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015706806282722512
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015625
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015544041450777202
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015463917525773196
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015384615384615385
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015306122448979591
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015228426395939087
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015151515151515152
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01507537688442211
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0585  |
| Iteration     | 6        |
| MaximumReturn | -0.0343  |
| MinimumReturn | -0.117   |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04645194485783577
Validation loss = 0.04306477680802345
Validation loss = 0.04345284774899483
Validation loss = 0.045406389981508255
Validation loss = 0.04150771722197533
Validation loss = 0.04029993340373039
Validation loss = 0.04130610078573227
Validation loss = 0.03960169106721878
Validation loss = 0.04349815472960472
Validation loss = 0.04181500896811485
Validation loss = 0.04409024119377136
Validation loss = 0.04637309908866882
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04453031346201897
Validation loss = 0.037181705236434937
Validation loss = 0.04049438238143921
Validation loss = 0.04229695722460747
Validation loss = 0.03963545337319374
Validation loss = 0.043244779109954834
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.049503978341817856
Validation loss = 0.04423828795552254
Validation loss = 0.04853862151503563
Validation loss = 0.04790051653981209
Validation loss = 0.05433182418346405
Validation loss = 0.04206113889813423
Validation loss = 0.054450687021017075
Validation loss = 0.042759764939546585
Validation loss = 0.0447782538831234
Validation loss = 0.042126238346099854
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04528301954269409
Validation loss = 0.042859096080064774
Validation loss = 0.04425733909010887
Validation loss = 0.047191500663757324
Validation loss = 0.048118945211172104
Validation loss = 0.04641084000468254
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04379643127322197
Validation loss = 0.04583628475666046
Validation loss = 0.041795264929533005
Validation loss = 0.04695923998951912
Validation loss = 0.04395224153995514
Validation loss = 0.04502025619149208
Validation loss = 0.03879178687930107
Validation loss = 0.03723565489053726
Validation loss = 0.04136950150132179
Validation loss = 0.04314443841576576
Validation loss = 0.041101325303316116
Validation loss = 0.041110049933195114
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014925373134328358
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01485148514851485
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014778325123152709
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014705882352941176
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014634146341463415
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014563106796116505
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014492753623188406
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014423076923076924
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014354066985645933
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014285714285714285
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014218009478672985
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014150943396226415
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014084507042253521
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014018691588785047
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013953488372093023
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013888888888888888
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013824884792626729
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013761467889908258
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0136986301369863
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013636363636363636
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013574660633484163
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013513513513513514
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013452914798206279
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013392857142857142
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
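"Updating normalization." appears right after each batch of real rollouts is added, which suggests the input statistics used by the learned dynamics model are recomputed over the enlarged dataset. That purpose is an assumption; the sketch below only shows one common way such statistics are maintained, with assumed path keys.

```python
import numpy as np

def recompute_normalization(paths, eps=1e-8):
    """Per-dimension mean/std over all stored observations and actions,
    the kind of statistics typically fed to a learned dynamics model
    (assumed purpose of the 'Updating normalization.' step)."""
    obs = np.concatenate([p["observations"] for p in paths], axis=0)
    acts = np.concatenate([p["actions"] for p in paths], axis=0)
    return {
        "obs_mean": obs.mean(axis=0), "obs_std": obs.std(axis=0) + eps,
        "act_mean": acts.mean(axis=0), "act_std": acts.std(axis=0) + eps,
    }
```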
----------------------------
| AverageReturn | -0.00622 |
| Iteration     | 7        |
| MaximumReturn | -0.00442 |
| MinimumReturn | -0.00897 |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04153035208582878
Validation loss = 0.03919085115194321
Validation loss = 0.03966512531042099
Validation loss = 0.03966931253671646
Validation loss = 0.03765347972512245
Validation loss = 0.03766915574669838
Validation loss = 0.04366811737418175
Validation loss = 0.04335319623351097
Validation loss = 0.04432747885584831
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.042854685336351395
Validation loss = 0.037945833057165146
Validation loss = 0.03954324871301651
Validation loss = 0.03391057625412941
Validation loss = 0.044798482209444046
Validation loss = 0.04471748694777489
Validation loss = 0.039230283349752426
Validation loss = 0.038563311100006104
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.041832804679870605
Validation loss = 0.03773796185851097
Validation loss = 0.03807209059596062
Validation loss = 0.03483080863952637
Validation loss = 0.040209319442510605
Validation loss = 0.04199507087469101
Validation loss = 0.043659310787916183
Validation loss = 0.03712887316942215
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.040112484246492386
Validation loss = 0.03718576207756996
Validation loss = 0.037368323653936386
Validation loss = 0.038931719958782196
Validation loss = 0.04523969069123268
Validation loss = 0.03696485236287117
Validation loss = 0.0481366291642189
Validation loss = 0.03930390253663063
Validation loss = 0.038240235298871994
Validation loss = 0.043481554836034775
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03920978680253029
Validation loss = 0.04280689358711243
Validation loss = 0.04130636528134346
Validation loss = 0.040770042687654495
Validation loss = 0.036824099719524384
Validation loss = 0.03886791318655014
Validation loss = 0.03745654970407486
Validation loss = 0.047431815415620804
Validation loss = 0.04481562227010727
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
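"Re-initialize init_std." is printed at the start of every TRPO phase, presumably resetting the exploration noise of the Gaussian policy before it is retrained inside the refreshed model. That reading, and the attribute names below, are assumptions.

```python
import numpy as np

def reset_policy_logstd(policy, init_logstd=0.0):
    """Reset the Gaussian policy's log standard deviation before a TRPO phase
    (assumed mechanism behind the 'Re-initialize init_std.' message)."""
    policy.log_std = np.full(policy.action_dim, init_logstd)
```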
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.022123893805309734
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.03524229074889868
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.039473684210526314
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.039301310043668124
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.043478260869565216
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.05627705627705628
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.0603448275862069
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.07296137339055794
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07692307692307693
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08085106382978724
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.11016949152542373
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.12658227848101267
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12605042016806722
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12552301255230125
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.1375
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.14937759336099585
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.1652892561983471
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16872427983539096
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.18442622950819673
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18775510204081633
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.2032520325203252
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.22267206477732793
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.24193548387096775
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.26506024096385544
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.276
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -62.5    |
| Iteration     | 8        |
| MaximumReturn | -24.9    |
| MinimumReturn | -103     |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.053509995341300964
Validation loss = 0.04488223418593407
Validation loss = 0.038400422781705856
Validation loss = 0.039749544113874435
Validation loss = 0.03744646534323692
Validation loss = 0.035861581563949585
Validation loss = 0.03785456717014313
Validation loss = 0.03693649172782898
Validation loss = 0.03348316624760628
Validation loss = 0.0353250578045845
Validation loss = 0.0313212089240551
Validation loss = 0.03271191939711571
Validation loss = 0.029912585392594337
Validation loss = 0.03122679702937603
Validation loss = 0.030161593109369278
Validation loss = 0.02917284145951271
Validation loss = 0.03723158687353134
Validation loss = 0.03201846033334732
Validation loss = 0.02869398519396782
Validation loss = 0.0294969342648983
Validation loss = 0.03354257345199585
Validation loss = 0.03182658180594444
Validation loss = 0.0285630039870739
Validation loss = 0.03842299431562424
Validation loss = 0.03142189234495163
Validation loss = 0.033051133155822754
Validation loss = 0.029894238337874413
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07078877091407776
Validation loss = 0.053255319595336914
Validation loss = 0.03476114571094513
Validation loss = 0.03539252281188965
Validation loss = 0.034097958356142044
Validation loss = 0.034260135143995285
Validation loss = 0.03684735298156738
Validation loss = 0.03983873501420021
Validation loss = 0.034089624881744385
Validation loss = 0.03150152415037155
Validation loss = 0.029853172600269318
Validation loss = 0.038151584565639496
Validation loss = 0.029858719557523727
Validation loss = 0.028589589521288872
Validation loss = 0.02892429381608963
Validation loss = 0.030709929764270782
Validation loss = 0.03514140471816063
Validation loss = 0.03185137361288071
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.050557319074869156
Validation loss = 0.039766788482666016
Validation loss = 0.043764933943748474
Validation loss = 0.038232095539569855
Validation loss = 0.04033457487821579
Validation loss = 0.03383674845099449
Validation loss = 0.03187721222639084
Validation loss = 0.03754555433988571
Validation loss = 0.03820357471704483
Validation loss = 0.03583690896630287
Validation loss = 0.03562002629041672
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06427544355392456
Validation loss = 0.043790899217128754
Validation loss = 0.03926721215248108
Validation loss = 0.038516830652952194
Validation loss = 0.03709687292575836
Validation loss = 0.035658158361911774
Validation loss = 0.031558677554130554
Validation loss = 0.03967636823654175
Validation loss = 0.03685835748910904
Validation loss = 0.032647766172885895
Validation loss = 0.03498826175928116
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.053450629115104675
Validation loss = 0.04303000867366791
Validation loss = 0.037121810019016266
Validation loss = 0.03520743548870087
Validation loss = 0.04088709503412247
Validation loss = 0.04119550809264183
Validation loss = 0.03392864763736725
Validation loss = 0.03253238648176193
Validation loss = 0.035541921854019165
Validation loss = 0.03493621572852135
Validation loss = 0.03060409612953663
Validation loss = 0.03328632563352585
Validation loss = 0.03064972534775734
Validation loss = 0.030693039298057556
Validation loss = 0.029354114085435867
Validation loss = 0.03149087354540825
Validation loss = 0.02873155102133751
Validation loss = 0.029558788985013962
Validation loss = 0.03745634853839874
Validation loss = 0.03282591700553894
Validation loss = 0.030399678274989128
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2788844621513944
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.28174603174603174
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28063241106719367
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2795275590551181
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2784313725490196
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27734375
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27626459143968873
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2751937984496124
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.277992277992278
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27692307692307694
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27586206896551724
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2786259541984733
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27756653992395436
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2765151515151515
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27547169811320754
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2744360902255639
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.27715355805243447
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27611940298507465
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.275092936802974
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2740740740740741
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2730627306273063
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2757352941176471
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27472527472527475
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2737226277372263
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2727272727272727
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.88    |
| Iteration     | 9        |
| MaximumReturn | -0.0386  |
| MinimumReturn | -44.4    |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.032506201416254044
Validation loss = 0.03654978424310684
Validation loss = 0.0345023050904274
Validation loss = 0.03735531121492386
Validation loss = 0.03377422317862511
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.034734755754470825
Validation loss = 0.030111657455563545
Validation loss = 0.03101513534784317
Validation loss = 0.03363644704222679
Validation loss = 0.03175748139619827
Validation loss = 0.03456927835941315
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04242315515875816
Validation loss = 0.03865961730480194
Validation loss = 0.033403798937797546
Validation loss = 0.034439101815223694
Validation loss = 0.03653941676020622
Validation loss = 0.03572840988636017
Validation loss = 0.03319608420133591
Validation loss = 0.03346383571624756
Validation loss = 0.035720691084861755
Validation loss = 0.031944599002599716
Validation loss = 0.03483936935663223
Validation loss = 0.03451934829354286
Validation loss = 0.03419380635023117
Validation loss = 0.03114967606961727
Validation loss = 0.04192603752017021
Validation loss = 0.03600316867232323
Validation loss = 0.03549087792634964
Validation loss = 0.033273592591285706
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.037352561950683594
Validation loss = 0.03605698049068451
Validation loss = 0.03413192555308342
Validation loss = 0.03315211832523346
Validation loss = 0.033126410096883774
Validation loss = 0.03820990398526192
Validation loss = 0.035080719739198685
Validation loss = 0.041465874761343
Validation loss = 0.040018536150455475
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03254300355911255
Validation loss = 0.03242386505007744
Validation loss = 0.038386955857276917
Validation loss = 0.03594158589839935
Validation loss = 0.031941261142492294
Validation loss = 0.03030134178698063
Validation loss = 0.030676981434226036
Validation loss = 0.033293671905994415
Validation loss = 0.03217856213450432
Validation loss = 0.031722817569971085
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2717391304347826
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27075812274368233
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2697841726618705
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26881720430107525
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26785714285714285
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2669039145907473
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26595744680851063
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26501766784452296
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2640845070422535
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2631578947368421
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26223776223776224
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2613240418118467
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2604166666666667
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25951557093425603
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25862068965517243
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25773195876288657
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2568493150684932
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25597269624573377
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25510204081632654
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2542372881355932
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2533783783783784
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25252525252525254
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2516778523489933
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2508361204013378
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0172  |
| Iteration     | 10       |
| MaximumReturn | -0.0124  |
| MinimumReturn | -0.0226  |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03695829585194588
Validation loss = 0.03232327476143837
Validation loss = 0.03166797012090683
Validation loss = 0.03029554709792137
Validation loss = 0.03448403999209404
Validation loss = 0.030961086973547935
Validation loss = 0.033313289284706116
Validation loss = 0.031147155910730362
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.033923108130693436
Validation loss = 0.031068000942468643
Validation loss = 0.030910277739167213
Validation loss = 0.031829409301280975
Validation loss = 0.02966928854584694
Validation loss = 0.02901546098291874
Validation loss = 0.0320420004427433
Validation loss = 0.03216971829533577
Validation loss = 0.030181962996721268
Validation loss = 0.03233575075864792
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03178771585226059
Validation loss = 0.030501332134008408
Validation loss = 0.0319523960351944
Validation loss = 0.029638726264238358
Validation loss = 0.03088662400841713
Validation loss = 0.031890299171209335
Validation loss = 0.03021695837378502
Validation loss = 0.030424777418375015
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03663690388202667
Validation loss = 0.037731628865003586
Validation loss = 0.037037335336208344
Validation loss = 0.03354751691222191
Validation loss = 0.029782582074403763
Validation loss = 0.03024851717054844
Validation loss = 0.03224775940179825
Validation loss = 0.030199084430933
Validation loss = 0.03090279921889305
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.032805126160383224
Validation loss = 0.03135453909635544
Validation loss = 0.0339161641895771
Validation loss = 0.029083525761961937
Validation loss = 0.028552580624818802
Validation loss = 0.030158236622810364
Validation loss = 0.03165680170059204
Validation loss = 0.03284139186143875
Validation loss = 0.03388623148202896
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24916943521594684
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24834437086092714
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24752475247524752
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24671052631578946
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2459016393442623
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24509803921568626
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24429967426710097
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2435064935064935
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24271844660194175
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24193548387096775
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24115755627009647
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2403846153846154
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23961661341853036
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23885350318471338
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23809523809523808
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23734177215189872
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23659305993690852
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2358490566037736
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23510971786833856
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.234375
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2336448598130841
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2329192546583851
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23219814241486067
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23148148148148148
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23076923076923078
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0525  |
| Iteration     | 11       |
| MaximumReturn | -0.0361  |
| MinimumReturn | -0.0792  |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03265174478292465
Validation loss = 0.03645196184515953
Validation loss = 0.032275352627038956
Validation loss = 0.03164732828736305
Validation loss = 0.034791797399520874
Validation loss = 0.031724315136671066
Validation loss = 0.028933942317962646
Validation loss = 0.03185846656560898
Validation loss = 0.02920307219028473
Validation loss = 0.030075978487730026
Validation loss = 0.03233712166547775
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03161546587944031
Validation loss = 0.05451660230755806
Validation loss = 0.03217947110533714
Validation loss = 0.03612026944756508
Validation loss = 0.030663376674056053
Validation loss = 0.036603983491659164
Validation loss = 0.028166836127638817
Validation loss = 0.03337489813566208
Validation loss = 0.029511306434869766
Validation loss = 0.029510561376810074
Validation loss = 0.02860790491104126
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03353724628686905
Validation loss = 0.03961791470646858
Validation loss = 0.03941741958260536
Validation loss = 0.037098478525877
Validation loss = 0.03365253657102585
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.032740019261837006
Validation loss = 0.03096255101263523
Validation loss = 0.03452112525701523
Validation loss = 0.03432341292500496
Validation loss = 0.03300639986991882
Validation loss = 0.03160945698618889
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03746422380208969
Validation loss = 0.031151464208960533
Validation loss = 0.027257734909653664
Validation loss = 0.028479833155870438
Validation loss = 0.028037821874022484
Validation loss = 0.03363949805498123
Validation loss = 0.030796747654676437
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23006134969325154
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22935779816513763
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22865853658536586
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22796352583586627
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22727272727272727
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22658610271903323
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22590361445783133
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22522522522522523
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2245508982035928
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22388059701492538
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22321428571428573
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22255192878338279
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22189349112426035
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22123893805309736
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22058823529411764
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21994134897360704
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21929824561403508
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21865889212827988
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2180232558139535
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21739130434782608
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21676300578034682
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21613832853025935
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21551724137931033
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2148997134670487
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21428571428571427
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00306 |
| Iteration     | 12       |
| MaximumReturn | -0.00214 |
| MinimumReturn | -0.00482 |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.033907968550920486
Validation loss = 0.029185758903622627
Validation loss = 0.032154589891433716
Validation loss = 0.029408644884824753
Validation loss = 0.030911386013031006
Validation loss = 0.028906196355819702
Validation loss = 0.026995880529284477
Validation loss = 0.028343696147203445
Validation loss = 0.03242497146129608
Validation loss = 0.033866606652736664
Validation loss = 0.02737250365316868
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02979224920272827
Validation loss = 0.03582397848367691
Validation loss = 0.041924864053726196
Validation loss = 0.029681062325835228
Validation loss = 0.029480205848813057
Validation loss = 0.027068132534623146
Validation loss = 0.02836683951318264
Validation loss = 0.03731663525104523
Validation loss = 0.03317200019955635
Validation loss = 0.029424380511045456
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03249926492571831
Validation loss = 0.03305847942829132
Validation loss = 0.028361845761537552
Validation loss = 0.028819115832448006
Validation loss = 0.032421886920928955
Validation loss = 0.02907535433769226
Validation loss = 0.02908029593527317
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03276287019252777
Validation loss = 0.03913259878754616
Validation loss = 0.030356789007782936
Validation loss = 0.02996337227523327
Validation loss = 0.02744494006037712
Validation loss = 0.029208123683929443
Validation loss = 0.030206231400370598
Validation loss = 0.02981635183095932
Validation loss = 0.03307923674583435
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026840923354029655
Validation loss = 0.032006535679101944
Validation loss = 0.027082640677690506
Validation loss = 0.028238583356142044
Validation loss = 0.027452409267425537
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21367521367521367
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21306818181818182
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21246458923512748
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.211864406779661
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2112676056338028
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21067415730337077
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21008403361344538
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20949720670391062
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20891364902506965
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20833333333333334
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2077562326869806
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20718232044198895
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2066115702479339
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20604395604395603
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2054794520547945
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20491803278688525
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20435967302452315
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20380434782608695
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2032520325203252
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20270270270270271
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20215633423180593
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20161290322580644
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20107238605898123
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20053475935828877
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00176 |
| Iteration     | 13       |
| MaximumReturn | -0.00113 |
| MinimumReturn | -0.00362 |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03329954668879509
Validation loss = 0.02836870402097702
Validation loss = 0.02831835299730301
Validation loss = 0.028487056493759155
Validation loss = 0.027374016121029854
Validation loss = 0.026634523645043373
Validation loss = 0.030576879158616066
Validation loss = 0.029161794111132622
Validation loss = 0.0402754582464695
Validation loss = 0.03505402430891991
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.027311036363244057
Validation loss = 0.03221157193183899
Validation loss = 0.02800297923386097
Validation loss = 0.028680304065346718
Validation loss = 0.028021903708577156
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02728448249399662
Validation loss = 0.030319401994347572
Validation loss = 0.02695847861468792
Validation loss = 0.028910843655467033
Validation loss = 0.028620220720767975
Validation loss = 0.03722682595252991
Validation loss = 0.027712909504771233
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03363979607820511
Validation loss = 0.029950907453894615
Validation loss = 0.028888484463095665
Validation loss = 0.02913934923708439
Validation loss = 0.030045002698898315
Validation loss = 0.027713872492313385
Validation loss = 0.02734565921127796
Validation loss = 0.03129136934876442
Validation loss = 0.027221865952014923
Validation loss = 0.029694104567170143
Validation loss = 0.028015943244099617
Validation loss = 0.028606511652469635
Validation loss = 0.026915624737739563
Validation loss = 0.028063667938113213
Validation loss = 0.0277713555842638
Validation loss = 0.033478934317827225
Validation loss = 0.02681865356862545
Validation loss = 0.027406349778175354
Validation loss = 0.03207062929868698
Validation loss = 0.032190531492233276
Validation loss = 0.03517570346593857
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02642064541578293
Validation loss = 0.026585258543491364
Validation loss = 0.03043406270444393
Validation loss = 0.029796848073601723
Validation loss = 0.026741676032543182
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19946808510638298
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20159151193633953
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.20634920634920634
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.21108179419525067
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21052631578947367
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2178477690288714
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21727748691099477
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21671018276762402
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.22395833333333334
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22337662337662337
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22538860103626943
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2248062015503876
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22422680412371135
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2236503856041131
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2230769230769231
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22506393861892582
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22448979591836735
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.2340966921119593
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.233502538071066
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23544303797468355
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.24242424242424243
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24181360201511334
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24371859296482412
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24310776942355888
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2425
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.423   |
| Iteration     | 14       |
| MaximumReturn | -0.0353  |
| MinimumReturn | -7.98    |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.032611314207315445
Validation loss = 0.02742793969810009
Validation loss = 0.026768740266561508
Validation loss = 0.028099797666072845
Validation loss = 0.02911340817809105
Validation loss = 0.029003547504544258
Validation loss = 0.027045365422964096
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.028836799785494804
Validation loss = 0.026082603260874748
Validation loss = 0.028281638398766518
Validation loss = 0.027399199083447456
Validation loss = 0.027513932436704636
Validation loss = 0.02730626054108143
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.030260462313890457
Validation loss = 0.028197815641760826
Validation loss = 0.028799396008253098
Validation loss = 0.025319013744592667
Validation loss = 0.027371538802981377
Validation loss = 0.036583781242370605
Validation loss = 0.02626045234501362
Validation loss = 0.028304554522037506
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03617192059755325
Validation loss = 0.028980176895856857
Validation loss = 0.02758040279150009
Validation loss = 0.0272551067173481
Validation loss = 0.026586923748254776
Validation loss = 0.02653263881802559
Validation loss = 0.02574589103460312
Validation loss = 0.0290562491863966
Validation loss = 0.029442569240927696
Validation loss = 0.02806865982711315
Validation loss = 0.028599288314580917
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.028109710663557053
Validation loss = 0.02835475653409958
Validation loss = 0.028890566900372505
Validation loss = 0.02744738943874836
Validation loss = 0.03253680840134621
Validation loss = 0.026312772184610367
Validation loss = 0.02639058046042919
Validation loss = 0.031565453857183456
Validation loss = 0.028797268867492676
Validation loss = 0.0300542414188385
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24189526184538654
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24129353233830847
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24069478908188585
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2400990099009901
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23950617283950618
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23891625615763548
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23832923832923833
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23774509803921567
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2371638141809291
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23658536585365852
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2360097323600973
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2354368932038835
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23486682808716708
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23429951690821257
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23373493975903614
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23317307692307693
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23261390887290168
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23205741626794257
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2315035799522673
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23095238095238096
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23040380047505937
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22985781990521326
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2293144208037825
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22877358490566038
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22823529411764706
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00248 |
| Iteration     | 15       |
| MaximumReturn | -0.00175 |
| MinimumReturn | -0.00332 |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.027670836076140404
Validation loss = 0.0295468270778656
Validation loss = 0.03153834491968155
Validation loss = 0.03254636377096176
Validation loss = 0.030367771163582802
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0293434988707304
Validation loss = 0.030998429283499718
Validation loss = 0.02616680972278118
Validation loss = 0.02818818762898445
Validation loss = 0.026398727670311928
Validation loss = 0.0278206504881382
Validation loss = 0.02681174874305725
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02925477921962738
Validation loss = 0.03196512535214424
Validation loss = 0.03415191173553467
Validation loss = 0.02691224031150341
Validation loss = 0.03254319354891777
Validation loss = 0.029642147943377495
Validation loss = 0.02757894992828369
Validation loss = 0.030489226803183556
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027536572888493538
Validation loss = 0.026515964418649673
Validation loss = 0.027076279744505882
Validation loss = 0.028661036863923073
Validation loss = 0.026382314041256905
Validation loss = 0.0280524343252182
Validation loss = 0.031547848135232925
Validation loss = 0.02665526606142521
Validation loss = 0.028140829876065254
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.028981221839785576
Validation loss = 0.02916937880218029
Validation loss = 0.029343757778406143
Validation loss = 0.025803906843066216
Validation loss = 0.026609083637595177
Validation loss = 0.025800956413149834
Validation loss = 0.03073590062558651
Validation loss = 0.029437731951475143
Validation loss = 0.03228496387600899
Validation loss = 0.026961764320731163
Done fitting dynamics.
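
Each "Fitting model k" block above prints one validation loss per epoch, and the block lengths vary (5 to 10 epochs here, up to 15 elsewhere in the run). Every block ends after exactly four consecutive epochs without a new best validation loss, so early stopping with a patience of 4, capped at the configured 200 epochs, fits the log; the patience rule is an inference rather than documented behaviour, and the training-step methods below are placeholders:

    def fit_ensemble(models, train_data, val_data,
                     max_epochs=200, patience=4):
        """Fit each ensemble member until validation loss stops improving."""
        for k, model in enumerate(models):
            print("Fitting model %d (0-based) in the ensemble of %d models"
                  % (k, len(models)))
            best, stale = float("inf"), 0
            for _ in range(max_epochs):
                model.train_one_epoch(train_data)       # placeholder method
                loss = model.validation_loss(val_data)  # placeholder method
                print("Validation loss = %r" % loss)
                if loss < best:
                    best, stale = loss, 0
                else:
                    stale += 1
                    if stale >= patience:
                        break
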
Updating randomness.
Done updating randomness.
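
"Updating randomness" is not explained by the log. Given 'enable_particle_ensemble': True with 5 particles and 5 ensemble members in the configuration, one plausible reading is that it re-draws the mapping from particles to ensemble members used for imagined rollouts in the policy phase, as in trajectory-sampling schemes for probabilistic ensembles. The sketch below encodes only that assumption:

    import numpy as np

    def update_randomness(num_particles=5, ensemble_size=5, rng=None):
        """Re-draw which ensemble member each particle follows
        (assumed interpretation of the 'Updating randomness' step)."""
        rng = rng or np.random.default_rng()
        return rng.integers(ensemble_size, size=num_particles)
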
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
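
The policy phase matches the trpo block of the configuration: the policy's exploration log-std is reset to init_logstd = 0.0 (presumably only the std, with the network weights persisting, per 'reinitialize_every_itr': False), then 20 TRPO iterations each draw a batch of imagined samples from the learned dynamics. A schematic of that outer loop; `sample_fn`, `trpo_step_fn`, and `policy.set_log_std` stand in for whatever sampler, TRPO update, and policy API the codebase actually uses:

    def train_policy(policy, dynamics, cfg, sample_fn, trpo_step_fn):
        """Run TRPO against the learned dynamics model (schematic only)."""
        print("Re-initialize init_std.")
        policy.set_log_std(cfg["policy"]["init_logstd"])       # 0.0
        for it in range(cfg["trpo"]["iterations"]):            # 20
            print("Obtaining samples for iteration %d..." % it)
            paths = sample_fn(
                policy, dynamics,
                horizon=cfg["trpo"]["horizon"],                # 100
                batch_size=cfg["trpo"]["batch_size"])          # 50000
            trpo_step_fn(
                policy, paths,
                step_size=cfg["trpo"]["step_size"],            # 0.01
                gamma=cfg["trpo"]["gamma"],                    # 0.99
                gae_lambda=cfg["trpo"]["gae"])                 # 0.95
        print("Done training policy.")
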
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2347417840375587
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.234192037470726
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.24065420560747663
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2400932400932401
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24186046511627907
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24361948955916474
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.2523148148148148
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2517321016166282
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2511520737327189
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25057471264367814
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2562929061784897
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2579908675799087
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25740318906605925
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2636363636363636
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26303854875283444
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2647058823529412
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.26636568848758463
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26576576576576577
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2651685393258427
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.26905829596412556
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2684563758389262
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26785714285714285
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.27616926503340755
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2777777777777778
Done generating on-policy rollouts.
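
The nonzero counts in the block above give a direct check of the running-average bookkeeping sketched earlier. At the end of iteration 15 the average 0.22823529411764706 equals 97/425 (97 events over 425 paths); path 0's three events and path 1's zero then reproduce the next two printed averages exactly:

    >>> 97 / 425                  # end of iteration 15
    0.22823529411764706
    >>> (97 + 3) / (425 + 1)      # path 0 above adds 3 events
    0.2347417840375587
    >>> 100 / 427                 # path 1 adds 0 events
    0.234192037470726
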
Updating normalization.
Done updating normalization.
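
"Updating normalization" runs after every batch of real rollouts; the usual purpose is to refresh the statistics used to whiten the dynamics model's inputs and targets. A sketch of a running-moments normalizer, illustrative only (which quantities the codebase actually normalizes is not visible in the log):

    import numpy as np

    class RunningNormalizer:
        """Track mean/std of data seen so far and whiten new inputs."""

        def __init__(self, dim, eps=1e-8):
            self.count = 0
            self.mean = np.zeros(dim)
            self.m2 = np.zeros(dim)  # sum of squared deviations
            self.eps = eps

        def update(self, batch):
            """Fold an (N, dim) batch into the running moments
            (Chan's parallel variant of Welford's algorithm)."""
            n = len(batch)
            delta = batch.mean(axis=0) - self.mean
            tot = self.count + n
            self.mean += delta * n / tot
            self.m2 += batch.var(axis=0) * n + delta**2 * self.count * n / tot
            self.count = tot

        def normalize(self, x):
            std = np.sqrt(self.m2 / max(self.count, 1)) + self.eps
            return (x - self.mean) / std
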
----------------------------
| AverageReturn | -1.15    |
| Iteration     | 16       |
| MaximumReturn | -0.0158  |
| MinimumReturn | -24.6    |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029054217040538788
Validation loss = 0.026796739548444748
Validation loss = 0.026354987174272537
Validation loss = 0.027440138161182404
Validation loss = 0.02629600279033184
Validation loss = 0.026893990114331245
Validation loss = 0.026889009401202202
Validation loss = 0.028000030666589737
Validation loss = 0.029365815222263336
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0275400560349226
Validation loss = 0.02725604549050331
Validation loss = 0.027795957401394844
Validation loss = 0.028872570022940636
Validation loss = 0.02722407691180706
Validation loss = 0.035069506615400314
Validation loss = 0.025703294202685356
Validation loss = 0.028264474123716354
Validation loss = 0.029172642156481743
Validation loss = 0.02773236483335495
Validation loss = 0.028308052569627762
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.032291874289512634
Validation loss = 0.028693072497844696
Validation loss = 0.027802521362900734
Validation loss = 0.031111879274249077
Validation loss = 0.02899700403213501
Validation loss = 0.02658298797905445
Validation loss = 0.02905745431780815
Validation loss = 0.027659524232149124
Validation loss = 0.027713190764188766
Validation loss = 0.03419797495007515
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.029117252677679062
Validation loss = 0.02866499312222004
Validation loss = 0.02759236842393875
Validation loss = 0.02666528895497322
Validation loss = 0.028312785550951958
Validation loss = 0.03154383972287178
Validation loss = 0.029874421656131744
Validation loss = 0.027256734669208527
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.029509836807847023
Validation loss = 0.028865540400147438
Validation loss = 0.025895332917571068
Validation loss = 0.02622094936668873
Validation loss = 0.027446206659078598
Validation loss = 0.02838880568742752
Validation loss = 0.028120314702391624
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2771618625277162
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27654867256637167
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27593818984547464
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2753303964757709
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27472527472527475
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2741228070175439
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2735229759299781
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27292576419213976
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27233115468409586
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2717391304347826
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27114967462039047
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27056277056277056
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26997840172786175
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26939655172413796
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26881720430107525
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26824034334763946
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2676659528907923
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2670940170940171
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26652452025586354
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26595744680851063
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2653927813163482
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2648305084745763
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2642706131078224
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26371308016877637
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2631578947368421
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0413  |
| Iteration     | 17       |
| MaximumReturn | -0.0238  |
| MinimumReturn | -0.0592  |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03412390127778053
Validation loss = 0.029151378199458122
Validation loss = 0.0264822319149971
Validation loss = 0.02673349715769291
Validation loss = 0.026627419516444206
Validation loss = 0.028671156615018845
Validation loss = 0.029071813449263573
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02903071418404579
Validation loss = 0.02980782464146614
Validation loss = 0.026950793340802193
Validation loss = 0.03505313768982887
Validation loss = 0.042806196957826614
Validation loss = 0.02995813451707363
Validation loss = 0.027079850435256958
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.031917087733745575
Validation loss = 0.029886972159147263
Validation loss = 0.03100481815636158
Validation loss = 0.033987872302532196
Validation loss = 0.028133679181337357
Validation loss = 0.02864113822579384
Validation loss = 0.028260547667741776
Validation loss = 0.029281653463840485
Validation loss = 0.028142068535089493
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02870246022939682
Validation loss = 0.02804376557469368
Validation loss = 0.028051698580384254
Validation loss = 0.029399730265140533
Validation loss = 0.028405483812093735
Validation loss = 0.029805997386574745
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03978622704744339
Validation loss = 0.02758309803903103
Validation loss = 0.027907848358154297
Validation loss = 0.03169698640704155
Validation loss = 0.028286246582865715
Validation loss = 0.03030957467854023
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26260504201680673
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2620545073375262
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2615062761506276
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2609603340292276
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2604166666666667
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2598752598752599
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25933609958506226
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2587991718426501
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25826446280991733
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25773195876288657
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.257201646090535
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25667351129363447
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25614754098360654
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2556237218813906
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25510204081632654
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2545824847250509
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2540650406504065
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2535496957403651
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25303643724696356
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25252525252525254
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25201612903225806
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2515090543259557
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25100401606425704
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.250501002004008
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00269 |
| Iteration     | 18       |
| MaximumReturn | -0.00193 |
| MinimumReturn | -0.00362 |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029174715280532837
Validation loss = 0.032474666833877563
Validation loss = 0.031109077855944633
Validation loss = 0.02618309110403061
Validation loss = 0.027809154242277145
Validation loss = 0.029321875423192978
Validation loss = 0.02789529412984848
Validation loss = 0.03011966310441494
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0287238247692585
Validation loss = 0.02904648333787918
Validation loss = 0.03030245378613472
Validation loss = 0.027426347136497498
Validation loss = 0.0287917610257864
Validation loss = 0.027972470968961716
Validation loss = 0.028979724273085594
Validation loss = 0.02892998978495598
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03195784240961075
Validation loss = 0.035649821162223816
Validation loss = 0.028876103460788727
Validation loss = 0.02868718095123768
Validation loss = 0.028884008526802063
Validation loss = 0.032777559012174606
Validation loss = 0.029031312093138695
Validation loss = 0.03387442231178284
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030597850680351257
Validation loss = 0.027480652555823326
Validation loss = 0.032434917986392975
Validation loss = 0.028905469924211502
Validation loss = 0.028589550405740738
Validation loss = 0.03259621188044548
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.027213148772716522
Validation loss = 0.025971416383981705
Validation loss = 0.02991877682507038
Validation loss = 0.028074387460947037
Validation loss = 0.026924261823296547
Validation loss = 0.02942546457052231
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.249500998003992
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2549800796812749
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2584493041749503
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.25992063492063494
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2594059405940594
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25889328063241107
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2583826429980276
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2578740157480315
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25736738703339884
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.26666666666666666
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26614481409001955
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.26953125
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26900584795321636
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26848249027237353
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.27184466019417475
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2751937984496124
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2746615087040619
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27413127413127414
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27360308285163776
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27307692307692305
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.272552783109405
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2720306513409962
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27151051625239003
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27099236641221375
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2704761904761905
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.132   |
| Iteration     | 19       |
| MaximumReturn | -0.0307  |
| MinimumReturn | -2.22    |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.027188075706362724
Validation loss = 0.027847185730934143
Validation loss = 0.027703920379281044
Validation loss = 0.030368486419320107
Validation loss = 0.026171350851655006
Validation loss = 0.029167452827095985
Validation loss = 0.02736165188252926
Validation loss = 0.026999253779649734
Validation loss = 0.029350804165005684
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02821916714310646
Validation loss = 0.028503036126494408
Validation loss = 0.029423730447888374
Validation loss = 0.028319839388132095
Validation loss = 0.027930861338973045
Validation loss = 0.029046684503555298
Validation loss = 0.029508128762245178
Validation loss = 0.026284806430339813
Validation loss = 0.027369491755962372
Validation loss = 0.028124844655394554
Validation loss = 0.028288526460528374
Validation loss = 0.02779281511902809
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.031463563442230225
Validation loss = 0.029483869671821594
Validation loss = 0.029431462287902832
Validation loss = 0.02757767029106617
Validation loss = 0.030540799722075462
Validation loss = 0.0273471437394619
Validation loss = 0.02958623133599758
Validation loss = 0.02854299359023571
Validation loss = 0.03188374638557434
Validation loss = 0.0294576957821846
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02792121097445488
Validation loss = 0.02787400595843792
Validation loss = 0.05886698514223099
Validation loss = 0.029725566506385803
Validation loss = 0.02764064446091652
Validation loss = 0.02925749123096466
Validation loss = 0.028244387358427048
Validation loss = 0.025661462917923927
Validation loss = 0.026697779074311256
Validation loss = 0.03119477443397045
Validation loss = 0.027402278035879135
Validation loss = 0.02753935009241104
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02677260898053646
Validation loss = 0.03072795458137989
Validation loss = 0.026928627863526344
Validation loss = 0.026457035914063454
Validation loss = 0.026941213756799698
Validation loss = 0.02844162844121456
Validation loss = 0.027931641787290573
Validation loss = 0.027951201424002647
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26996197718631176
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.269449715370019
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2689393939393939
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2684310018903592
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2679245283018868
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2674199623352166
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2669172932330827
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26641651031894936
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26591760299625467
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26542056074766357
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26492537313432835
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2644320297951583
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26394052044609667
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2634508348794063
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26296296296296295
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26247689463955637
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26199261992619927
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26151012891344383
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2610294117647059
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26055045871559634
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2600732600732601
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2595978062157221
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2591240875912409
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2586520947176685
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2581818181818182
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00337 |
| Iteration     | 20       |
| MaximumReturn | -0.00244 |
| MinimumReturn | -0.00483 |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026934577152132988
Validation loss = 0.028198417276144028
Validation loss = 0.027494050562381744
Validation loss = 0.028001010417938232
Validation loss = 0.028308182954788208
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.028635088354349136
Validation loss = 0.026644162833690643
Validation loss = 0.03546523675322533
Validation loss = 0.029967311769723892
Validation loss = 0.02670520916581154
Validation loss = 0.028335323557257652
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025901906192302704
Validation loss = 0.028751663863658905
Validation loss = 0.027579190209507942
Validation loss = 0.03022555448114872
Validation loss = 0.02790261059999466
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027973724529147148
Validation loss = 0.027506321668624878
Validation loss = 0.028747988864779472
Validation loss = 0.028759025037288666
Validation loss = 0.028582490980625153
Validation loss = 0.02692437544465065
Validation loss = 0.028531959280371666
Validation loss = 0.02705434523522854
Validation loss = 0.026325004175305367
Validation loss = 0.027473896741867065
Validation loss = 0.02551056630909443
Validation loss = 0.02737986110150814
Validation loss = 0.02922895923256874
Validation loss = 0.026047833263874054
Validation loss = 0.027360670268535614
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02607075124979019
Validation loss = 0.02928839437663555
Validation loss = 0.030929425731301308
Validation loss = 0.028814997524023056
Validation loss = 0.027721313759684563
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2577132486388385
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2572463768115942
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25678119349005424
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2563176895306859
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25585585585585585
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25539568345323743
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25493716337522443
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25448028673835127
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25402504472271914
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25357142857142856
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2531194295900178
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2526690391459075
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2522202486678508
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25177304964539005
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2513274336283186
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2508833922261484
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25044091710758376
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24956063268892795
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24912280701754386
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2486865148861646
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24825174825174826
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24781849912739964
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24738675958188153
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24695652173913044
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00188 |
| Iteration     | 21       |
| MaximumReturn | -0.00144 |
| MinimumReturn | -0.00242 |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.030448302626609802
Validation loss = 0.038010790944099426
Validation loss = 0.02496318146586418
Validation loss = 0.027369994670152664
Validation loss = 0.027580086141824722
Validation loss = 0.025524994358420372
Validation loss = 0.026226378977298737
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026038330048322678
Validation loss = 0.02867380529642105
Validation loss = 0.026721684262156487
Validation loss = 0.026508428156375885
Validation loss = 0.025156959891319275
Validation loss = 0.031071597710251808
Validation loss = 0.02733842097222805
Validation loss = 0.02610802836716175
Validation loss = 0.029030149802565575
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.027337945997714996
Validation loss = 0.028070224449038506
Validation loss = 0.02935909293591976
Validation loss = 0.03286835551261902
Validation loss = 0.027184175327420235
Validation loss = 0.027135148644447327
Validation loss = 0.025602543726563454
Validation loss = 0.02701173722743988
Validation loss = 0.02708800509572029
Validation loss = 0.026933958753943443
Validation loss = 0.025898735970258713
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026819627732038498
Validation loss = 0.02731240727007389
Validation loss = 0.02696234919130802
Validation loss = 0.02366546355187893
Validation loss = 0.02599158324301243
Validation loss = 0.028656762093305588
Validation loss = 0.02687821537256241
Validation loss = 0.030169812962412834
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.027143560349941254
Validation loss = 0.0271519273519516
Validation loss = 0.025752894580364227
Validation loss = 0.030757831409573555
Validation loss = 0.031159237027168274
Validation loss = 0.024663252755999565
Validation loss = 0.02521679736673832
Validation loss = 0.023808544501662254
Validation loss = 0.02732739970088005
Validation loss = 0.028236985206604004
Validation loss = 0.026434998959302902
Validation loss = 0.025716593489050865
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2465277777777778
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24610051993067592
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24567474048442905
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2452504317789292
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24482758620689654
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24440619621342513
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24398625429553264
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24356775300171526
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24315068493150685
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24273504273504273
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24232081911262798
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24190800681431004
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24149659863945577
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24108658743633277
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24067796610169492
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24027072758037224
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23986486486486486
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23946037099494097
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23905723905723905
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23865546218487396
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23825503355704697
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23785594639865998
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23745819397993312
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2370617696160267
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23666666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0417  |
| Iteration     | 22       |
| MaximumReturn | -0.0288  |
| MinimumReturn | -0.0606  |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024967806413769722
Validation loss = 0.0288570336997509
Validation loss = 0.02635056897997856
Validation loss = 0.027506908401846886
Validation loss = 0.025777745991945267
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.028680866584181786
Validation loss = 0.02497236430644989
Validation loss = 0.02501138113439083
Validation loss = 0.024821657687425613
Validation loss = 0.026168540120124817
Validation loss = 0.02508310042321682
Validation loss = 0.02502576634287834
Validation loss = 0.02969990111887455
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.026764526963233948
Validation loss = 0.030965536832809448
Validation loss = 0.028147676959633827
Validation loss = 0.028761431574821472
Validation loss = 0.025275656953454018
Validation loss = 0.025655727833509445
Validation loss = 0.027655482292175293
Validation loss = 0.025442000478506088
Validation loss = 0.024609241634607315
Validation loss = 0.027028117328882217
Validation loss = 0.025963803753256798
Validation loss = 0.025525670498609543
Validation loss = 0.029801124706864357
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02990240976214409
Validation loss = 0.02535044588148594
Validation loss = 0.026455912739038467
Validation loss = 0.025284294039011
Validation loss = 0.024385489523410797
Validation loss = 0.02626645937561989
Validation loss = 0.024574419483542442
Validation loss = 0.02402679994702339
Validation loss = 0.02627737447619438
Validation loss = 0.027652855962514877
Validation loss = 0.02464492805302143
Validation loss = 0.02411906234920025
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023604998365044594
Validation loss = 0.023957891389727592
Validation loss = 0.025602031499147415
Validation loss = 0.026359379291534424
Validation loss = 0.02419775351881981
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23627287853577372
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23588039867109634
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23548922056384744
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23509933774834438
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23471074380165288
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23432343234323433
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23393739703459637
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23355263157894737
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23316912972085385
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23278688524590163
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23240589198036007
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23202614379084968
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23164763458401305
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23127035830618892
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23089430894308943
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2305194805194805
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23014586709886548
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2297734627831715
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2294022617124394
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22903225806451613
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2286634460547504
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2282958199356913
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22792937399678972
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22756410256410256
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2272
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0186  |
| Iteration     | 23       |
| MaximumReturn | -0.0126  |
| MinimumReturn | -0.0269  |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025675544515252113
Validation loss = 0.027865435928106308
Validation loss = 0.031076708808541298
Validation loss = 0.02506800927221775
Validation loss = 0.02781842276453972
Validation loss = 0.02433686889708042
Validation loss = 0.02591606043279171
Validation loss = 0.027126524597406387
Validation loss = 0.028391171246767044
Validation loss = 0.02592594362795353
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0316302515566349
Validation loss = 0.02510964497923851
Validation loss = 0.0245467871427536
Validation loss = 0.02580535039305687
Validation loss = 0.026064738631248474
Validation loss = 0.025514155626296997
Validation loss = 0.02732047066092491
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.028705071657896042
Validation loss = 0.027473438531160355
Validation loss = 0.027839835733175278
Validation loss = 0.026546508073806763
Validation loss = 0.024923212826251984
Validation loss = 0.02703477442264557
Validation loss = 0.030937200412154198
Validation loss = 0.029239390045404434
Validation loss = 0.02596861496567726
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027215618640184402
Validation loss = 0.02477823570370674
Validation loss = 0.024677995592355728
Validation loss = 0.023742251098155975
Validation loss = 0.023763032630085945
Validation loss = 0.024537205696105957
Validation loss = 0.024569425731897354
Validation loss = 0.025676686316728592
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023700037971138954
Validation loss = 0.026058802381157875
Validation loss = 0.024854427203536034
Validation loss = 0.024251442402601242
Validation loss = 0.026699315756559372
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2268370607028754
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22647527910685805
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22611464968152867
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22575516693163752
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2253968253968254
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22503961965134706
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22468354430379747
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22432859399684044
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22397476340694006
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22362204724409449
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22327044025157233
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22291993720565148
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2225705329153605
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2222222222222222
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.221875
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22152886115444617
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22118380062305296
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2208398133748056
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2204968944099379
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22015503875968992
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21981424148606812
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21947449768160743
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2191358024691358
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21879815100154082
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21846153846153846
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00475 |
| Iteration     | 24       |
| MaximumReturn | -0.00316 |
| MinimumReturn | -0.0118  |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025245266035199165
Validation loss = 0.024897754192352295
Validation loss = 0.024366389960050583
Validation loss = 0.026083175092935562
Validation loss = 0.026866110041737556
Validation loss = 0.02667105197906494
Validation loss = 0.02480240724980831
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024961229413747787
Validation loss = 0.024350479245185852
Validation loss = 0.02447585202753544
Validation loss = 0.024552389979362488
Validation loss = 0.02518349327147007
Validation loss = 0.026629822328686714
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024305321276187897
Validation loss = 0.02370917797088623
Validation loss = 0.024423059076070786
Validation loss = 0.026115277782082558
Validation loss = 0.025123681873083115
Validation loss = 0.023897258564829826
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0260164737701416
Validation loss = 0.025017553940415382
Validation loss = 0.022653182968497276
Validation loss = 0.023851074278354645
Validation loss = 0.024891842156648636
Validation loss = 0.02281590923666954
Validation loss = 0.025182966142892838
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022612709552049637
Validation loss = 0.022735176607966423
Validation loss = 0.022813966497778893
Validation loss = 0.02454591915011406
Validation loss = 0.023702269420027733
Done fitting dynamics.
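Each of the 5 ensemble members prints a different number of "Validation loss" lines, which is consistent with per-model early stopping on held-out loss under the configured epoch cap. A hedged sketch of that loop; `fit_one_epoch` and `val_loss` are hypothetical methods, and the patience value is an assumption:

```python
import numpy as np

# Sketch: train each ensemble member until its validation loss stops
# improving, up to the configured epoch cap.
def fit_ensemble(models, train_data, val_data, max_epochs=200, patience=3):
    for i, model in enumerate(models):
        print(f"Fitting model {i} (0-based) in the ensemble of {len(models)} models")
        best, stale = np.inf, 0
        for _ in range(max_epochs):
            model.fit_one_epoch(train_data)   # hypothetical training step
            loss = model.val_loss(val_data)   # hypothetical held-out loss
            print(f"Validation loss = {loss}")
            best, stale = (loss, 0) if loss < best else (best, stale + 1)
            if stale >= patience:             # no recent improvement: stop
                break
```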
Updating randomness.
Done updating randomness.
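"Updating randomness." is not elaborated anywhere in the log. One plausible reading, given enable_particle_ensemble=True and particles=5 in the configuration, is that each rollout particle is re-assigned a random ensemble member to follow. Treat the following as an assumption, not the author's method:

```python
import numpy as np

# Assumed interpretation: resample which ensemble member each particle
# follows during the next round of model rollouts.
def update_randomness(num_particles=5, ensemble_size=5, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    return rng.integers(ensemble_size, size=num_particles)
```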
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
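The policy phase resets the action log-std and then runs 20 TRPO iterations entirely inside the learned model, matching the "Obtaining samples..." lines and the trpo block of the configuration (iterations=20, batch_size=50000, horizon=100, step_size=0.01, gamma=0.99, gae=0.95). A sketch of the outer loop only; `sample_fn` and `trpo_step_fn` are hypothetical stand-ins for the model-rollout sampler and the TRPO update:

```python
# Sketch of the policy-training phase; the two callables are
# hypothetical stand-ins for the sampler and the TRPO update.
def train_policy(policy, sample_fn, trpo_step_fn,
                 iterations=20, batch_size=50000, horizon=100):
    policy.reset_logstd(0.0)  # the "Re-initialize init_std." line above
    for itr in range(iterations):
        print(f"Obtaining samples for iteration {itr}...")
        paths = sample_fn(policy, batch_size=batch_size, horizon=horizon)
        trpo_step_fn(policy, paths, step_size=0.01,
                     gamma=0.99, gae_lambda=0.95)
    print("Done training policy.")
```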
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21812596006144394
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21779141104294478
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21745788667687596
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21712538226299694
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.216793893129771
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21646341463414634
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2161339421613394
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21580547112462006
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21547799696509864
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21515151515151515
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21482602118003025
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21450151057401812
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21417797888386123
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21385542168674698
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21353383458646616
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2132132132132132
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2128935532233883
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2125748502994012
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21225710014947682
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21194029850746268
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21162444113263784
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2113095238095238
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21099554234769688
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21068249258160238
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21037037037037037
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.133   |
| Iteration     | 25       |
| MaximumReturn | -0.0174  |
| MinimumReturn | -2.7     |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022907035425305367
Validation loss = 0.024368654936552048
Validation loss = 0.025902612134814262
Validation loss = 0.025551551952958107
Validation loss = 0.024784885346889496
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02661849372088909
Validation loss = 0.024000244215130806
Validation loss = 0.022181399166584015
Validation loss = 0.02424818091094494
Validation loss = 0.0237263310700655
Validation loss = 0.022219639271497726
Validation loss = 0.023494087159633636
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024212175980210304
Validation loss = 0.024898655712604523
Validation loss = 0.02655491791665554
Validation loss = 0.027395425364375114
Validation loss = 0.023738481104373932
Validation loss = 0.026701154187321663
Validation loss = 0.02482668310403824
Validation loss = 0.025512533262372017
Validation loss = 0.023542409762740135
Validation loss = 0.023987168446183205
Validation loss = 0.024046938866376877
Validation loss = 0.025482676923274994
Validation loss = 0.02381485141813755
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02179574780166149
Validation loss = 0.028000492602586746
Validation loss = 0.023387519642710686
Validation loss = 0.023283714428544044
Validation loss = 0.021811356768012047
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023286234587430954
Validation loss = 0.025983238592743874
Validation loss = 0.022384723648428917
Validation loss = 0.024277638643980026
Validation loss = 0.023170847445726395
Validation loss = 0.02306564897298813
Validation loss = 0.02369760349392891
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21005917159763313
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20974889217134415
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20943952802359883
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20913107511045656
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2088235294117647
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20851688693098386
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20821114369501467
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20790629575402636
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20760233918128654
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2072992700729927
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20699708454810495
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2066957787481805
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2063953488372093
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20609579100145137
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20579710144927535
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20549927641099855
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20520231213872833
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2049062049062049
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20461095100864554
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20431654676258992
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20402298850574713
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20373027259684362
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2034383954154728
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20314735336194564
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20285714285714285
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00116  |
| Iteration     | 26        |
| MaximumReturn | -0.000834 |
| MinimumReturn | -0.00156  |
| TotalSamples  | 46648     |
-----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026073424145579338
Validation loss = 0.0246988944709301
Validation loss = 0.026589319109916687
Validation loss = 0.022987138479948044
Validation loss = 0.026860838755965233
Validation loss = 0.02390674687922001
Validation loss = 0.02405111864209175
Validation loss = 0.02746599167585373
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022863607853651047
Validation loss = 0.02538677118718624
Validation loss = 0.02511412464082241
Validation loss = 0.02242302894592285
Validation loss = 0.023223716765642166
Validation loss = 0.023575512692332268
Validation loss = 0.023682529106736183
Validation loss = 0.02725411206483841
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024201538413763046
Validation loss = 0.022798046469688416
Validation loss = 0.022758033126592636
Validation loss = 0.023612791672348976
Validation loss = 0.023327277973294258
Validation loss = 0.023305919021368027
Validation loss = 0.024346698075532913
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02218620851635933
Validation loss = 0.02264167182147503
Validation loss = 0.021944763138890266
Validation loss = 0.024797147139906883
Validation loss = 0.022997083142399788
Validation loss = 0.023100515827536583
Validation loss = 0.022046955302357674
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02335529960691929
Validation loss = 0.02384369820356369
Validation loss = 0.025943079963326454
Validation loss = 0.022225501015782356
Validation loss = 0.024848612025380135
Validation loss = 0.0243703480809927
Validation loss = 0.022365644574165344
Validation loss = 0.023956801742315292
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20256776034236804
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2022792022792023
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2019914651493599
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20170454545454544
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20141843971631207
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20113314447592068
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20084865629420084
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20056497175141244
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2002820874471086
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19971870604781997
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.199438202247191
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19915848527349228
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19887955182072828
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1986013986013986
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19832402234636873
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19804741980474197
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1977715877437326
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19749652294853964
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19722222222222222
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19694868238557559
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19667590027700832
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19640387275242047
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19613259668508287
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19586206896551725
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0373  |
| Iteration     | 27       |
| MaximumReturn | -0.0202  |
| MinimumReturn | -0.058   |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021632961928844452
Validation loss = 0.02319403924047947
Validation loss = 0.02149115316569805
Validation loss = 0.022805437445640564
Validation loss = 0.0246952623128891
Validation loss = 0.02407062239944935
Validation loss = 0.022661060094833374
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022843776270747185
Validation loss = 0.025972561910748482
Validation loss = 0.02868816815316677
Validation loss = 0.022816145792603493
Validation loss = 0.021371489390730858
Validation loss = 0.022091373801231384
Validation loss = 0.025185713544487953
Validation loss = 0.02297123335301876
Validation loss = 0.02255738526582718
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02339913696050644
Validation loss = 0.02152320183813572
Validation loss = 0.021134018898010254
Validation loss = 0.021692002192139626
Validation loss = 0.022965790703892708
Validation loss = 0.02215978503227234
Validation loss = 0.022382795810699463
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02385018579661846
Validation loss = 0.022094180807471275
Validation loss = 0.021745355799794197
Validation loss = 0.02223152481019497
Validation loss = 0.02254846692085266
Validation loss = 0.022046992555260658
Validation loss = 0.02239055745303631
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02154555730521679
Validation loss = 0.021347882226109505
Validation loss = 0.021271347999572754
Validation loss = 0.022502286359667778
Validation loss = 0.02094658464193344
Validation loss = 0.023700490593910217
Validation loss = 0.023187952116131783
Validation loss = 0.02646687626838684
Validation loss = 0.022372595965862274
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19559228650137742
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1953232462173315
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19505494505494506
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19478737997256515
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19452054794520549
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19425444596443228
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19398907103825136
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1937244201909959
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19346049046321526
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19319727891156463
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19293478260869565
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1926729986431479
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19241192411924118
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19215155615696888
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1918918918918919
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19163292847503374
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19137466307277629
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1911170928667564
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19086021505376344
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1906040268456376
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1903485254691689
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19009370816599733
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18983957219251338
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18958611481975968
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18933333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0404  |
| Iteration     | 28       |
| MaximumReturn | -0.0263  |
| MinimumReturn | -0.0577  |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026813190430402756
Validation loss = 0.021762952208518982
Validation loss = 0.022122956812381744
Validation loss = 0.021993493661284447
Validation loss = 0.0289776474237442
Validation loss = 0.02216828055679798
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02197159454226494
Validation loss = 0.021201156079769135
Validation loss = 0.020303724333643913
Validation loss = 0.020569313317537308
Validation loss = 0.025563828647136688
Validation loss = 0.02963985502719879
Validation loss = 0.021673550829291344
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02177376113831997
Validation loss = 0.02374870330095291
Validation loss = 0.022148458287119865
Validation loss = 0.020995711907744408
Validation loss = 0.022125281393527985
Validation loss = 0.02289452590048313
Validation loss = 0.022658102214336395
Validation loss = 0.0225130096077919
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02048262394964695
Validation loss = 0.020921731367707253
Validation loss = 0.02317962795495987
Validation loss = 0.021605854853987694
Validation loss = 0.019419867545366287
Validation loss = 0.021623874083161354
Validation loss = 0.020668964833021164
Validation loss = 0.01972994953393936
Validation loss = 0.02170821651816368
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021382978186011314
Validation loss = 0.02130790986120701
Validation loss = 0.020365117117762566
Validation loss = 0.022709613665938377
Validation loss = 0.021829016506671906
Validation loss = 0.020418211817741394
Validation loss = 0.02069808915257454
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18908122503328895
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18882978723404256
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18857901726427623
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1883289124668435
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1880794701986755
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18783068783068782
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18758256274768825
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18733509234828497
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18708827404479578
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1868421052631579
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18659658344283836
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18635170603674542
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18610747051114024
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18586387434554974
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18562091503267975
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.185378590078329
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18513689700130379
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18489583333333334
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1846553966189857
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18441558441558442
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18417639429312582
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18393782383419688
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18369987063389392
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1834625322997416
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1832258064516129
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0023  |
| Iteration     | 29       |
| MaximumReturn | -0.00143 |
| MinimumReturn | -0.00412 |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02175416797399521
Validation loss = 0.023850901052355766
Validation loss = 0.02242964319884777
Validation loss = 0.022338315844535828
Validation loss = 0.02359003573656082
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021451057866215706
Validation loss = 0.021623894572257996
Validation loss = 0.02204432338476181
Validation loss = 0.023805245757102966
Validation loss = 0.02224419265985489
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021140657365322113
Validation loss = 0.02108399197459221
Validation loss = 0.022246256470680237
Validation loss = 0.024943595752120018
Validation loss = 0.02184131182730198
Validation loss = 0.023401014506816864
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02193763107061386
Validation loss = 0.022750847041606903
Validation loss = 0.02217428758740425
Validation loss = 0.021342646330595016
Validation loss = 0.02087060734629631
Validation loss = 0.020267339423298836
Validation loss = 0.022494496777653694
Validation loss = 0.02219078503549099
Validation loss = 0.02078656479716301
Validation loss = 0.021232202649116516
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02207479625940323
Validation loss = 0.02279571257531643
Validation loss = 0.023952340707182884
Validation loss = 0.021186452358961105
Validation loss = 0.0204719677567482
Validation loss = 0.022845862433314323
Validation loss = 0.02125869318842888
Validation loss = 0.022148044779896736
Validation loss = 0.02095736190676689
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18298969072164947
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18275418275418276
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18251928020565553
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1822849807445443
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18205128205128204
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18181818181818182
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1815856777493606
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18135376756066413
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18112244897959184
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18089171974522292
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1806615776081425
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18043202033036848
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1802030456852792
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17997465145754118
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17974683544303796
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.179519595448799
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17929292929292928
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17906683480453972
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17884130982367757
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17861635220125785
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17839195979899497
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.178168130489335
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17794486215538846
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17772215269086358
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1775
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0108  |
| Iteration     | 30       |
| MaximumReturn | -0.00628 |
| MinimumReturn | -0.0334  |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021013803780078888
Validation loss = 0.02078590914607048
Validation loss = 0.024125132709741592
Validation loss = 0.022683020681142807
Validation loss = 0.02204183116555214
Validation loss = 0.021358907222747803
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.027253927662968636
Validation loss = 0.021489031612873077
Validation loss = 0.02153942920267582
Validation loss = 0.022825289517641068
Validation loss = 0.022223036736249924
Validation loss = 0.02177552506327629
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02239304594695568
Validation loss = 0.021010883152484894
Validation loss = 0.02319716103374958
Validation loss = 0.0235784649848938
Validation loss = 0.02109963819384575
Validation loss = 0.022496985271573067
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02155381254851818
Validation loss = 0.020414777100086212
Validation loss = 0.02101930044591427
Validation loss = 0.018557140603661537
Validation loss = 0.02011512964963913
Validation loss = 0.02020418271422386
Validation loss = 0.020237883552908897
Validation loss = 0.0204202551394701
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0205776896327734
Validation loss = 0.02094237133860588
Validation loss = 0.021539418026804924
Validation loss = 0.020535795018076897
Validation loss = 0.020460963249206543
Validation loss = 0.023668933659791946
Validation loss = 0.019813720136880875
Validation loss = 0.021499060094356537
Validation loss = 0.020015856251120567
Validation loss = 0.021257253363728523
Validation loss = 0.02089863270521164
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1772784019975031
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1770573566084788
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17683686176836863
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17661691542288557
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1763975155279503
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1761786600496278
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17596034696406443
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17574257425742573
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17552533992583436
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17530864197530865
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17509247842170161
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1748768472906404
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17466174661746617
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17444717444717445
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17423312883435582
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17401960784313725
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17380660954712362
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17359413202933985
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17338217338217338
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17317073170731706
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17295980511571254
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17274939172749393
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17253948967193194
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17233009708737865
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17212121212121212
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000833 |
| Iteration     | 31        |
| MaximumReturn | -0.000657 |
| MinimumReturn | -0.00109  |
| TotalSamples  | 54978     |
-----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02027888223528862
Validation loss = 0.025722134858369827
Validation loss = 0.020963840186595917
Validation loss = 0.021596739068627357
Validation loss = 0.0207049623131752
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02071686089038849
Validation loss = 0.023637091740965843
Validation loss = 0.020285626873373985
Validation loss = 0.02221451699733734
Validation loss = 0.02049979753792286
Validation loss = 0.020448116585612297
Validation loss = 0.020920587703585625
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021001173183321953
Validation loss = 0.021641911938786507
Validation loss = 0.023117471486330032
Validation loss = 0.02218194119632244
Validation loss = 0.021624112501740456
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019686613231897354
Validation loss = 0.02081158198416233
Validation loss = 0.02066059596836567
Validation loss = 0.02244810201227665
Validation loss = 0.022015614435076714
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019907129928469658
Validation loss = 0.020969906821846962
Validation loss = 0.019821705296635628
Validation loss = 0.020523643121123314
Validation loss = 0.02129368670284748
Validation loss = 0.020215323194861412
Validation loss = 0.021456148475408554
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17312348668280872
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.18137847642079807
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.1859903381642512
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18697225572979492
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.19036144578313252
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19133574007220217
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.1935096153846154
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19447779111644659
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19544364508393286
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.2
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2021531100478469
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2019115890083632
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.2064439140811456
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.20858164481525626
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.2154761904761905
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.22235434007134364
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22209026128266032
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22301304863582444
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.22985781990521326
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2319526627218935
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.24113475177304963
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2408500590318772
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.24764150943396226
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24734982332155478
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.2564705882352941
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.84    |
| Iteration     | 32       |
| MaximumReturn | -0.0884  |
| MinimumReturn | -49.5    |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022754592821002007
Validation loss = 0.02227051369845867
Validation loss = 0.02320881374180317
Validation loss = 0.02184167131781578
Validation loss = 0.02460298500955105
Validation loss = 0.02715376392006874
Validation loss = 0.022809484973549843
Validation loss = 0.021000953391194344
Validation loss = 0.02264087088406086
Validation loss = 0.021898144856095314
Validation loss = 0.021785253658890724
Validation loss = 0.022601639851927757
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02257383055984974
Validation loss = 0.021833511069417
Validation loss = 0.020497890189290047
Validation loss = 0.025949666276574135
Validation loss = 0.022347507998347282
Validation loss = 0.020490970462560654
Validation loss = 0.022273099049925804
Validation loss = 0.0215739943087101
Validation loss = 0.02235981449484825
Validation loss = 0.023509519174695015
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022247767075896263
Validation loss = 0.021904438734054565
Validation loss = 0.022316258400678635
Validation loss = 0.021366527304053307
Validation loss = 0.02104322798550129
Validation loss = 0.0207359679043293
Validation loss = 0.022881165146827698
Validation loss = 0.025220680981874466
Validation loss = 0.02211252972483635
Validation loss = 0.02266220934689045
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02187979780137539
Validation loss = 0.022162726148962975
Validation loss = 0.02090446837246418
Validation loss = 0.022010406479239464
Validation loss = 0.021306639537215233
Validation loss = 0.021413322538137436
Validation loss = 0.022514699026942253
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022760208696126938
Validation loss = 0.023616058751940727
Validation loss = 0.021140748634934425
Validation loss = 0.02154647186398506
Validation loss = 0.022504309192299843
Validation loss = 0.023292647674679756
Validation loss = 0.021393099799752235
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25616921269095183
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25586854460093894
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2555685814771395
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25526932084309134
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2549707602339181
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2546728971962617
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2543757292882147
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2540792540792541
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25378346915017463
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2534883720930233
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25319396051103366
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2529002320185615
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2526071842410197
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2523148148148148
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2520231213872832
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2517321016166282
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25144175317185696
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2511520737327189
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2508630609896433
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25057471264367814
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2502870264064294
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24971363115693013
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2494279176201373
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24914285714285714
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0132  |
| Iteration     | 33       |
| MaximumReturn | -0.0104  |
| MinimumReturn | -0.0195  |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02510853111743927
Validation loss = 0.023631805554032326
Validation loss = 0.030083095654845238
Validation loss = 0.02510552480816841
Validation loss = 0.02425050176680088
Validation loss = 0.024993840605020523
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022816959768533707
Validation loss = 0.023729152977466583
Validation loss = 0.022431612014770508
Validation loss = 0.023844733834266663
Validation loss = 0.025962159037590027
Validation loss = 0.02415253035724163
Validation loss = 0.024665363132953644
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02402522973716259
Validation loss = 0.024445103481411934
Validation loss = 0.02504921518266201
Validation loss = 0.024573976173996925
Validation loss = 0.024600869044661522
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023068472743034363
Validation loss = 0.02365243434906006
Validation loss = 0.02434602938592434
Validation loss = 0.023341897875070572
Validation loss = 0.02551824413239956
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023663293570280075
Validation loss = 0.0245071891695261
Validation loss = 0.024720951914787292
Validation loss = 0.025249991565942764
Validation loss = 0.0240899920463562
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
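The policy-update phase always prints "Re-initialize init_std." followed by exactly 20 "Obtaining samples..." lines, matching the trpo.iterations and policy.init_logstd entries in the configuration. A rough sketch of that outer loop, assuming samples are drawn from the learned dynamics ensemble rather than the real environment; every helper function here is a stand-in, not the project's API.

# Sketch of the policy-update phase implied by "Training policy using TRPO." /
# "Re-initialize init_std." / "Obtaining samples for iteration k...".
def reinitialize_logstd(policy, init_logstd=0.0):
    policy["logstd"] = init_logstd          # reset exploration noise each outer iteration

def sample_from_model(policy, dynamics_ensemble, batch_size=50000, horizon=100):
    return []                               # imagined rollouts under the learned dynamics

def trpo_step(policy, paths, step_size=0.01, gamma=0.99, gae_lambda=0.95):
    pass                                    # one constrained natural-gradient update

def train_policy(policy, dynamics_ensemble, iterations=20):
    print("Training policy using TRPO.")
    print("Re-initialize init_std.")
    reinitialize_logstd(policy)
    for it in range(iterations):
        print(f"Obtaining samples for iteration {it}...")
        paths = sample_from_model(policy, dynamics_ensemble)
        trpo_step(policy, paths)
    print("Done training policy.")

train_policy(policy={"logstd": 0.0}, dynamics_ensemble=[])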
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24885844748858446
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24857468643101482
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24829157175398633
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24800910125142206
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24772727272727274
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2474460839954597
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2471655328798186
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24688561721404303
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24660633484162897
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24632768361581922
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24604966139954854
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24577226606538896
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24549549549549549
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2452193475815523
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2449438202247191
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.244668911335578
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24439461883408073
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24412094064949608
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24384787472035793
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2435754189944134
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24330357142857142
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24303232998885171
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24276169265033407
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2424916573971079
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24222222222222223
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
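"Updating normalization." presumably recomputes the statistics used to whiten the dynamics model's inputs from the data gathered so far; the log does not say how, so the following is only a plausible sketch with assumed shapes and an assumed eps guard.

# Assumed sketch of the normalization update: per-dimension mean/std of (obs, act)
# model inputs over all collected data. Not the project's actual implementation.
import numpy as np

def update_normalization(observations, actions, eps=1e-8):
    """Return per-dimension mean/std used to normalize (obs, act) model inputs."""
    inputs = np.concatenate([observations, actions], axis=1)
    mean = inputs.mean(axis=0)
    std = inputs.std(axis=0) + eps          # eps guards against zero-variance dimensions
    return mean, std

# Example with dummy data shaped like 25 paths x 100 steps of a 4-D obs / 1-D action task.
obs = np.random.randn(2500, 4)
act = np.random.randn(2500, 1)
mu, sigma = update_normalization(obs, act)
normalized = (np.concatenate([obs, act], axis=1) - mu) / sigma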
-----------------------------
| AverageReturn | -0.00124  |
| Iteration     | 34        |
| MaximumReturn | -0.000898 |
| MinimumReturn | -0.00172  |
| TotalSamples  | 59976     |
-----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02418828196823597
Validation loss = 0.024917995557188988
Validation loss = 0.025014929473400116
Validation loss = 0.028362754732370377
Validation loss = 0.02525147795677185
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022980500012636185
Validation loss = 0.023984050378203392
Validation loss = 0.024379046633839607
Validation loss = 0.023230982944369316
Validation loss = 0.02418212778866291
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024166787043213844
Validation loss = 0.02450123056769371
Validation loss = 0.025111760944128036
Validation loss = 0.02464914321899414
Validation loss = 0.025574486702680588
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023437535390257835
Validation loss = 0.02787172608077526
Validation loss = 0.022887347266077995
Validation loss = 0.02320794016122818
Validation loss = 0.02282657101750374
Validation loss = 0.024536898359656334
Validation loss = 0.02332114800810814
Validation loss = 0.023977071046829224
Validation loss = 0.024151170626282692
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024361399933695793
Validation loss = 0.023713363334536552
Validation loss = 0.027355287224054337
Validation loss = 0.027729017660021782
Validation loss = 0.024536190554499626
Validation loss = 0.023612551391124725
Validation loss = 0.024088088423013687
Validation loss = 0.02451099269092083
Validation loss = 0.025266049429774284
Validation loss = 0.024246972054243088
Done fitting dynamics.
Updating randomness.
Done updating randomness.
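"Updating randomness." is not explained anywhere in the log. Given the configuration's enable_particle_ensemble, particles = 5 and ensemble_model_count = 5, one plausible reading is that it re-draws the assignment of propagation particles to ensemble members between iterations; the sketch below is purely that assumption, and update_randomness is a made-up name.

# Assumed sketch only: re-sample which dynamics model each particle follows.
import numpy as np

def update_randomness(n_particles=5, n_models=5, seed=None):
    """Re-draw the particle-to-ensemble-member assignment for the next iteration."""
    rng = np.random.default_rng(seed)
    return rng.integers(low=0, high=n_models, size=n_particles)

print("Updating randomness.")
particle_to_model = update_randomness(seed=0)
print("Done updating randomness.")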
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24195338512763595
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24168514412416853
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24141749723145073
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2411504424778761
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2408839779005525
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24061810154525387
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24035281146637266
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24008810572687225
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23982398239823982
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23956043956043957
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23929747530186607
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23903508771929824
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23877327491785322
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23851203501094093
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23825136612021858
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23799126637554585
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23773173391494
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2374727668845316
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23721436343852012
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23695652173913043
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23669923995656894
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23644251626898047
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2361863488624052
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23593073593073594
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23567567567567568
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00533 |
| Iteration     | 35       |
| MaximumReturn | -0.00397 |
| MinimumReturn | -0.00713 |
| TotalSamples  | 61642    |
----------------------------
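The per-iteration summary tables are consistent with plain statistics over the on-policy path returns plus two counters, printed as a boxed key/value table. The three-significant-digit formatting and column padding below are inferred from the log, and log_tabular is an illustrative name, not the real logger.

# Sketch of the boxed summary printer; formatting inferred from the tables above.
def log_tabular(iteration, path_returns, total_samples):
    rows = [
        ("AverageReturn", sum(path_returns) / len(path_returns)),
        ("Iteration", iteration),
        ("MaximumReturn", max(path_returns)),
        ("MinimumReturn", min(path_returns)),
        ("TotalSamples", total_samples),
    ]
    rendered = [(k, f"{v:.3g}" if isinstance(v, float) else str(v)) for k, v in rows]
    key_w = max(len(k) for k, _ in rendered)
    val_w = max(len(v) for _, v in rendered)
    border = "-" * (key_w + val_w + 7)
    print(border)
    for k, v in rendered:
        print(f"| {k:<{key_w}} | {v:<{val_w}} |")
    print(border)

# Example with the three return values needed to reproduce the itr #35 rows above.
log_tabular(35, [-0.00533, -0.00397, -0.00713], total_samples=61642)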
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024361412972211838
Validation loss = 0.02523368038237095
Validation loss = 0.02532998099923134
Validation loss = 0.026171846315264702
Validation loss = 0.02629021368920803
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024172630161046982
Validation loss = 0.023971041664481163
Validation loss = 0.022031884640455246
Validation loss = 0.024208566173911095
Validation loss = 0.024116400629281998
Validation loss = 0.026576722040772438
Validation loss = 0.02318117581307888
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02419429086148739
Validation loss = 0.024979254230856895
Validation loss = 0.02529253251850605
Validation loss = 0.025745155289769173
Validation loss = 0.025426488369703293
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023674534633755684
Validation loss = 0.024231228977441788
Validation loss = 0.023139549419283867
Validation loss = 0.024459857493638992
Validation loss = 0.024186749011278152
Validation loss = 0.02403012104332447
Validation loss = 0.024478880688548088
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02618660219013691
Validation loss = 0.025060558691620827
Validation loss = 0.025723682716488838
Validation loss = 0.02583770826458931
Validation loss = 0.024227073416113853
Validation loss = 0.023658551275730133
Validation loss = 0.02557251788675785
Validation loss = 0.02398787997663021
Validation loss = 0.023421816527843475
Validation loss = 0.023782605305314064
Validation loss = 0.02459954097867012
Validation loss = 0.024913078173995018
Validation loss = 0.023811237886548042
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23542116630669546
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23516720604099245
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2349137931034483
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23466092572658773
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23440860215053763
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23415682062298604
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23390557939914164
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23365487674169347
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2334047109207709
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23315508021390374
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2329059829059829
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23265741728922093
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.232409381663113
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2321618743343983
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23191489361702128
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2316684378320935
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23142250530785563
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23117709437963946
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2309322033898305
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2306878306878307
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23044397463002114
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23020063357972545
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.229957805907173
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2297154899894626
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2294736842105263
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00389 |
| Iteration     | 36       |
| MaximumReturn | -0.0023  |
| MinimumReturn | -0.00554 |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025625105947256088
Validation loss = 0.023979833349585533
Validation loss = 0.023960169404745102
Validation loss = 0.025026835501194
Validation loss = 0.024343788623809814
Validation loss = 0.025042271241545677
Validation loss = 0.025798823684453964
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024257397279143333
Validation loss = 0.02402595803141594
Validation loss = 0.022843172773718834
Validation loss = 0.02284492366015911
Validation loss = 0.023320980370044708
Validation loss = 0.023939909413456917
Validation loss = 0.02630811743438244
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024278739467263222
Validation loss = 0.025049395859241486
Validation loss = 0.025415580719709396
Validation loss = 0.024968845769762993
Validation loss = 0.024939166381955147
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022245442494750023
Validation loss = 0.027136944234371185
Validation loss = 0.022326722741127014
Validation loss = 0.023288028314709663
Validation loss = 0.02293069288134575
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025802936404943466
Validation loss = 0.02618669532239437
Validation loss = 0.023740844801068306
Validation loss = 0.0225653238594532
Validation loss = 0.025393668562173843
Validation loss = 0.022718723863363266
Validation loss = 0.025553379207849503
Validation loss = 0.022741178050637245
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22923238696109358
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23004201680672268
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.229800629590766
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22955974842767296
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2293193717277487
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2290794979079498
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22884012539184953
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22964509394572025
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22940563086548488
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22916666666666666
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2299687825182102
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23076923076923078
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23052959501557632
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23132780082987553
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23108808290155441
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2318840579710145
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23164426059979318
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23140495867768596
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23116615067079463
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2309278350515464
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23069001029866118
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23045267489711935
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2312435765673176
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23100616016427106
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23076923076923078
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -17.4    |
| Iteration     | 37       |
| MaximumReturn | -0.0549  |
| MinimumReturn | -81.2    |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025583909824490547
Validation loss = 0.025533586740493774
Validation loss = 0.026934616267681122
Validation loss = 0.024258621037006378
Validation loss = 0.025692246854305267
Validation loss = 0.02399902604520321
Validation loss = 0.0252744872123003
Validation loss = 0.02195053920149803
Validation loss = 0.023413214832544327
Validation loss = 0.02423667162656784
Validation loss = 0.027229873463511467
Validation loss = 0.0240712258964777
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02466728538274765
Validation loss = 0.025422044098377228
Validation loss = 0.022750213742256165
Validation loss = 0.022057197988033295
Validation loss = 0.023515092208981514
Validation loss = 0.023034943267703056
Validation loss = 0.02440308965742588
Validation loss = 0.022541599348187447
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025342945009469986
Validation loss = 0.02663922868669033
Validation loss = 0.024087395519018173
Validation loss = 0.023845603689551353
Validation loss = 0.023983463644981384
Validation loss = 0.024771660566329956
Validation loss = 0.02453731559216976
Validation loss = 0.02354545332491398
Validation loss = 0.023897990584373474
Validation loss = 0.02405484765768051
Validation loss = 0.02282760664820671
Validation loss = 0.02380973845720291
Validation loss = 0.024564797058701515
Validation loss = 0.023987919092178345
Validation loss = 0.022628553211688995
Validation loss = 0.023264644667506218
Validation loss = 0.02342836558818817
Validation loss = 0.02347410097718239
Validation loss = 0.02362184040248394
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02284027263522148
Validation loss = 0.022809971123933792
Validation loss = 0.023238655179739
Validation loss = 0.023740099743008614
Validation loss = 0.022527310997247696
Validation loss = 0.022824538871645927
Validation loss = 0.022465229034423828
Validation loss = 0.02158714458346367
Validation loss = 0.021955206990242004
Validation loss = 0.02138734795153141
Validation loss = 0.022325146943330765
Validation loss = 0.022785600274801254
Validation loss = 0.022175848484039307
Validation loss = 0.02191842719912529
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025321338325738907
Validation loss = 0.025171758607029915
Validation loss = 0.024477068334817886
Validation loss = 0.023288995027542114
Validation loss = 0.02295331284403801
Validation loss = 0.02531611919403076
Validation loss = 0.02381136268377304
Validation loss = 0.02336670458316803
Validation loss = 0.02455044537782669
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2305327868852459
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23029682702149437
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23006134969325154
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22982635342185903
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22959183673469388
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22935779816513763
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22912423625254583
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2288911495422177
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22865853658536586
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22842639593908629
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2281947261663286
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22796352583586627
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22773279352226722
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2275025278058645
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22727272727272727
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22704339051463168
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22681451612903225
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22658610271903323
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22635814889336017
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22613065326633167
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22590361445783133
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22567703109327983
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22545090180360722
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22522522522522523
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.225
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0321  |
| Iteration     | 38       |
| MaximumReturn | -0.0208  |
| MinimumReturn | -0.0454  |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024673426523804665
Validation loss = 0.02514440380036831
Validation loss = 0.028534501791000366
Validation loss = 0.024230564013123512
Validation loss = 0.023879025131464005
Validation loss = 0.02654748223721981
Validation loss = 0.023751763626933098
Validation loss = 0.02353404462337494
Validation loss = 0.02509564906358719
Validation loss = 0.02473975345492363
Validation loss = 0.02362643927335739
Validation loss = 0.024322543293237686
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02402198687195778
Validation loss = 0.02368347719311714
Validation loss = 0.02268640138208866
Validation loss = 0.025002408772706985
Validation loss = 0.022984998300671577
Validation loss = 0.022967441007494926
Validation loss = 0.021856168285012245
Validation loss = 0.02322981134057045
Validation loss = 0.024295968934893608
Validation loss = 0.023210475221276283
Validation loss = 0.022929206490516663
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0243071261793375
Validation loss = 0.0264093279838562
Validation loss = 0.0260895024985075
Validation loss = 0.024992555379867554
Validation loss = 0.026312891393899918
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02271002158522606
Validation loss = 0.023555060848593712
Validation loss = 0.022193854674696922
Validation loss = 0.02118157036602497
Validation loss = 0.023340988904237747
Validation loss = 0.022524358704686165
Validation loss = 0.02360418811440468
Validation loss = 0.02372370846569538
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024166012182831764
Validation loss = 0.023790957406163216
Validation loss = 0.02494461089372635
Validation loss = 0.023289088159799576
Validation loss = 0.022304482758045197
Validation loss = 0.02402741089463234
Validation loss = 0.02380269579589367
Validation loss = 0.025292661041021347
Validation loss = 0.025170817971229553
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22477522477522477
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2245508982035928
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22432701894317048
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2241035856573705
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22388059701492538
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22365805168986083
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2234359483614697
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22321428571428573
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22299306243805747
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22277227722772278
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22255192878338279
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22233201581027667
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22211253701875616
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22189349112426035
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22167487684729065
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22145669291338582
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22123893805309736
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22102161100196463
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22080471050049066
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22058823529411764
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22037218413320275
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22015655577299412
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21994134897360704
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2197265625
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21951219512195122
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00216 |
| Iteration     | 39       |
| MaximumReturn | -0.00144 |
| MinimumReturn | -0.00733 |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02569986879825592
Validation loss = 0.026008062064647675
Validation loss = 0.025584332644939423
Validation loss = 0.02256094664335251
Validation loss = 0.022910377010703087
Validation loss = 0.026959972456097603
Validation loss = 0.023623283952474594
Validation loss = 0.02511722408235073
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02271009236574173
Validation loss = 0.023182176053524017
Validation loss = 0.02073133923113346
Validation loss = 0.02241058647632599
Validation loss = 0.02239224500954151
Validation loss = 0.02296292781829834
Validation loss = 0.022861596196889877
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.027862627059221268
Validation loss = 0.024334361776709557
Validation loss = 0.02528483234345913
Validation loss = 0.02474060095846653
Validation loss = 0.025442050769925117
Validation loss = 0.02571612410247326
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023392101749777794
Validation loss = 0.02158711850643158
Validation loss = 0.021991277113556862
Validation loss = 0.021675722673535347
Validation loss = 0.025959795340895653
Validation loss = 0.022959569469094276
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02204756811261177
Validation loss = 0.02241791971027851
Validation loss = 0.022759996354579926
Validation loss = 0.02345235086977482
Validation loss = 0.02461475506424904
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21929824561403508
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21908471275559882
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2188715953307393
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21865889212827988
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21844660194174756
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21823472356935014
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2180232558139535
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21781219748305905
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21760154738878143
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21739130434782608
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2171814671814672
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21697203471552556
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21676300578034682
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2165543792107796
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21634615384615385
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21613832853025935
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21593090211132437
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21572387344199426
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21551724137931033
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.215311004784689
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21510516252390058
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2148997134670487
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21469465648854963
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21448999046711154
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21428571428571427
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00091  |
| Iteration     | 40        |
| MaximumReturn | -0.000712 |
| MinimumReturn | -0.00155  |
| TotalSamples  | 69972     |
-----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026598356664180756
Validation loss = 0.023376349359750748
Validation loss = 0.026402631774544716
Validation loss = 0.024551428854465485
Validation loss = 0.023378141224384308
Validation loss = 0.024599207565188408
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024392006918787956
Validation loss = 0.02219507470726967
Validation loss = 0.02362695522606373
Validation loss = 0.02093326300382614
Validation loss = 0.022368930280208588
Validation loss = 0.022780291736125946
Validation loss = 0.022162916138768196
Validation loss = 0.024723635986447334
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023892253637313843
Validation loss = 0.02421189285814762
Validation loss = 0.023048898205161095
Validation loss = 0.024328220635652542
Validation loss = 0.023508956655859947
Validation loss = 0.025074932724237442
Validation loss = 0.022684751078486443
Validation loss = 0.024686863645911217
Validation loss = 0.02422558329999447
Validation loss = 0.02401120588183403
Validation loss = 0.023407122120261192
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02444206364452839
Validation loss = 0.021868588402867317
Validation loss = 0.02241511456668377
Validation loss = 0.021634310483932495
Validation loss = 0.02134564146399498
Validation loss = 0.024697987362742424
Validation loss = 0.02220866270363331
Validation loss = 0.021356526762247086
Validation loss = 0.023415587842464447
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021936606615781784
Validation loss = 0.024188190698623657
Validation loss = 0.024150090292096138
Validation loss = 0.022501779720187187
Validation loss = 0.022891616448760033
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21408182683158897
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21387832699619772
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21367521367521367
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21347248576850095
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2132701421800948
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21306818181818182
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21286660359508042
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21266540642722118
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21246458923512748
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21226415094339623
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21206409048067862
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.211864406779661
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2116650987770461
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21146616541353383
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2112676056338028
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21106941838649157
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21087160262417995
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21067415730337077
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21047708138447146
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2102803738317757
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21008403361344538
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20988805970149255
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2096924510717614
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20949720670391062
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20930232558139536
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00725 |
| Iteration     | 41       |
| MaximumReturn | -0.00543 |
| MinimumReturn | -0.0118  |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02433188445866108
Validation loss = 0.024387767538428307
Validation loss = 0.02719810977578163
Validation loss = 0.02414892427623272
Validation loss = 0.023295655846595764
Validation loss = 0.024847248569130898
Validation loss = 0.024697182700037956
Validation loss = 0.025998085737228394
Validation loss = 0.023942584171891212
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0236069206148386
Validation loss = 0.022592952474951744
Validation loss = 0.024033881723880768
Validation loss = 0.022453732788562775
Validation loss = 0.022686995565891266
Validation loss = 0.021346643567085266
Validation loss = 0.022855119779706
Validation loss = 0.02220589481294155
Validation loss = 0.02353663742542267
Validation loss = 0.02352731302380562
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025004282593727112
Validation loss = 0.026978887617588043
Validation loss = 0.02397174946963787
Validation loss = 0.02474917657673359
Validation loss = 0.025872597470879555
Validation loss = 0.0232526957988739
Validation loss = 0.024615315720438957
Validation loss = 0.024797450751066208
Validation loss = 0.023789698258042336
Validation loss = 0.02456054836511612
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022117899730801582
Validation loss = 0.02199702523648739
Validation loss = 0.02248675748705864
Validation loss = 0.02219805121421814
Validation loss = 0.02124084159731865
Validation loss = 0.02340150997042656
Validation loss = 0.02202605828642845
Validation loss = 0.021518515422940254
Validation loss = 0.021145286038517952
Validation loss = 0.024201765656471252
Validation loss = 0.022246943786740303
Validation loss = 0.02229943871498108
Validation loss = 0.022437483072280884
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02491755038499832
Validation loss = 0.02341291308403015
Validation loss = 0.022555554285645485
Validation loss = 0.02306724525988102
Validation loss = 0.024189313873648643
Validation loss = 0.02539558708667755
Validation loss = 0.022249281406402588
Validation loss = 0.023133952170610428
Validation loss = 0.023173164576292038
Validation loss = 0.023124339058995247
Validation loss = 0.027487564831972122
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20910780669144982
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20891364902506965
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20871985157699444
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20852641334569044
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20833333333333334
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20814061054579094
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20794824399260628
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2077562326869806
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20756457564575645
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2073732718894009
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20718232044198895
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20699172033118676
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20680147058823528
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2066115702479339
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20642201834862386
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20623281393217233
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20604395604395603
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20585544373284537
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2056672760511883
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2054794520547945
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20529197080291972
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20510483135824978
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20491803278688525
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20473157415832574
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20454545454545456
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000878 |
| Iteration     | 42        |
| MaximumReturn | -0.000575 |
| MinimumReturn | -0.00136  |
| TotalSamples  | 73304     |
-----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023529620841145515
Validation loss = 0.024955665692687035
Validation loss = 0.023756040260195732
Validation loss = 0.027401160448789597
Validation loss = 0.025388184934854507
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021032385528087616
Validation loss = 0.02296346426010132
Validation loss = 0.022633515298366547
Validation loss = 0.02093290165066719
Validation loss = 0.02185320481657982
Validation loss = 0.02271493896842003
Validation loss = 0.022613225504755974
Validation loss = 0.020965436473488808
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023094957694411278
Validation loss = 0.02370174042880535
Validation loss = 0.023079028353095055
Validation loss = 0.02387358993291855
Validation loss = 0.02286355011165142
Validation loss = 0.023970888927578926
Validation loss = 0.026358462870121002
Validation loss = 0.023683879524469376
Validation loss = 0.023677682504057884
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02228572778403759
Validation loss = 0.023406438529491425
Validation loss = 0.022373326122760773
Validation loss = 0.02447156421840191
Validation loss = 0.02176530286669731
Validation loss = 0.0213158018887043
Validation loss = 0.023450467735528946
Validation loss = 0.021770291030406952
Validation loss = 0.021320587024092674
Validation loss = 0.022554831579327583
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.027816209942102432
Validation loss = 0.023020872846245766
Validation loss = 0.023336362093687057
Validation loss = 0.02314750663936138
Validation loss = 0.023168936371803284
Validation loss = 0.02266215905547142
Validation loss = 0.021450284868478775
Validation loss = 0.02214762754738331
Validation loss = 0.022229457274079323
Validation loss = 0.023219561204314232
Validation loss = 0.024047188460826874
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
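The policy-training step first re-initializes the Gaussian policy's log standard deviation ("Re-initialize init_std.") and then runs 20 TRPO iterations, each of which starts by sampling trajectories. Those 20 sampling rounds leave TotalSamples untouched, consistent with rollouts generated inside the learned dynamics model rather than the real environment. A structural sketch of the loop; GaussianPolicy, sample_fn, and update_fn are hypothetical stand-ins, and resetting the log-std is presumably meant to restore exploration noise that earlier outer iterations had shrunk:

    import numpy as np

    class GaussianPolicy:
        # Only the state touched by "Re-initialize init_std." is modeled here.
        def __init__(self, action_dim, init_logstd=0.0):
            self.init_logstd = init_logstd
            self.logstd = np.full(action_dim, init_logstd)

        def reinitialize_std(self):
            self.logstd[:] = self.init_logstd

    def train_policy(policy, sample_fn, update_fn, iterations=20):
        print('Re-initialize init_std.')
        policy.reinitialize_std()
        for i in range(iterations):
            print('Obtaining samples for iteration %d...' % i)
            paths = sample_fn(policy)    # imagined rollouts in the model
            update_fn(policy, paths)     # one TRPO update (omitted here)
        print('Done training policy.')

    train_policy(GaussianPolicy(action_dim=1),
                 sample_fn=lambda p: [], update_fn=lambda p, paths: None)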
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20435967302452315
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20417422867513613
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20398912058023572
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20380434782608695
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20361990950226244
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20343580470162748
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2032520325203252
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20306859205776173
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20288548241659152
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20270270270270271
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20252025202520252
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20233812949640287
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20215633423180593
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20197486535008977
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20179372197309417
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20161290322580644
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20143240823634737
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20125223613595708
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20107238605898123
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20089285714285715
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20071364852809992
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20053475935828877
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20035618878005343
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2001779359430605
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00359 |
| Iteration     | 43       |
| MaximumReturn | -0.00261 |
| MinimumReturn | -0.00519 |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02228805422782898
Validation loss = 0.02531955949962139
Validation loss = 0.023092489689588547
Validation loss = 0.024848943576216698
Validation loss = 0.030246641486883163
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02149108797311783
Validation loss = 0.022118516266345978
Validation loss = 0.022083178162574768
Validation loss = 0.02185663767158985
Validation loss = 0.025542479008436203
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0252176932990551
Validation loss = 0.02377491630613804
Validation loss = 0.023423420265316963
Validation loss = 0.023990988731384277
Validation loss = 0.022689590230584145
Validation loss = 0.02265859581530094
Validation loss = 0.023177316412329674
Validation loss = 0.024639759212732315
Validation loss = 0.024081839248538017
Validation loss = 0.022989466786384583
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023096946999430656
Validation loss = 0.022130491212010384
Validation loss = 0.023534661158919334
Validation loss = 0.02220645733177662
Validation loss = 0.020133284851908684
Validation loss = 0.02130395919084549
Validation loss = 0.02388869784772396
Validation loss = 0.02116239443421364
Validation loss = 0.021447960287332535
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02087227627635002
Validation loss = 0.024269528687000275
Validation loss = 0.023204369470477104
Validation loss = 0.021340103819966316
Validation loss = 0.022058885544538498
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19982238010657194
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19964507542147295
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19946808510638298
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19929140832595216
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19911504424778761
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1989389920424403
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19876325088339222
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19858781994704325
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1984126984126984
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19823788546255505
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19806338028169015
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19788918205804748
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1977152899824253
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19754170324846357
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19736842105263158
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1971954425942156
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19702276707530647
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1968503937007874
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19667832167832167
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1965065502183406
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19633507853403143
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1961639058413252
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19599303135888502
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.195822454308094
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1956521739130435
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000841 |
| Iteration     | 44        |
| MaximumReturn | -0.000612 |
| MinimumReturn | -0.00122  |
| TotalSamples  | 76636     |
-----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02833244390785694
Validation loss = 0.025212926790118217
Validation loss = 0.022926554083824158
Validation loss = 0.023052312433719635
Validation loss = 0.024366725236177444
Validation loss = 0.022425197064876556
Validation loss = 0.023899158462882042
Validation loss = 0.023472974076867104
Validation loss = 0.023352842777967453
Validation loss = 0.023777062073349953
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023816930130124092
Validation loss = 0.02212405949831009
Validation loss = 0.02120673656463623
Validation loss = 0.026163162663578987
Validation loss = 0.021398527547717094
Validation loss = 0.02237745188176632
Validation loss = 0.023318324238061905
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023190410807728767
Validation loss = 0.022939790040254593
Validation loss = 0.023546872660517693
Validation loss = 0.02435893379151821
Validation loss = 0.023526832461357117
Validation loss = 0.022798193618655205
Validation loss = 0.02275589480996132
Validation loss = 0.024409057572484016
Validation loss = 0.027050748467445374
Validation loss = 0.022739693522453308
Validation loss = 0.022443847730755806
Validation loss = 0.02279684506356716
Validation loss = 0.02272496186196804
Validation loss = 0.022884542122483253
Validation loss = 0.02379361167550087
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020754417404532433
Validation loss = 0.020530737936496735
Validation loss = 0.023629166185855865
Validation loss = 0.020956341177225113
Validation loss = 0.02204783819615841
Validation loss = 0.021926114335656166
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021810339763760567
Validation loss = 0.022980835288763046
Validation loss = 0.02257358841598034
Validation loss = 0.02345709502696991
Validation loss = 0.023701339960098267
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19548218940052128
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1953125
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19514310494362533
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1949740034662045
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19480519480519481
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19463667820069205
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1944684528954192
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19430051813471502
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19413287316652286
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1939655172413793
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1937984496124031
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.193631669535284
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1934651762682717
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19329896907216496
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19313304721030042
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19296740994854203
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1928020565552699
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19263698630136986
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1924721984602224
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19230769230769232
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19214346712211786
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19197952218430034
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1918158567774936
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19165247018739354
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19148936170212766
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00258 |
| Iteration     | 45       |
| MaximumReturn | -0.00188 |
| MinimumReturn | -0.00395 |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023490695282816887
Validation loss = 0.024581648409366608
Validation loss = 0.022839099168777466
Validation loss = 0.023441174998879433
Validation loss = 0.021852225065231323
Validation loss = 0.028939494863152504
Validation loss = 0.02267824485898018
Validation loss = 0.02436229959130287
Validation loss = 0.021690186113119125
Validation loss = 0.023289714008569717
Validation loss = 0.02243616245687008
Validation loss = 0.022645775228738785
Validation loss = 0.02392386458814144
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02073623798787594
Validation loss = 0.021578751504421234
Validation loss = 0.020881250500679016
Validation loss = 0.022899560630321503
Validation loss = 0.02093520760536194
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02236800454556942
Validation loss = 0.023135198280215263
Validation loss = 0.023348623886704445
Validation loss = 0.02251448668539524
Validation loss = 0.022685157135128975
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02084214612841606
Validation loss = 0.02013978734612465
Validation loss = 0.021641666069626808
Validation loss = 0.021016797050833702
Validation loss = 0.021537048742175102
Validation loss = 0.02220635674893856
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022536084055900574
Validation loss = 0.022723974660038948
Validation loss = 0.022745942696928978
Validation loss = 0.022323699668049812
Validation loss = 0.020820187404751778
Validation loss = 0.022441621869802475
Validation loss = 0.02252228558063507
Validation loss = 0.02299495041370392
Validation loss = 0.021469831466674805
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1921768707482993
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19201359388275277
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19185059422750425
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19168787107718405
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1923728813559322
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1922099915325995
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19204737732656516
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19188503803888418
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19172297297297297
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19156118143459916
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19139966273187184
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19123841617523168
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19107744107744107
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19091673675357443
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1907563025210084
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19059613769941225
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19043624161073824
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.19279128248113997
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19262981574539365
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19246861924686193
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19230769230769232
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1921470342522974
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19198664440734559
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19182652210175147
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.19666666666666666
Done generating on-policy rollouts.
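Iteration 46 is the only one in this excerpt with nonzero per-path counts (1, 1, 3, and 6 at Paths 0, 4, 17, and 24), and each nonzero count bumps the running average accordingly (e.g. 226/1176 = 0.1921768707482993 after Path 0). The counter itself reads as the number of timesteps in a path at which some monitored quantity exceeded the threshold epsilon = 3; exactly what is measured (for instance a local-linearization or prediction-error statistic) is not recoverable from the log. A speculative sketch of the counting alone:

    import numpy as np

    def count_affinizations(values, epsilon=3.0):
        # Per-path count of steps whose monitored quantity exceeds epsilon.
        # What `values` holds in the real code is an open question here.
        return int(np.sum(np.asarray(values) > epsilon))

    print(count_affinizations([0.5, 4.1, 2.9, 7.0]))  # -> 2

It is perhaps worth noting that this iteration also shows the worst summary statistics in the excerpt (MinimumReturn -0.4 below), so the spike in threshold crossings coincides with a degraded policy.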
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0419  |
| Iteration     | 46       |
| MaximumReturn | -0.0148  |
| MinimumReturn | -0.4     |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025303691625595093
Validation loss = 0.02294507436454296
Validation loss = 0.02229355275630951
Validation loss = 0.022480007261037827
Validation loss = 0.022342834621667862
Validation loss = 0.023256151005625725
Validation loss = 0.02551114559173584
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020838763564825058
Validation loss = 0.02128206565976143
Validation loss = 0.022068586200475693
Validation loss = 0.021114520728588104
Validation loss = 0.022873828187584877
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024777699261903763
Validation loss = 0.023167109116911888
Validation loss = 0.02379801496863365
Validation loss = 0.022780273109674454
Validation loss = 0.023416025564074516
Validation loss = 0.022916480898857117
Validation loss = 0.023558884859085083
Validation loss = 0.026108086109161377
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020579656586050987
Validation loss = 0.02109014429152012
Validation loss = 0.020136913284659386
Validation loss = 0.02147086337208748
Validation loss = 0.021203141659498215
Validation loss = 0.021110549569129944
Validation loss = 0.020961524918675423
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.032254140824079514
Validation loss = 0.020753536373376846
Validation loss = 0.020926084369421005
Validation loss = 0.023024696856737137
Validation loss = 0.02144629880785942
Validation loss = 0.021929634734988213
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1965029142381349
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19633943427620631
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19617622610141314
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19601328903654486
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.195850622406639
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1956882255389718
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19552609776304888
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19536423841059603
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19520264681555005
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19504132231404958
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1948802642444261
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19471947194719472
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19455894476504534
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19439868204283361
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.194238683127572
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19407894736842105
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19391947411668037
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19376026272577998
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19360131255127153
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19344262295081968
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19328419328419327
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19312602291325695
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1929681112019624
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19281045751633988
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1926530612244898
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00694 |
| Iteration     | 47       |
| MaximumReturn | -0.00424 |
| MinimumReturn | -0.0129  |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022122705355286598
Validation loss = 0.022780025377869606
Validation loss = 0.02396296337246895
Validation loss = 0.023609142750501633
Validation loss = 0.02332160994410515
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021514683961868286
Validation loss = 0.021832965314388275
Validation loss = 0.02168419025838375
Validation loss = 0.02124987728893757
Validation loss = 0.022107552736997604
Validation loss = 0.02146090939640999
Validation loss = 0.022351616993546486
Validation loss = 0.022228699177503586
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02396605908870697
Validation loss = 0.02346769906580448
Validation loss = 0.02362043969333172
Validation loss = 0.022339871153235435
Validation loss = 0.023217031732201576
Validation loss = 0.022372763603925705
Validation loss = 0.023113036528229713
Validation loss = 0.02292630448937416
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020795129239559174
Validation loss = 0.0202687568962574
Validation loss = 0.021839700639247894
Validation loss = 0.023647630587220192
Validation loss = 0.021691864356398582
Validation loss = 0.0207052044570446
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02101549133658409
Validation loss = 0.02318318746984005
Validation loss = 0.02139846608042717
Validation loss = 0.022919025272130966
Validation loss = 0.02120112255215645
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19249592169657423
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19233903830480847
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19218241042345277
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1920260374288039
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.191869918699187
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1917140536149472
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19155844155844157
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19140308191403083
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1912479740680713
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1910931174089069
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19093851132686085
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19078415521422798
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19063004846526657
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19047619047619047
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19032258064516128
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19016921837228043
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19001610305958133
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1898632341110217
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18971061093247588
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1895582329317269
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18940609951845908
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1892542101042502
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1891025641025641
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.188951160928743
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1888
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00291 |
| Iteration     | 48       |
| MaximumReturn | -0.00194 |
| MinimumReturn | -0.00433 |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022210905328392982
Validation loss = 0.022892318665981293
Validation loss = 0.023032646626234055
Validation loss = 0.02228875458240509
Validation loss = 0.024142857640981674
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02277735434472561
Validation loss = 0.0207369327545166
Validation loss = 0.020734935998916626
Validation loss = 0.022145450115203857
Validation loss = 0.021080875769257545
Validation loss = 0.02161606028676033
Validation loss = 0.021766453981399536
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024945881217718124
Validation loss = 0.023062216117978096
Validation loss = 0.02339821867644787
Validation loss = 0.022954383864998817
Validation loss = 0.022719845175743103
Validation loss = 0.025234440341591835
Validation loss = 0.022722870111465454
Validation loss = 0.023129718378186226
Validation loss = 0.022480590268969536
Validation loss = 0.02221601828932762
Validation loss = 0.024496201425790787
Validation loss = 0.023706573992967606
Validation loss = 0.023383766412734985
Validation loss = 0.022673502564430237
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02139447070658207
Validation loss = 0.02293388731777668
Validation loss = 0.020784713327884674
Validation loss = 0.022240810096263885
Validation loss = 0.021210214123129845
Validation loss = 0.020527716726064682
Validation loss = 0.02243915945291519
Validation loss = 0.02200627326965332
Validation loss = 0.020354192703962326
Validation loss = 0.019944652915000916
Validation loss = 0.020765086635947227
Validation loss = 0.021627113223075867
Validation loss = 0.020454930141568184
Validation loss = 0.020974062383174896
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022386731579899788
Validation loss = 0.022378044202923775
Validation loss = 0.021821491420269012
Validation loss = 0.0241067074239254
Validation loss = 0.021097835153341293
Validation loss = 0.021990705281496048
Validation loss = 0.024926507845520973
Validation loss = 0.022344673052430153
Validation loss = 0.022545937448740005
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18864908073541167
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18849840255591055
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18834796488427774
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18819776714513556
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18804780876494023
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18789808917197454
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1877486077963405
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1875993640699523
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.187450357426529
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1873015873015873
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18715305313243458
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18700475435816163
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1868566904196358
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18670886075949367
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1865612648221344
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18641390205371247
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.186266771902131
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1861198738170347
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18597320724980299
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1858267716535433
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18568056648308418
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18553459119496854
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18538884524744698
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18524332810047095
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18509803921568627
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00508 |
| Iteration     | 49       |
| MaximumReturn | -0.00292 |
| MinimumReturn | -0.00777 |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02170475386083126
Validation loss = 0.023476528003811836
Validation loss = 0.02421124465763569
Validation loss = 0.021464381366968155
Validation loss = 0.02408478409051895
Validation loss = 0.021850211545825005
Validation loss = 0.022563477978110313
Validation loss = 0.02213246002793312
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02157561480998993
Validation loss = 0.02051726169884205
Validation loss = 0.020886177197098732
Validation loss = 0.021292386576533318
Validation loss = 0.021366428583860397
Validation loss = 0.02100197970867157
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0215387511998415
Validation loss = 0.02221282385289669
Validation loss = 0.021600069478154182
Validation loss = 0.02229647897183895
Validation loss = 0.023036062717437744
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0201854445040226
Validation loss = 0.0204776581376791
Validation loss = 0.020575514063239098
Validation loss = 0.020695481449365616
Validation loss = 0.0229630284011364
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0214794110506773
Validation loss = 0.02197195403277874
Validation loss = 0.02349221520125866
Validation loss = 0.021070633083581924
Validation loss = 0.023197922855615616
Validation loss = 0.021501529961824417
Validation loss = 0.021346962079405785
Validation loss = 0.02194206230342388
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18495297805642633
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18480814408770557
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18466353677621283
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18451915559030493
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.184375
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18423106947697113
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18408736349453977
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18394388152766952
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1838006230529595
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18365758754863815
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18351477449455678
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18337218337218336
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18322981366459629
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1830876648564779
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18294573643410852
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18280402788536018
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1826625386996904
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18252126836813612
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18238021638330756
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18223938223938224
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18209876543209877
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18195836545875096
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18181818181818182
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18167821401077752
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18153846153846154
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000874 |
| Iteration     | 50        |
| MaximumReturn | -0.000703 |
| MinimumReturn | -0.00121  |
| TotalSamples  | 86632     |
-----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021476684138178825
Validation loss = 0.022721581161022186
Validation loss = 0.022214848548173904
Validation loss = 0.021019410341978073
Validation loss = 0.023677967488765717
Validation loss = 0.022301433607935905
Validation loss = 0.023764951154589653
Validation loss = 0.021555842831730843
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020531700924038887
Validation loss = 0.02148338034749031
Validation loss = 0.022213248535990715
Validation loss = 0.020904937759041786
Validation loss = 0.021346960216760635
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021589115262031555
Validation loss = 0.021739115938544273
Validation loss = 0.022633004933595657
Validation loss = 0.02296506054699421
Validation loss = 0.023875391110777855
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021624745801091194
Validation loss = 0.0198176521807909
Validation loss = 0.020345298573374748
Validation loss = 0.02318965084850788
Validation loss = 0.021101204678416252
Validation loss = 0.020970579236745834
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024292565882205963
Validation loss = 0.020657997578382492
Validation loss = 0.02119726687669754
Validation loss = 0.022247295826673508
Validation loss = 0.021951444447040558
Validation loss = 0.021418660879135132
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1813989239046887
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18125960061443933
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18112049117421336
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18098159509202455
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18084291187739462
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1807044410413476
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.180566182096404
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18042813455657492
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18029029793735676
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1801526717557252
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18001525553012968
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1798780487804878
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17974105102817975
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1796042617960426
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17946768060836502
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17933130699088146
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1791951404707669
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17905918057663125
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17892342683851403
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1787878787878788
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17865253595760788
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17851739788199697
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17838246409674982
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1782477341389728
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1781132075471698
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000807 |
| Iteration     | 51        |
| MaximumReturn | -0.000574 |
| MinimumReturn | -0.00101  |
| TotalSamples  | 88298     |
-----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02222718484699726
Validation loss = 0.022984404116868973
Validation loss = 0.02263399213552475
Validation loss = 0.021614300087094307
Validation loss = 0.022363124415278435
Validation loss = 0.02133024111390114
Validation loss = 0.02256280928850174
Validation loss = 0.0218873992562294
Validation loss = 0.021288936957716942
Validation loss = 0.022981584072113037
Validation loss = 0.022225534543395042
Validation loss = 0.023210812360048294
Validation loss = 0.022271454334259033
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019542235881090164
Validation loss = 0.02032347023487091
Validation loss = 0.02099464274942875
Validation loss = 0.020737839862704277
Validation loss = 0.02159368246793747
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02086050435900688
Validation loss = 0.02276976779103279
Validation loss = 0.021491635590791702
Validation loss = 0.02153092622756958
Validation loss = 0.023716801777482033
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02065575122833252
Validation loss = 0.02127779833972454
Validation loss = 0.0215154942125082
Validation loss = 0.02170792780816555
Validation loss = 0.020947830751538277
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022510791197419167
Validation loss = 0.021129826083779335
Validation loss = 0.022528065368533134
Validation loss = 0.020565245300531387
Validation loss = 0.02277001179754734
Validation loss = 0.0214292760938406
Validation loss = 0.022809242829680443
Validation loss = 0.02259986475110054
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1779788838612368
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17784476262245666
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17771084337349397
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17757712565838976
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1774436090225564
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17731029301277235
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17717717717717718
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17704426106526633
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17691154422788605
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17677902621722846
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17664670658682635
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17651458489154825
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17638266068759342
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17625093353248694
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1761194029850746
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17598806860551827
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17585692995529062
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1757259865971705
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17559523809523808
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1754646840148699
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17533432392273401
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17520415738678544
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17507418397626112
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17494440326167532
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1748148148148148
Done generating on-policy rollouts.
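
The "average number of affinization" is a running ratio: cumulative affinization events divided by cumulative real-environment timesteps. The printed values pin it down: Path 0 of this phase prints 23600/132600 ≈ 0.177979, each 100-step path then adds 100 to the denominator and (here) 0 to the numerator, and the same state carries across iterations (Path 0 of itr #53 prints 23600/135100 ≈ 0.174685). A self-contained reconstruction of one rollout phase under that reading:

    # Assumed bookkeeping, but consistent with every value printed above:
    # average = cumulative affinization events / cumulative real-env timesteps.
    total_events = 23600   # implied by the itr #52 printouts
    total_steps = 132600
    for path in range(25):
        new_events = 0     # every path in this run reports 0 new affinizations
        print(f"Path {path} | total_timesteps {path * 100}.")
        print(f"number of affinization with epsilon = 3 is {new_events}")
        print(f"average number of affinization = {total_events / total_steps}")
        total_events += new_events
        total_steps += 100  # each path is env_horizon = 100 steps

With no new events the printed average decays hyperbolically toward zero, which is exactly the slow downward drift visible across iterations.
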
Updating normalization.
Done updating normalization.
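
"Updating normalization." recomputes the statistics used to whiten the dynamics model's inputs and targets from the aggregated dataset. Which quantities are normalized is not shown in the log; a common choice (assumed here) is per-dimension mean/std of observations, actions, and state deltas:

    import numpy as np

    def update_normalization(dataset):
        # Assumed layout: dataset maps names to (N, dim) arrays.
        obs = np.asarray(dataset["observations"])
        acts = np.asarray(dataset["actions"])
        deltas = np.asarray(dataset["next_observations"]) - obs
        eps = 1e-8  # guard against zero-variance dimensions
        return {
            "obs_mean": obs.mean(0),      "obs_std": obs.std(0) + eps,
            "act_mean": acts.mean(0),     "act_std": acts.std(0) + eps,
            "delta_mean": deltas.mean(0), "delta_std": deltas.std(0) + eps,
        }
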
----------------------------
| AverageReturn | -0.0157  |
| Iteration     | 52       |
| MaximumReturn | -0.00956 |
| MinimumReturn | -0.0227  |
| TotalSamples  | 89964    |
----------------------------
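
The boxed summaries resemble rllab's tabular logger. Two regularities worth noting: Iteration tracks the outer itr counter, and TotalSamples grows by exactly 1666 per iteration, consistent with keeping a 2:1 train/validation split of the 2500 new real-env timesteps (2500 * 2/3 = 1666.67, and 89964 = 54 * 1666). A small reimplementation that reproduces the layout (value formatting approximated with 3 significant digits):

    def print_tabular(stats):
        # Render key/value pairs in the boxed layout used above.
        rows = [(k, f"{v:.3g}" if isinstance(v, float) else str(v))
                for k, v in stats.items()]
        kw = max(len(k) for k, _ in rows)
        vw = max(len(v) for _, v in rows)
        rule = "-" * (kw + vw + 7)
        print(rule)
        for k, v in rows:
            print(f"| {k:<{kw}} | {v:<{vw}} |")
        print(rule)

    print_tabular({"AverageReturn": -0.0157, "Iteration": 52,
                   "MaximumReturn": -0.00956, "MinimumReturn": -0.0227,
                   "TotalSamples": 89964})
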
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022002795711159706
Validation loss = 0.021402426064014435
Validation loss = 0.02074814960360527
Validation loss = 0.020625613629817963
Validation loss = 0.02229979820549488
Validation loss = 0.022541392594575882
Validation loss = 0.021550089120864868
Validation loss = 0.02219502627849579
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02224618010222912
Validation loss = 0.020646462216973305
Validation loss = 0.019453536719083786
Validation loss = 0.020976614207029343
Validation loss = 0.020936748012900352
Validation loss = 0.019556406885385513
Validation loss = 0.020376652479171753
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022022195160388947
Validation loss = 0.022795770317316055
Validation loss = 0.02152775228023529
Validation loss = 0.022284243255853653
Validation loss = 0.021342409774661064
Validation loss = 0.02135990932583809
Validation loss = 0.022596200928092003
Validation loss = 0.02039485238492489
Validation loss = 0.021856529638171196
Validation loss = 0.021921660751104355
Validation loss = 0.021421141922473907
Validation loss = 0.02125842310488224
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02005772478878498
Validation loss = 0.01982133835554123
Validation loss = 0.019970137625932693
Validation loss = 0.021781450137495995
Validation loss = 0.019917087629437447
Validation loss = 0.019319314509630203
Validation loss = 0.020264334976673126
Validation loss = 0.021847473457455635
Validation loss = 0.020289506763219833
Validation loss = 0.020498408004641533
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0203559510409832
Validation loss = 0.020881950855255127
Validation loss = 0.02275068312883377
Validation loss = 0.021271200850605965
Validation loss = 0.02120574191212654
Done fitting dynamics.
Updating randomness.
Done updating randomness.
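
The log never says what "Updating randomness." does. Given the particle-ensemble settings (5 particles over a 5-model ensemble), one plausible reading, offered here only as an illustration, is that it redraws the per-particle model assignments used when sampling trajectories from the probabilistic ensemble, in the style of trajectory-sampling schemes:

    import numpy as np

    def update_randomness(n_particles=5, n_models=5, rng=np.random):
        # Illustrative only: assign each particle a freshly sampled
        # ensemble member for the upcoming model rollouts.
        return rng.randint(n_models, size=n_particles)
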
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17468541820873426
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17455621301775148
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17442719881744273
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17429837518463812
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17416974169741697
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17404129793510326
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17391304347826086
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17378497790868924
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17365710080941868
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17352941176470588
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1734019103600294
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17327459618208516
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1731474688187821
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17302052785923755
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1728937728937729
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17276720351390923
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17264081931236283
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17251461988304093
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17238860482103727
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17226277372262774
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17213712618526622
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17201166180758018
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17188638018936636
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1717612809315866
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17163636363636364
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000944 |
| Iteration     | 53        |
| MaximumReturn | -0.00066  |
| MinimumReturn | -0.00123  |
| TotalSamples  | 91630     |
-----------------------------
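
Putting the pieces together, every "itr #N |" block repeats one cycle of the same model-based loop. In outline, reusing the sketches above plus hypothetical helpers rollout and aggregate:

    def run_iteration(itr, dataset, models, policy, env):
        print(f"itr #{itr} | ")
        print("Fitting dynamics.")
        fit_ensemble(models, dataset["train"], dataset["val"])  # sketch above
        update_randomness()                                     # sketch above
        train_policy(policy, models)                            # TRPO on the learned model
        print("Generating on-policy rollouts.")
        paths = rollout(env, policy, n_paths=25, horizon=100)   # hypothetical: real-env rollouts
        dataset = aggregate(dataset, paths)                     # hypothetical: grow train/val sets
        print("Updating normalization.")
        stats = update_normalization(dataset["train"])          # sketch above
        print("Done updating normalization.")
        # ...followed by the boxed per-iteration summary (see print_tabular above).
        return dataset, stats
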
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022272098809480667
Validation loss = 0.020979922264814377
Validation loss = 0.021758364513516426
Validation loss = 0.023700615391135216
Validation loss = 0.0224688071757555
Validation loss = 0.020816553384065628
Validation loss = 0.023117192089557648
Validation loss = 0.02470853365957737
Validation loss = 0.023230545222759247
Validation loss = 0.02233486995100975
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020501721650362015
Validation loss = 0.023902304470539093
Validation loss = 0.020695723593235016
Validation loss = 0.020829644054174423
Validation loss = 0.022268665954470634
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022961579263210297
Validation loss = 0.02242531254887581
Validation loss = 0.022137751802802086
Validation loss = 0.02090047299861908
Validation loss = 0.02379670925438404
Validation loss = 0.020643850788474083
Validation loss = 0.023021239787340164
Validation loss = 0.021295873448252678
Validation loss = 0.023536987602710724
Validation loss = 0.02392551116645336
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02286815643310547
Validation loss = 0.020387541502714157
Validation loss = 0.019635597243905067
Validation loss = 0.019989848136901855
Validation loss = 0.02075444720685482
Validation loss = 0.02005070261657238
Validation loss = 0.02112255059182644
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022062642499804497
Validation loss = 0.023980779573321342
Validation loss = 0.020865224301815033
Validation loss = 0.023871460929512978
Validation loss = 0.0216374509036541
Validation loss = 0.021034371107816696
Validation loss = 0.02120506763458252
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17151162790697674
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17138707334785766
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17126269956458637
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17113850616388687
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17101449275362318
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17089065894279507
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.170767004341534
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17064352856109907
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17052023121387283
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1703971119133574
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17027417027417027
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17015140591204037
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17002881844380405
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16990640748740102
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1697841726618705
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16966211358734723
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16954022988505746
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16941852117731515
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16929698708751795
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16917562724014337
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16905444126074498
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16893342877594847
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1688125894134478
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16869192280200143
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16857142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000874 |
| Iteration     | 54        |
| MaximumReturn | -0.000634 |
| MinimumReturn | -0.00164  |
| TotalSamples  | 93296     |
-----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02110954374074936
Validation loss = 0.02147572860121727
Validation loss = 0.02410847134888172
Validation loss = 0.02154390886425972
Validation loss = 0.02304077334702015
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020141655579209328
Validation loss = 0.022307759150862694
Validation loss = 0.020103711634874344
Validation loss = 0.020772118121385574
Validation loss = 0.020056305453181267
Validation loss = 0.021436894312500954
Validation loss = 0.020725399255752563
Validation loss = 0.021336840465664864
Validation loss = 0.02097129635512829
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02074582874774933
Validation loss = 0.02129535563290119
Validation loss = 0.021907765418291092
Validation loss = 0.02265716716647148
Validation loss = 0.022659607231616974
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023748038336634636
Validation loss = 0.019446000456809998
Validation loss = 0.019611837342381477
Validation loss = 0.02512967400252819
Validation loss = 0.01934761181473732
Validation loss = 0.02139616198837757
Validation loss = 0.019819367676973343
Validation loss = 0.020292023196816444
Validation loss = 0.01959811896085739
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021676987409591675
Validation loss = 0.022504311054944992
Validation loss = 0.02138626016676426
Validation loss = 0.02063596062362194
Validation loss = 0.020899061113595963
Validation loss = 0.021358003839850426
Validation loss = 0.02137952856719494
Validation loss = 0.020830003544688225
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1684511063526053
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16833095577746077
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16821097647897362
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16809116809116809
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16797153024911032
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1678520625889047
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1677327647476901
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16761363636363635
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16749467707594037
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1673758865248227
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16725726435152374
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1671388101983003
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1670205237084218
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1669024045261669
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1667844522968198
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16666666666666666
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16654904728299225
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16643159379407615
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16631430584918958
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16619718309859155
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1660802251935257
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1659634317862166
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16584680252986647
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16573033707865167
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1656140350877193
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00188 |
| Iteration     | 55       |
| MaximumReturn | -0.00139 |
| MinimumReturn | -0.00271 |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021513499319553375
Validation loss = 0.020212210714817047
Validation loss = 0.022704077884554863
Validation loss = 0.022927124053239822
Validation loss = 0.02114708162844181
Validation loss = 0.020795682445168495
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023489074781537056
Validation loss = 0.01987210102379322
Validation loss = 0.020811675116419792
Validation loss = 0.02063475176692009
Validation loss = 0.022253599017858505
Validation loss = 0.02275320142507553
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02344098500907421
Validation loss = 0.021860498934984207
Validation loss = 0.020782457664608955
Validation loss = 0.021329790353775024
Validation loss = 0.021377796307206154
Validation loss = 0.021542204543948174
Validation loss = 0.022182127460837364
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021351071074604988
Validation loss = 0.01967768371105194
Validation loss = 0.020315440371632576
Validation loss = 0.020981505513191223
Validation loss = 0.019637661054730415
Validation loss = 0.02004506252706051
Validation loss = 0.019488008692860603
Validation loss = 0.021403413265943527
Validation loss = 0.02226763404905796
Validation loss = 0.019917398691177368
Validation loss = 0.01885300688445568
Validation loss = 0.022580552846193314
Validation loss = 0.019687317311763763
Validation loss = 0.019854197278618813
Validation loss = 0.021766753867268562
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020037947222590446
Validation loss = 0.022263433784246445
Validation loss = 0.020662929862737656
Validation loss = 0.02137344516813755
Validation loss = 0.02007942460477352
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16549789621318373
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16538192011212333
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16526610644257703
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16515045486354094
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16503496503496504
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16491963661774983
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.164804469273743
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16468946266573622
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16457461645746166
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16445993031358885
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16434540389972144
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16423103688239388
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16411682892906815
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16400277970813065
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1638888888888889
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16377515614156835
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1636615811373093
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16354816354816354
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1634349030470914
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16332179930795848
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1632088520055325
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1630960608154803
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16298342541436464
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16287094547964112
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16275862068965516
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00154 |
| Iteration     | 56       |
| MaximumReturn | -0.00098 |
| MinimumReturn | -0.00233 |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021471915766596794
Validation loss = 0.022554120048880577
Validation loss = 0.02063872665166855
Validation loss = 0.021450037136673927
Validation loss = 0.020450886338949203
Validation loss = 0.020813940092921257
Validation loss = 0.02461876906454563
Validation loss = 0.021048052236437798
Validation loss = 0.020363489165902138
Validation loss = 0.020834947004914284
Validation loss = 0.021686360239982605
Validation loss = 0.021418996155261993
Validation loss = 0.02100004255771637
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020921753719449043
Validation loss = 0.019446352496743202
Validation loss = 0.01981384865939617
Validation loss = 0.020975464954972267
Validation loss = 0.019928690046072006
Validation loss = 0.020360110327601433
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020376769825816154
Validation loss = 0.022348502650856972
Validation loss = 0.02095509134232998
Validation loss = 0.02315068244934082
Validation loss = 0.020983891561627388
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021616540849208832
Validation loss = 0.020130636170506477
Validation loss = 0.019699709489941597
Validation loss = 0.022058462724089622
Validation loss = 0.019150277599692345
Validation loss = 0.020077204331755638
Validation loss = 0.02138953097164631
Validation loss = 0.019730815663933754
Validation loss = 0.021919233724474907
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021114522591233253
Validation loss = 0.02206443063914776
Validation loss = 0.02064528502523899
Validation loss = 0.021820565685629845
Validation loss = 0.021376075223088264
Validation loss = 0.020551731809973717
Validation loss = 0.02065233141183853
Validation loss = 0.020508654415607452
Validation loss = 0.021919459104537964
Validation loss = 0.020259853452444077
Validation loss = 0.020440204069018364
Validation loss = 0.02160828746855259
Validation loss = 0.02143343724310398
Validation loss = 0.020727483555674553
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16264645072363887
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.162534435261708
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1624225739848589
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1623108665749656
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16219931271477664
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1620879120879121
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16197666437886069
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16186556927297668
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16175462645647704
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16164383561643836
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16153319644079397
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16142270861833105
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16131237183868763
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16120218579234974
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16109215017064846
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16098226466575716
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1608725289706885
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16076294277929154
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16065350578624915
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16054421768707483
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16043507817811012
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16032608695652173
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16021724372029872
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16010854816824965
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000899 |
| Iteration     | 57        |
| MaximumReturn | -0.00068  |
| MinimumReturn | -0.00187  |
| TotalSamples  | 98294     |
-----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02128957211971283
Validation loss = 0.020354105159640312
Validation loss = 0.022923916578292847
Validation loss = 0.022004088386893272
Validation loss = 0.02051699347794056
Validation loss = 0.02048836275935173
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020504038780927658
Validation loss = 0.020051777362823486
Validation loss = 0.02057606540620327
Validation loss = 0.019681550562381744
Validation loss = 0.019549228250980377
Validation loss = 0.021415000781416893
Validation loss = 0.021229512989521027
Validation loss = 0.020513154566287994
Validation loss = 0.022975649684667587
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020937934517860413
Validation loss = 0.021628087386488914
Validation loss = 0.021237706765532494
Validation loss = 0.02107558771967888
Validation loss = 0.025085369125008583
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020180437713861465
Validation loss = 0.02028363011777401
Validation loss = 0.019342945888638496
Validation loss = 0.020437121391296387
Validation loss = 0.01888568140566349
Validation loss = 0.024625930935144424
Validation loss = 0.020549947395920753
Validation loss = 0.019924098625779152
Validation loss = 0.0197686105966568
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02077893167734146
Validation loss = 0.019552849233150482
Validation loss = 0.020110346376895905
Validation loss = 0.02132379449903965
Validation loss = 0.020697681233286858
Validation loss = 0.020455773919820786
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15989159891598917
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15978334461746785
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15967523680649526
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15956727518593644
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15945945945945947
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15935178933153274
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1592442645074224
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15913688469318948
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15902964959568733
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1589225589225589
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15881561238223418
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15870880968392737
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1586021505376344
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1584956346541303
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15838926174496645
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15828303152246814
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1581769436997319
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1580709979906229
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15796519410977242
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15785953177257525
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15775401069518716
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1576486305945224
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.157543391188251
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15743829219479652
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15733333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000815 |
| Iteration     | 58        |
| MaximumReturn | -0.000632 |
| MinimumReturn | -0.00121  |
| TotalSamples  | 99960     |
-----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020439451560378075
Validation loss = 0.02015271969139576
Validation loss = 0.020821906626224518
Validation loss = 0.02020403929054737
Validation loss = 0.02024872973561287
Validation loss = 0.021608494222164154
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021966414526104927
Validation loss = 0.01963372342288494
Validation loss = 0.019845591858029366
Validation loss = 0.019107993692159653
Validation loss = 0.019684819504618645
Validation loss = 0.020607443526387215
Validation loss = 0.018946511670947075
Validation loss = 0.021825410425662994
Validation loss = 0.01972741261124611
Validation loss = 0.022453486919403076
Validation loss = 0.02040172554552555
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022922568023204803
Validation loss = 0.02078050561249256
Validation loss = 0.02124890498816967
Validation loss = 0.02016311325132847
Validation loss = 0.021636594086885452
Validation loss = 0.020488280802965164
Validation loss = 0.020851673558354378
Validation loss = 0.02017003484070301
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019987458363175392
Validation loss = 0.01941545680165291
Validation loss = 0.019604548811912537
Validation loss = 0.02083267644047737
Validation loss = 0.018870577216148376
Validation loss = 0.01930277608335018
Validation loss = 0.020820248872041702
Validation loss = 0.019630970433354378
Validation loss = 0.02090657502412796
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021316062659025192
Validation loss = 0.020475083962082863
Validation loss = 0.02037022076547146
Validation loss = 0.022718455642461777
Validation loss = 0.02075091563165188
Validation loss = 0.020479312166571617
Validation loss = 0.020591605454683304
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15722851432378415
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15712383488681758
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15701929474384566
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15691489361702127
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1568106312292359
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15670650730411687
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15660252156602522
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15649867374005305
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1563949635520212
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1562913907284768
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15618795499669094
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15608465608465608
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15598149372108394
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15587846763540292
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15577557755775578
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15567282321899736
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15557020435069216
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.155467720685112
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1553653719552337
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15526315789473685
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15516107823800132
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15505913272010513
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15495732107682206
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15485564304461943
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15475409836065573
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000826 |
| Iteration     | 59        |
| MaximumReturn | -0.00064  |
| MinimumReturn | -0.00106  |
| TotalSamples  | 101626    |
-----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024590864777565002
Validation loss = 0.022492244839668274
Validation loss = 0.02194339595735073
Validation loss = 0.02126193605363369
Validation loss = 0.021062517538666725
Validation loss = 0.021172724664211273
Validation loss = 0.02058776468038559
Validation loss = 0.021089892834424973
Validation loss = 0.022658254951238632
Validation loss = 0.02134036272764206
Validation loss = 0.020392445847392082
Validation loss = 0.023144418373703957
Validation loss = 0.0226522795855999
Validation loss = 0.020839251577854156
Validation loss = 0.021718338131904602
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02010529488325119
Validation loss = 0.020174171775579453
Validation loss = 0.018836194649338722
Validation loss = 0.01970636658370495
Validation loss = 0.020883016288280487
Validation loss = 0.020144589245319366
Validation loss = 0.02071288786828518
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020827163010835648
Validation loss = 0.020573459565639496
Validation loss = 0.024671949446201324
Validation loss = 0.02073046937584877
Validation loss = 0.022375669330358505
Validation loss = 0.02168431505560875
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019013185054063797
Validation loss = 0.02011943608522415
Validation loss = 0.0217894297093153
Validation loss = 0.020463889464735985
Validation loss = 0.02065426856279373
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02081671915948391
Validation loss = 0.020469054579734802
Validation loss = 0.019738873466849327
Validation loss = 0.024564992636442184
Validation loss = 0.02118752710521221
Validation loss = 0.020953945815563202
Validation loss = 0.01985749416053295
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15465268676277852
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15455140798952194
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1544502617801047
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15434924787442772
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1542483660130719
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1541476159372959
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15404699738903394
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15394651011089366
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15384615384615385
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15374592833876222
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15364583333333334
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15354586857514638
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15344603381014305
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15334632878492527
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15324675324675324
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15314730694354314
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15304798962386512
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15294880103694103
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15284974093264247
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15275080906148866
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15265200517464425
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15255332902391727
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1524547803617571
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15235635894125243
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15225806451612903
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00128 |
| Iteration     | 60       |
| MaximumReturn | -0.00093 |
| MinimumReturn | -0.00196 |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020877480506896973
Validation loss = 0.021069196984171867
Validation loss = 0.020784864202141762
Validation loss = 0.02244100160896778
Validation loss = 0.020737385377287865
Validation loss = 0.020446566864848137
Validation loss = 0.0225294828414917
Validation loss = 0.020493371412158012
Validation loss = 0.021511707454919815
Validation loss = 0.020927857607603073
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02131529711186886
Validation loss = 0.019943371415138245
Validation loss = 0.020417470484972
Validation loss = 0.019826889038085938
Validation loss = 0.020908527076244354
Validation loss = 0.02145267464220524
Validation loss = 0.01978776976466179
Validation loss = 0.019924337044358253
Validation loss = 0.01925220899283886
Validation loss = 0.019825676456093788
Validation loss = 0.01891721785068512
Validation loss = 0.019948400557041168
Validation loss = 0.020544104278087616
Validation loss = 0.019582893699407578
Validation loss = 0.019417021423578262
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020184455439448357
Validation loss = 0.02000495232641697
Validation loss = 0.02081838995218277
Validation loss = 0.020994260907173157
Validation loss = 0.020764341577887535
Validation loss = 0.020806236192584038
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021379441022872925
Validation loss = 0.019791502505540848
Validation loss = 0.01962267793715
Validation loss = 0.021409034729003906
Validation loss = 0.01853273995220661
Validation loss = 0.020945491269230843
Validation loss = 0.019765980541706085
Validation loss = 0.019701996818184853
Validation loss = 0.01955389603972435
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020021086558699608
Validation loss = 0.022060995921492577
Validation loss = 0.023479513823986053
Validation loss = 0.020383058115839958
Validation loss = 0.020656175911426544
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1521598968407479
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15206185567010308
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1519639407598197
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15186615186615188
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1517684887459807
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15167095115681234
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15157353885677585
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1514762516046213
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15137908915971776
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15128205128205127
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15118513773222295
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15108834827144688
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1509916826615483
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15089514066496162
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15079872204472844
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15070242656449553
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15060625398851307
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15051020408163265
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1504142766093053
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1503184713375796
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15022278803309994
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15012722646310434
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15003178639542275
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14993646759847523
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14984126984126983
Done generating on-policy rollouts.
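
A note on the numbers just above: the "average number of affinization" printed after every path shrinks slightly even though each path contributes a count of 0. That is exactly how a running mean over all real-environment timesteps collected so far behaves when no new events occur. Under that assumption (inferred from the numbers themselves, not confirmed by code), the entire sequence for this iteration is reproduced by a fixed cumulative count of 23,600 affinizations over 155,100 timesteps at Path 0, with each path adding 100 timesteps:

    # Reproduces the 25 "average number of affinization" values above,
    # assuming average = cumulative count / cumulative timesteps.
    # 23,600 and 155,100 are inferred from the log, not read from the code.
    count, steps = 23600, 155100
    for k in range(25):
        print(count / (steps + 100 * k))   # 0.15215989..., ..., 0.14984126...

The same two constants keep matching in later iterations (each on-policy phase adds 25 x 100 = 2,500 timesteps): e.g. 23,600 / 157,600 = 0.14974619... at Path 0 of itr #62.
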
Updating normalization.
Done updating normalization.
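
The log does not say what "Updating normalization" recomputes; a common pattern in model-based RL code is a running mean/std over states (and actions) used to whiten the dynamics model's inputs and targets. Purely as an assumption about what this step might look like, here is a streaming mean/std update (Welford's algorithm); the class name and its role here are hypothetical:

    import numpy as np

    class RunningNorm:
        """Streaming per-dimension mean/std via Welford's algorithm."""
        def __init__(self, dim):
            self.n = 0
            self.mean = np.zeros(dim)
            self.m2 = np.zeros(dim)             # running sum of squared deviations
        def update(self, batch):
            for x in batch:
                self.n += 1
                d = x - self.mean
                self.mean += d / self.n
                self.m2 += d * (x - self.mean)  # uses old and new mean (Welford)
        def normalize(self, x, eps=1e-8):
            std = np.sqrt(self.m2 / max(self.n - 1, 1))
            return (x - self.mean) / (std + eps)

    # E.g. fold each iteration's new rollout data into the statistics:
    norm = RunningNorm(dim=3)
    norm.update(np.random.randn(100, 3))
    z = norm.normalize(np.zeros(3))
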
----------------------------
| AverageReturn | -0.00385 |
| Iteration     | 61       |
| MaximumReturn | -0.00239 |
| MinimumReturn | -0.00645 |
| TotalSamples  | 104958   |
----------------------------
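
One arithmetic observation about these summary tables: TotalSamples grows by exactly 1,666 per iteration throughout this section (104,958; 106,624; 108,290; ... 119,952), while each on-policy phase collects 25 paths x 100 steps = 2,500 environment timesteps. TotalSamples is therefore not a raw step count; a plausible but unconfirmed reading is that it counts only the share of new data kept for dynamics training after a train/validation split (2,500 x 2/3 is roughly 1,666).
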
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02056599035859108
Validation loss = 0.021723590791225433
Validation loss = 0.02132098563015461
Validation loss = 0.020706772804260254
Validation loss = 0.02144988626241684
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018888862803578377
Validation loss = 0.019365552812814713
Validation loss = 0.02033642865717411
Validation loss = 0.019237510859966278
Validation loss = 0.018686441704630852
Validation loss = 0.01947525516152382
Validation loss = 0.019429173320531845
Validation loss = 0.021287666633725166
Validation loss = 0.020427588373422623
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01957973837852478
Validation loss = 0.020669011399149895
Validation loss = 0.021076960489153862
Validation loss = 0.02111233025789261
Validation loss = 0.020322877913713455
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019391536712646484
Validation loss = 0.020295370370149612
Validation loss = 0.02074294164776802
Validation loss = 0.020810287445783615
Validation loss = 0.019081812351942062
Validation loss = 0.02175632119178772
Validation loss = 0.019413653761148453
Validation loss = 0.02006004936993122
Validation loss = 0.01922696828842163
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021514972671866417
Validation loss = 0.02013425901532173
Validation loss = 0.020461685955524445
Validation loss = 0.02036367729306221
Validation loss = 0.01983777806162834
Validation loss = 0.019641412422060966
Validation loss = 0.021842781454324722
Validation loss = 0.021092232316732407
Validation loss = 0.02056133560836315
Validation loss = 0.020631982013583183
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14974619289340102
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14965123652504755
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14955640050697086
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14946168461051298
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14936708860759493
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14927261227071473
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14917825537294563
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1490840176879343
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14898989898989898
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14889589905362777
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14880201765447668
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.148708254568368
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1486146095717884
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1485210824417873
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14842767295597484
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14833438089252043
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14824120603015076
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14814814814814814
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1480552070263488
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14796238244514107
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14786967418546365
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14777708202880402
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1476846057571965
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14759224515322075
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1475
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00145  |
| Iteration     | 62        |
| MaximumReturn | -0.000916 |
| MinimumReturn | -0.00357  |
| TotalSamples  | 106624    |
-----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02057391032576561
Validation loss = 0.020501425489783287
Validation loss = 0.021240103989839554
Validation loss = 0.021731238812208176
Validation loss = 0.0206924881786108
Validation loss = 0.022086964920163155
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020931458100676537
Validation loss = 0.019472938030958176
Validation loss = 0.018705662339925766
Validation loss = 0.018894033506512642
Validation loss = 0.019554125145077705
Validation loss = 0.01965404860675335
Validation loss = 0.01938498020172119
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020620230585336685
Validation loss = 0.019731903448700905
Validation loss = 0.020418332889676094
Validation loss = 0.01970117725431919
Validation loss = 0.0205110851675272
Validation loss = 0.020469749346375465
Validation loss = 0.021788770332932472
Validation loss = 0.022164877504110336
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019396470859646797
Validation loss = 0.019286980852484703
Validation loss = 0.01988772489130497
Validation loss = 0.021555189043283463
Validation loss = 0.01959804818034172
Validation loss = 0.01937389001250267
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02229415997862816
Validation loss = 0.0210326686501503
Validation loss = 0.020642582327127457
Validation loss = 0.019610190764069557
Validation loss = 0.019984286278486252
Validation loss = 0.02011777274310589
Validation loss = 0.020552638918161392
Validation loss = 0.02151978202164173
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14740787008119924
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14731585518102372
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1472239550842171
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14713216957605985
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1470404984423676
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14694894146948942
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14685749844430615
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14676616915422885
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14667495338719702
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14658385093167703
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14649286157666047
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14640198511166252
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1463112213267204
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14622057001239158
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14613003095975233
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14603960396039603
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14594928880643165
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14585908529048208
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1457689932056825
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.145679012345679
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14558914250462676
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14549938347718866
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14540973505853358
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14532019704433496
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14523076923076922
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000915 |
| Iteration     | 63        |
| MaximumReturn | -0.000563 |
| MinimumReturn | -0.00211  |
| TotalSamples  | 108290    |
-----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02077929861843586
Validation loss = 0.020722094923257828
Validation loss = 0.019962158054113388
Validation loss = 0.019659247249364853
Validation loss = 0.020929766818881035
Validation loss = 0.020357651636004448
Validation loss = 0.020474808290600777
Validation loss = 0.020506015047430992
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019583845511078835
Validation loss = 0.02052035741508007
Validation loss = 0.01920236460864544
Validation loss = 0.019419288262724876
Validation loss = 0.021025363355875015
Validation loss = 0.02042054384946823
Validation loss = 0.01862400770187378
Validation loss = 0.018854321911931038
Validation loss = 0.020648131147027016
Validation loss = 0.019315702840685844
Validation loss = 0.022175543010234833
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02097618393599987
Validation loss = 0.02115003764629364
Validation loss = 0.019576510414481163
Validation loss = 0.019757995381951332
Validation loss = 0.021099753677845
Validation loss = 0.020340729504823685
Validation loss = 0.020134447142481804
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018513644114136696
Validation loss = 0.019109630957245827
Validation loss = 0.01879962347447872
Validation loss = 0.019812526181340218
Validation loss = 0.020369041711091995
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021611645817756653
Validation loss = 0.0197371244430542
Validation loss = 0.019626520574092865
Validation loss = 0.019666334614157677
Validation loss = 0.0201351810246706
Validation loss = 0.02125604636967182
Validation loss = 0.019956743344664574
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14514145141451415
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1450522433927474
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14496314496314497
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14487415592387967
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14478527607361963
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14469650521152666
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14460784313725492
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14451928965094918
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14443084455324356
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14434250764525994
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14425427872860636
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14416615760537568
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14407814407814407
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14399023794996949
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14390243902439023
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14381474710542352
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14372716199756394
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1436396835057821
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1435523114355231
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14346504559270518
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1433778857837181
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.143290831815422
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14320388349514562
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14311704063068525
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14303030303030304
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00243 |
| Iteration     | 64       |
| MaximumReturn | -0.00161 |
| MinimumReturn | -0.00414 |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019196195527911186
Validation loss = 0.019866615533828735
Validation loss = 0.020496012642979622
Validation loss = 0.020461546257138252
Validation loss = 0.01971612125635147
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022372931241989136
Validation loss = 0.018306996673345566
Validation loss = 0.020428229123353958
Validation loss = 0.01989232562482357
Validation loss = 0.018958846107125282
Validation loss = 0.01930425688624382
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020287221297621727
Validation loss = 0.020581860095262527
Validation loss = 0.02384951338171959
Validation loss = 0.019534507766366005
Validation loss = 0.019797958433628082
Validation loss = 0.02050275355577469
Validation loss = 0.020736204460263252
Validation loss = 0.020200885832309723
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01926463283598423
Validation loss = 0.0186395775526762
Validation loss = 0.019572418183088303
Validation loss = 0.019531507045030594
Validation loss = 0.019005009904503822
Validation loss = 0.01848589815199375
Validation loss = 0.018662886694073677
Validation loss = 0.019141413271427155
Validation loss = 0.01938619837164879
Validation loss = 0.018531518056988716
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01968802884221077
Validation loss = 0.01961003802716732
Validation loss = 0.019163858145475388
Validation loss = 0.019338201731443405
Validation loss = 0.019535141065716743
Validation loss = 0.020537715405225754
Validation loss = 0.022782135754823685
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14294367050272563
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14285714285714285
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14277071990320628
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14268440145102781
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14259818731117824
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14251207729468598
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1424260712130356
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14234016887816647
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14225437010247136
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14216867469879518
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14208308248043347
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14199759326113118
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14191220685508119
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14182692307692307
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14174174174174173
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14165666266506602
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14157168566286743
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14148681055155876
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14140203714799282
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14131736526946106
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1412327947336924
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14114832535885166
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14106395696353854
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14097968936678615
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1408955223880597
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00131  |
| Iteration     | 65        |
| MaximumReturn | -0.000746 |
| MinimumReturn | -0.00431  |
| TotalSamples  | 111622    |
-----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02143324539065361
Validation loss = 0.0231795571744442
Validation loss = 0.02017802558839321
Validation loss = 0.02045951597392559
Validation loss = 0.02162971906363964
Validation loss = 0.019763050600886345
Validation loss = 0.020584754645824432
Validation loss = 0.020545970648527145
Validation loss = 0.020012594759464264
Validation loss = 0.019661320373415947
Validation loss = 0.02051392011344433
Validation loss = 0.021550973877310753
Validation loss = 0.02028973586857319
Validation loss = 0.02067743055522442
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020267875865101814
Validation loss = 0.020006366074085236
Validation loss = 0.02065715752542019
Validation loss = 0.02113882452249527
Validation loss = 0.021762246266007423
Validation loss = 0.018472423776984215
Validation loss = 0.01923634298145771
Validation loss = 0.019978243857622147
Validation loss = 0.019000360742211342
Validation loss = 0.020403152331709862
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019711120054125786
Validation loss = 0.019888700917363167
Validation loss = 0.019980156794190407
Validation loss = 0.020429998636245728
Validation loss = 0.021219132468104362
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02066505141556263
Validation loss = 0.020162831991910934
Validation loss = 0.018955474719405174
Validation loss = 0.01930982619524002
Validation loss = 0.02020248956978321
Validation loss = 0.018643764778971672
Validation loss = 0.020312165841460228
Validation loss = 0.01949647068977356
Validation loss = 0.019369235262274742
Validation loss = 0.01951695792376995
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021188709884881973
Validation loss = 0.02049795724451542
Validation loss = 0.02079552598297596
Validation loss = 0.020670387893915176
Validation loss = 0.0205151978880167
Validation loss = 0.02028639055788517
Validation loss = 0.021602919325232506
Validation loss = 0.020358072593808174
Validation loss = 0.02017899416387081
Validation loss = 0.019715068861842155
Validation loss = 0.02019563317298889
Validation loss = 0.02031870186328888
Validation loss = 0.020900553092360497
Validation loss = 0.02160351537168026
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14081145584725538
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14072748956469885
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14064362336114422
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1405598570577725
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14047619047619048
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1403926234384295
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1403091557669441
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14022578728461083
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14014251781472684
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1400593471810089
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13997627520759193
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13989330171902786
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13981042654028436
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13972764949674363
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13964497041420118
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13956238911886457
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13947990543735225
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13939751919669227
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13931523022432113
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1392330383480826
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1391509433962264
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1390689451974072
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13898704358068315
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13890523837551502
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1388235294117647
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000805 |
| Iteration     | 66        |
| MaximumReturn | -0.000602 |
| MinimumReturn | -0.00101  |
| TotalSamples  | 113288    |
-----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02124258317053318
Validation loss = 0.020190592855215073
Validation loss = 0.020901065319776535
Validation loss = 0.020477091893553734
Validation loss = 0.019574826583266258
Validation loss = 0.02115226723253727
Validation loss = 0.02021212689578533
Validation loss = 0.020248085260391235
Validation loss = 0.019948435947299004
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020516229793429375
Validation loss = 0.01991075649857521
Validation loss = 0.02007034793496132
Validation loss = 0.02022477611899376
Validation loss = 0.01884719356894493
Validation loss = 0.02106483466923237
Validation loss = 0.020549356937408447
Validation loss = 0.019358137622475624
Validation loss = 0.01909097470343113
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020778199657797813
Validation loss = 0.021448839455842972
Validation loss = 0.02014843560755253
Validation loss = 0.022579461336135864
Validation loss = 0.020308654755353928
Validation loss = 0.020245051011443138
Validation loss = 0.02050749771296978
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019160771742463112
Validation loss = 0.01910044439136982
Validation loss = 0.01966484822332859
Validation loss = 0.01929738186299801
Validation loss = 0.020293215289711952
Validation loss = 0.02075493149459362
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02168290875852108
Validation loss = 0.019608084112405777
Validation loss = 0.01970408298075199
Validation loss = 0.021038727834820747
Validation loss = 0.019848058000206947
Validation loss = 0.019954154267907143
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1387419165196943
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13866039952996476
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13857897827363477
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13849765258215962
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13841642228739004
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13833528722157093
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13825424721734036
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13817330210772832
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13809245172615564
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13801169590643275
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13793103448275862
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1378504672897196
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13776999416228838
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13768961493582263
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13760932944606413
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13752913752913754
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1374490390215492
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13736903376018628
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1372891215823153
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1372093023255814
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13712957582800697
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13704994192799072
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13697040046430645
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1368909512761021
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13681159420289854
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000874 |
| Iteration     | 67        |
| MaximumReturn | -0.000706 |
| MinimumReturn | -0.00113  |
| TotalSamples  | 114954    |
-----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02087707258760929
Validation loss = 0.02093390002846718
Validation loss = 0.01950887031853199
Validation loss = 0.02437591180205345
Validation loss = 0.01961609162390232
Validation loss = 0.019678695127367973
Validation loss = 0.02114829048514366
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01963183842599392
Validation loss = 0.020839044824242592
Validation loss = 0.01868349500000477
Validation loss = 0.020188501104712486
Validation loss = 0.01952853612601757
Validation loss = 0.019102884456515312
Validation loss = 0.020574385300278664
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020345263183116913
Validation loss = 0.02066044695675373
Validation loss = 0.02049311436712742
Validation loss = 0.019667113199830055
Validation loss = 0.019992057234048843
Validation loss = 0.019870176911354065
Validation loss = 0.02131502516567707
Validation loss = 0.020231978967785835
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019820010289549828
Validation loss = 0.019317420199513435
Validation loss = 0.019406698644161224
Validation loss = 0.018282996490597725
Validation loss = 0.018898051232099533
Validation loss = 0.018039843067526817
Validation loss = 0.01888807862997055
Validation loss = 0.019390109926462173
Validation loss = 0.020024459809064865
Validation loss = 0.020158685743808746
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022369790822267532
Validation loss = 0.020489521324634552
Validation loss = 0.019592290744185448
Validation loss = 0.019646836444735527
Validation loss = 0.019429126754403114
Validation loss = 0.020110828801989555
Validation loss = 0.0200908575206995
Validation loss = 0.02047075517475605
Validation loss = 0.02134213037788868
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13673232908458866
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13665315576143602
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13657407407407407
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13649508386350492
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13641618497109825
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1363373772385904
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13625866050808313
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1361800346220427
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13610149942329874
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13602305475504323
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1359447004608295
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1358664363845711
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13578826237054084
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13571017826336976
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.135632183908046
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13555427914991383
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1354764638346728
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13539873780837636
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1353211009174312
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.135243553008596
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13516609392898052
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13508872352604465
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13501144164759726
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1349342481417953
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13485714285714287
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000879 |
| Iteration     | 68        |
| MaximumReturn | -0.000633 |
| MinimumReturn | -0.0013   |
| TotalSamples  | 116620    |
-----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02021588385105133
Validation loss = 0.020655987784266472
Validation loss = 0.0209581907838583
Validation loss = 0.020421458408236504
Validation loss = 0.021940290927886963
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019899575039744377
Validation loss = 0.01931597851216793
Validation loss = 0.02163682132959366
Validation loss = 0.0189825352281332
Validation loss = 0.019150596112012863
Validation loss = 0.01942465826869011
Validation loss = 0.01865404099225998
Validation loss = 0.01863095350563526
Validation loss = 0.02120598778128624
Validation loss = 0.020504333078861237
Validation loss = 0.019115025177598
Validation loss = 0.018653836101293564
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02025124989449978
Validation loss = 0.020207475870847702
Validation loss = 0.021366015076637268
Validation loss = 0.01978907734155655
Validation loss = 0.020666176453232765
Validation loss = 0.01970629021525383
Validation loss = 0.01976236328482628
Validation loss = 0.019388332962989807
Validation loss = 0.019791748374700546
Validation loss = 0.020563308149576187
Validation loss = 0.020405514165759087
Validation loss = 0.01999201811850071
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018489323556423187
Validation loss = 0.020125582814216614
Validation loss = 0.01906670816242695
Validation loss = 0.0179587434977293
Validation loss = 0.018845239654183388
Validation loss = 0.019017044454813004
Validation loss = 0.01852007396519184
Validation loss = 0.01885908842086792
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019518733024597168
Validation loss = 0.01989598385989666
Validation loss = 0.02323910780251026
Validation loss = 0.019791441038250923
Validation loss = 0.019590381532907486
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13478012564249
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13470319634703196
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13462635482030805
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1345496009122007
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13447293447293449
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13439635535307518
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13431986340352875
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1342434584755404
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13416714042069358
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1340909090909091
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13401476433844406
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13393870601589103
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13386273397617698
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13378684807256236
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13371104815864024
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13363533408833522
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13355970571590267
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1334841628959276
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13340870548332392
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13333333333333333
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13325804630152457
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13318284424379231
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13310772701635645
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.133032694475761
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13295774647887323
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00084  |
| Iteration     | 69        |
| MaximumReturn | -0.000616 |
| MinimumReturn | -0.00127  |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019279392436146736
Validation loss = 0.020162953063845634
Validation loss = 0.020575569942593575
Validation loss = 0.020147211849689484
Validation loss = 0.01923549175262451
Validation loss = 0.02046700194478035
Validation loss = 0.02013172022998333
Validation loss = 0.0208501685410738
Validation loss = 0.020321110263466835
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01927834562957287
Validation loss = 0.01921839267015457
Validation loss = 0.018930476158857346
Validation loss = 0.02063206024467945
Validation loss = 0.020024219527840614
Validation loss = 0.018900694325566292
Validation loss = 0.01919199526309967
Validation loss = 0.020412912592291832
Validation loss = 0.019758539274334908
Validation loss = 0.019211044535040855
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020232364535331726
Validation loss = 0.019529622048139572
Validation loss = 0.021867049857974052
Validation loss = 0.020329786464571953
Validation loss = 0.019808320328593254
Validation loss = 0.02101389318704605
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018954358994960785
Validation loss = 0.019052496179938316
Validation loss = 0.018736721947789192
Validation loss = 0.018827347084879875
Validation loss = 0.018472842872142792
Validation loss = 0.019117526710033417
Validation loss = 0.018861664459109306
Validation loss = 0.0187810268253088
Validation loss = 0.0184195414185524
Validation loss = 0.020645149052143097
Validation loss = 0.018657047301530838
Validation loss = 0.01868920587003231
Validation loss = 0.018885357305407524
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022144516929984093
Validation loss = 0.019051525741815567
Validation loss = 0.025111572816967964
Validation loss = 0.020733410492539406
Validation loss = 0.02033986523747444
Validation loss = 0.020160796120762825
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13288288288288289
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13280810354530106
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1327334083239595
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13265879707700956
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13258426966292136
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13250982594048288
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1324354657687991
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13236118900729107
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13228699551569506
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13221288515406163
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13213885778275475
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13206491326245104
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1319910514541387
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13191727221911684
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1318435754189944
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13176996091568957
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13169642857142858
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1316229782487451
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13154960981047936
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13147632311977717
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13140311804008908
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13132999443516974
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13125695216907676
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1311839911061701
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13111111111111112
Done generating on-policy rollouts.
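The decaying "average number of affinization" values in the rollout above follow a clean pattern: they match a fixed cumulative count divided by a counter that grows by 25 per path. Fitting the first and last logged values gives numerator 5900 and starting denominator 44400 (inferred from the logged values, not taken from the code), and the first value of itr #71 below continues the pattern at 5900/45025:

    # Reproduces the averages logged for Paths 0-24 above, e.g.
    # 5900/44400 = 0.13288288288288289 and 5900/45000 = 0.13111111111111112.
    C, D0 = 5900, 44400   # inferred cumulative count and counter offset
    for path in range(25):
        print(f"Path {path}: average number of affinization = {C / (D0 + 25 * path)}")

With zero affinizations triggered at epsilon = 3, the numerator stays put while the denominator grows, so the average drifts down by roughly 0.00007 per path.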
Updating normalization.
Done updating normalization.
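"Updating normalization." most plausibly refreshes the observation statistics over all data gathered so far, so the dynamics models and the policy see standardized inputs. A batch recomputation is the simplest reading (a running Welford-style update would serve equally well); sketch under that assumption:

    import numpy as np

    def update_normalization(all_observations):
        # Recompute mean/std over everything collected so far; the epsilon
        # keeps near-constant dimensions from dividing by zero.
        mean = all_observations.mean(axis=0)
        std = all_observations.std(axis=0) + 1e-8
        return mean, std

    mean, std = update_normalization(np.random.randn(1000, 4))
    normalized = (np.random.randn(4) - mean) / std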
-----------------------------
| AverageReturn | -0.000855 |
| Iteration     | 70        |
| MaximumReturn | -0.000606 |
| MinimumReturn | -0.00113  |
| TotalSamples  | 119952    |
-----------------------------
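The summary block is plausibly simple statistics over the 25 on-policy path returns plus a cumulative sample counter, printed in an rllab-style table. A hypothetical helper that mimics the layout (the border width works out to the same 29 characters as the block above):

    import numpy as np

    def log_table(itr, returns, total_samples):
        rows = [
            ("AverageReturn", f"{np.mean(returns):.3g}"),
            ("Iteration", str(itr)),
            ("MaximumReturn", f"{np.max(returns):.3g}"),
            ("MinimumReturn", f"{np.min(returns):.3g}"),
            ("TotalSamples", str(total_samples)),
        ]
        w0 = max(len(k) for k, _ in rows)
        w1 = max(len(v) for _, v in rows)
        border = "-" * (w0 + w1 + 7)
        print(border)
        for k, v in rows:
            print(f"| {k:<{w0}} | {v:<{w1}} |")
        print(border)

    # A few illustrative returns, not the full set of 25 path returns:
    log_table(70, [-0.000855, -0.000606, -0.00113], 119952)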
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01924332231283188
Validation loss = 0.02255190722644329
Validation loss = 0.020718175917863846
Validation loss = 0.020242737606167793
Validation loss = 0.01994699239730835
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01940879039466381
Validation loss = 0.019449181854724884
Validation loss = 0.018805881962180138
Validation loss = 0.018607284873723984
Validation loss = 0.019039984792470932
Validation loss = 0.018238568678498268
Validation loss = 0.018994586542248726
Validation loss = 0.019620154052972794
Validation loss = 0.019223686307668686
Validation loss = 0.019474951550364494
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020953966304659843
Validation loss = 0.019605837762355804
Validation loss = 0.021146196871995926
Validation loss = 0.020148107782006264
Validation loss = 0.020892512053251266
Validation loss = 0.020066937431693077
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019453611224889755
Validation loss = 0.019512873142957687
Validation loss = 0.018853221088647842
Validation loss = 0.01934093050658703
Validation loss = 0.019454648718237877
Validation loss = 0.021619010716676712
Validation loss = 0.01864541508257389
Validation loss = 0.01810658909380436
Validation loss = 0.01924034021794796
Validation loss = 0.01999446004629135
Validation loss = 0.019364187493920326
Validation loss = 0.01937493309378624
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020338308066129684
Validation loss = 0.02178431861102581
Validation loss = 0.018848801031708717
Validation loss = 0.01998279057443142
Validation loss = 0.019926955923438072
Validation loss = 0.019547410309314728
Validation loss = 0.02062533237040043
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
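"Re-initialize init_std." appears before every sampling loop, plausibly resetting the Gaussian policy's log standard deviation to its starting value so that exploration shrunk by earlier TRPO updates is restored at each outer iteration. A sketch with a hypothetical parameter layout:

    import numpy as np

    def reinitialize_std(policy_params, init_logstd=0.0):
        # Reset the log-std head to its initial value; 0.0 here is just a
        # placeholder (std = exp(0) = 1), not necessarily the run's setting.
        policy_params["logstd"] = np.full_like(policy_params["logstd"], init_logstd)
        return policy_params

    params = {"logstd": np.array([-1.3])}   # std collapsed during training
    params = reinitialize_std(params)       # back to exp(0.0) = 1.0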
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13103831204886174
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1309655937846837
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13089295618413754
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13082039911308205
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13074792243767314
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13067552602436322
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1306032097399004
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13053097345132744
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1304588170259812
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13038674033149172
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13031474323578135
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13024282560706402
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13017098731384447
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13009922822491732
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13002754820936638
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1299559471365639
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1298844248761695
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12981298129812982
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1297416162726773
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12967032967032968
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12959912136188906
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12952799121844127
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12945693911135492
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12938596491228072
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1293150684931507
Done generating on-policy rollouts.
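On the two affinization statistics printed during every rollout: one hedged reading is that the learned dynamics are locally affinized (first-order expansion) along the path, with the first counter recording steps whose error or curvature measure exceeds epsilon = 3 and the second being the running mean over all steps so far. Everything below is an illustrative guess, not the project's definition:

    import numpy as np

    def count_affinizations(errors, epsilon=3.0):
        # Count the steps whose (hypothetical) affinization-error measure
        # exceeds the threshold; every path in this log reports 0.
        exceed = int(np.sum(np.asarray(errors) > epsilon))
        print(f"number of affinization with epsilon = {epsilon:g} is {exceed}")
        return exceed

    count_affinizations([0.2, 0.1, 0.05])   # prints 0, matching the log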
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0025  |
| Iteration     | 71       |
| MaximumReturn | -0.00156 |
| MinimumReturn | -0.00425 |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019618544727563858
Validation loss = 0.02026175707578659
Validation loss = 0.01978241465985775
Validation loss = 0.01881815306842327
Validation loss = 0.02102162316441536
Validation loss = 0.020250435918569565
Validation loss = 0.02138143591582775
Validation loss = 0.01927257888019085
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018920965492725372
Validation loss = 0.020158100873231888
Validation loss = 0.021377401426434517
Validation loss = 0.019797276705503464
Validation loss = 0.019644418731331825
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01921430602669716
Validation loss = 0.022407060489058495
Validation loss = 0.01994275115430355
Validation loss = 0.02151951566338539
Validation loss = 0.019988738000392914
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019174756482243538
Validation loss = 0.020064614713191986
Validation loss = 0.018480993807315826
Validation loss = 0.021272892132401466
Validation loss = 0.0182192400097847
Validation loss = 0.01856723614037037
Validation loss = 0.01978497952222824
Validation loss = 0.019440077245235443
Validation loss = 0.019422322511672974
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02043665386736393
Validation loss = 0.021239174529910088
Validation loss = 0.019400713965296745
Validation loss = 0.019434398040175438
Validation loss = 0.0201468113809824
Validation loss = 0.021515246480703354
Validation loss = 0.018796885386109352
Validation loss = 0.02041427232325077
Validation loss = 0.018943848088383675
Validation loss = 0.021762989461421967
Validation loss = 0.019917640835046768
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12924424972617743
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1291735084838533
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12910284463894967
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12903225806451613
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12896174863387977
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12889131622064445
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12882096069868995
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1287506819421713
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.128680479825518
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12861035422343325
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12854030501089325
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12847033206314643
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12840043525571274
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12833061446438282
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1282608695652174
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12819120043454643
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1281216069489685
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12805208898534998
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1279826464208243
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12791327913279132
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12784398699891658
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1277747698971305
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1277056277056277
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1276365603028664
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12756756756756757
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00111  |
| Iteration     | 72        |
| MaximumReturn | -0.000633 |
| MinimumReturn | -0.00222  |
| TotalSamples  | 123284    |
-----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02044147625565529
Validation loss = 0.01952950283885002
Validation loss = 0.02000470831990242
Validation loss = 0.022219499573111534
Validation loss = 0.02025727927684784
Validation loss = 0.02121531218290329
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018343476578593254
Validation loss = 0.019062064588069916
Validation loss = 0.019645102322101593
Validation loss = 0.020547227934002876
Validation loss = 0.018611092120409012
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021537872031331062
Validation loss = 0.02010958269238472
Validation loss = 0.020253902301192284
Validation loss = 0.020128091797232628
Validation loss = 0.02193065918982029
Validation loss = 0.01972527615725994
Validation loss = 0.019416945055127144
Validation loss = 0.021082669496536255
Validation loss = 0.020033717155456543
Validation loss = 0.020350299775600433
Validation loss = 0.01985238678753376
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018284043297171593
Validation loss = 0.018859826028347015
Validation loss = 0.02094755321741104
Validation loss = 0.01904103346168995
Validation loss = 0.019146522507071495
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01984567381441593
Validation loss = 0.02068251557648182
Validation loss = 0.019917253404855728
Validation loss = 0.019056882709264755
Validation loss = 0.020209135487675667
Validation loss = 0.01926494389772415
Validation loss = 0.018681909888982773
Validation loss = 0.01928703859448433
Validation loss = 0.02006606012582779
Validation loss = 0.02325388602912426
Validation loss = 0.020828161388635635
Done fitting dynamics.
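Between fitting and policy training, the five models have to be queried jointly when imagined rollouts are generated. A common scheme, sketched here with a hypothetical model.predict, routes each particle through one ensemble member so that model disagreement shows up as spread across the particles:

    import numpy as np

    def ensemble_step(models, states, actions, assignment):
        # `assignment[i]` names the ensemble member that propagates particle
        # i; disagreement between members widens the particle distribution.
        next_states = np.empty_like(states)
        for k, model in enumerate(models):
            idx = assignment == k
            if idx.any():
                next_states[idx] = model.predict(states[idx], actions[idx])
        return next_states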
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12749864937871422
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12742980561555076
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1273610361575823
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1272923408845739
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12722371967654986
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1271551724137931
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12708669897684438
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12701829924650163
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12694997310381925
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12688172043010754
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12681354110693177
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1267454350161117
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1266774020397209
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12660944206008584
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12654155495978553
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1264737406216506
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12640599892876273
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12633832976445397
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12627073301230604
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12620320855614972
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12613575628006413
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12606837606837606
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12600106780565937
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12593383137673425
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12586666666666665
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00107  |
| Iteration     | 73        |
| MaximumReturn | -0.000691 |
| MinimumReturn | -0.00163  |
| TotalSamples  | 124950    |
-----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02015884593129158
Validation loss = 0.02013971470296383
Validation loss = 0.01978086680173874
Validation loss = 0.01943282037973404
Validation loss = 0.02166254259645939
Validation loss = 0.019404180347919464
Validation loss = 0.02064659260213375
Validation loss = 0.018795516341924667
Validation loss = 0.019375348463654518
Validation loss = 0.020040275529026985
Validation loss = 0.0199251938611269
Validation loss = 0.019442295655608177
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019336868077516556
Validation loss = 0.019051454961299896
Validation loss = 0.01924598589539528
Validation loss = 0.018538668751716614
Validation loss = 0.020572718232870102
Validation loss = 0.019501671195030212
Validation loss = 0.01902293786406517
Validation loss = 0.018835686147212982
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019605200737714767
Validation loss = 0.0200448427349329
Validation loss = 0.019544081762433052
Validation loss = 0.020688403397798538
Validation loss = 0.019956547766923904
Validation loss = 0.021924782544374466
Validation loss = 0.01958424225449562
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01972673274576664
Validation loss = 0.01842815987765789
Validation loss = 0.019743621349334717
Validation loss = 0.0203261561691761
Validation loss = 0.018141591921448708
Validation loss = 0.019391875714063644
Validation loss = 0.018534701317548752
Validation loss = 0.017979009076952934
Validation loss = 0.017998209223151207
Validation loss = 0.018535129725933075
Validation loss = 0.01902683638036251
Validation loss = 0.018073484301567078
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01954854093492031
Validation loss = 0.01907731220126152
Validation loss = 0.020167337730526924
Validation loss = 0.020862946286797523
Validation loss = 0.019054271280765533
Validation loss = 0.018904421478509903
Validation loss = 0.019533492624759674
Validation loss = 0.01909622736275196
Validation loss = 0.019889630377292633
Validation loss = 0.01972111128270626
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1257995735607676
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12573255194459243
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12566560170394037
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12559872272485365
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125531914893617
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12546517809675706
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12539851222104145
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1253319171534785
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12526539278131635
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12519893899204243
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12513255567338283
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12506624271330152
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12493382742191636
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12486772486772486
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12480169222633528
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12473572938689217
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12466983623877444
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1246040126715945
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12453825857519789
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12447257383966245
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12440695835529784
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12434141201264488
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12427593470247499
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12421052631578948
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00124  |
| Iteration     | 74        |
| MaximumReturn | -0.000903 |
| MinimumReturn | -0.0026   |
| TotalSamples  | 126616    |
-----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01934368535876274
Validation loss = 0.02146719954907894
Validation loss = 0.019621988758444786
Validation loss = 0.02031840942800045
Validation loss = 0.01885516569018364
Validation loss = 0.02131284587085247
Validation loss = 0.020965510979294777
Validation loss = 0.019443422555923462
Validation loss = 0.019792336970567703
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02128615975379944
Validation loss = 0.019611788913607597
Validation loss = 0.019235586747527122
Validation loss = 0.018494976684451103
Validation loss = 0.019472243264317513
Validation loss = 0.020943889394402504
Validation loss = 0.018506009131669998
Validation loss = 0.01871701143682003
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019532574340701103
Validation loss = 0.020273175090551376
Validation loss = 0.019026143476366997
Validation loss = 0.01988348923623562
Validation loss = 0.019284360110759735
Validation loss = 0.01989591121673584
Validation loss = 0.018871407955884933
Validation loss = 0.01880224235355854
Validation loss = 0.019751472398638725
Validation loss = 0.02003033645451069
Validation loss = 0.02033652551472187
Validation loss = 0.020381810143589973
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02019505389034748
Validation loss = 0.01923234388232231
Validation loss = 0.01988125406205654
Validation loss = 0.020527467131614685
Validation loss = 0.02095688320696354
Validation loss = 0.019058508798480034
Validation loss = 0.018631767481565475
Validation loss = 0.019067097455263138
Validation loss = 0.019283967092633247
Validation loss = 0.018699269741773605
Validation loss = 0.020447390154004097
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0212736576795578
Validation loss = 0.020381519570946693
Validation loss = 0.01941923052072525
Validation loss = 0.019103534519672394
Validation loss = 0.020881490781903267
Validation loss = 0.018769262358546257
Validation loss = 0.02078561671078205
Validation loss = 0.018562503159046173
Validation loss = 0.019003838300704956
Validation loss = 0.021385516971349716
Validation loss = 0.019448978826403618
Validation loss = 0.018923327326774597
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12414518674381904
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12407991587802314
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12401471361008934
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12394957983193278
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12388451443569554
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12381951731374606
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12375458835867856
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12368972746331237
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12362493452069147
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12356020942408377
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12349555206698064
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12343096234309624
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12336644014636697
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12330198537095088
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12323759791122715
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12317327766179541
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12310902451747523
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12304483837330553
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12298071912454403
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12291666666666666
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.122852680895367
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12278876170655567
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12272490899635985
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12266112266112267
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1225974025974026
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000817 |
| Iteration     | 75        |
| MaximumReturn | -0.000633 |
| MinimumReturn | -0.00118  |
| TotalSamples  | 128282    |
-----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0201801136136055
Validation loss = 0.020625099539756775
Validation loss = 0.019267333671450615
Validation loss = 0.020099956542253494
Validation loss = 0.019284706562757492
Validation loss = 0.01931285671889782
Validation loss = 0.020074956119060516
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0187668614089489
Validation loss = 0.019419003278017044
Validation loss = 0.018860336393117905
Validation loss = 0.018083445727825165
Validation loss = 0.020908523350954056
Validation loss = 0.0198039673268795
Validation loss = 0.019248999655246735
Validation loss = 0.01794298365712166
Validation loss = 0.02037382870912552
Validation loss = 0.018208902329206467
Validation loss = 0.019459854811429977
Validation loss = 0.01801033318042755
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019840318709611893
Validation loss = 0.019971588626503944
Validation loss = 0.02030482143163681
Validation loss = 0.019490260630846024
Validation loss = 0.019005751237273216
Validation loss = 0.02037518098950386
Validation loss = 0.02189161628484726
Validation loss = 0.020552678033709526
Validation loss = 0.018980666995048523
Validation loss = 0.01972750574350357
Validation loss = 0.02157927304506302
Validation loss = 0.019070878624916077
Validation loss = 0.01907411217689514
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019425690174102783
Validation loss = 0.019143156707286835
Validation loss = 0.01912318915128708
Validation loss = 0.01770941913127899
Validation loss = 0.019534531980752945
Validation loss = 0.01890375092625618
Validation loss = 0.01850571669638157
Validation loss = 0.01831827312707901
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019179821014404297
Validation loss = 0.02174505963921547
Validation loss = 0.019701313227415085
Validation loss = 0.01903477869927883
Validation loss = 0.019052375108003616
Validation loss = 0.018092090263962746
Validation loss = 0.02016458474099636
Validation loss = 0.02112976834177971
Validation loss = 0.017897965386509895
Validation loss = 0.018091771751642227
Validation loss = 0.021505963057279587
Validation loss = 0.019219208508729935
Validation loss = 0.019398147240281105
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.122533748701973
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12247016087182148
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12240663900414937
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12234318299637117
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12227979274611399
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12221646815121699
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12215320910973085
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12209001551991723
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12202688728024819
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12196382428940568
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12190082644628099
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12183789364997419
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1217750257997936
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12171222279525529
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12164948453608247
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12158681092220505
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.121524201853759
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12146165723108594
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12139917695473251
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12133676092544987
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12127440904419322
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12121212121212122
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12114989733059549
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1210877373011801
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12102564102564102
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000843 |
| Iteration     | 76        |
| MaximumReturn | -0.000626 |
| MinimumReturn | -0.00137  |
| TotalSamples  | 129948    |
-----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01963784359395504
Validation loss = 0.0205687303096056
Validation loss = 0.019199687987565994
Validation loss = 0.020104289054870605
Validation loss = 0.01927993819117546
Validation loss = 0.019226256757974625
Validation loss = 0.0207053329795599
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01865621842443943
Validation loss = 0.01915168948471546
Validation loss = 0.018331557512283325
Validation loss = 0.018090587109327316
Validation loss = 0.019643634557724
Validation loss = 0.018695300444960594
Validation loss = 0.018376756459474564
Validation loss = 0.020251821726560593
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02118576690554619
Validation loss = 0.019929517060518265
Validation loss = 0.02034337818622589
Validation loss = 0.01891700178384781
Validation loss = 0.01949424296617508
Validation loss = 0.018372179940342903
Validation loss = 0.020340340211987495
Validation loss = 0.019720662385225296
Validation loss = 0.02027047425508499
Validation loss = 0.021396437659859657
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018478024750947952
Validation loss = 0.017102716490626335
Validation loss = 0.018454458564519882
Validation loss = 0.018414795398712158
Validation loss = 0.018441908061504364
Validation loss = 0.018958842381834984
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018801119178533554
Validation loss = 0.019489014521241188
Validation loss = 0.020513854920864105
Validation loss = 0.01857934147119522
Validation loss = 0.0190851092338562
Validation loss = 0.018033603206276894
Validation loss = 0.020093273371458054
Validation loss = 0.0198522936552763
Validation loss = 0.018645675852894783
Validation loss = 0.019108928740024567
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12096360840594567
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12090163934426229
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12083973374295955
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12077789150460594
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12071611253196932
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12065439672801637
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1205927439959121
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1205311542390194
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12046962736089842
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12040816326530612
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12034676185619582
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12028542303771661
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12022414671421294
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12016293279022404
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12010178117048347
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12004069175991862
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11997966446365023
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11991869918699187
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11985779583544946
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11979695431472082
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11973617453069507
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11967545638945233
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11961479979726306
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11955420466058764
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11949367088607595
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000815 |
| Iteration     | 77        |
| MaximumReturn | -0.000539 |
| MinimumReturn | -0.00106  |
| TotalSamples  | 131614    |
-----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020172119140625
Validation loss = 0.020289037376642227
Validation loss = 0.020381104201078415
Validation loss = 0.02259327471256256
Validation loss = 0.01940365694463253
Validation loss = 0.01964496076107025
Validation loss = 0.019100161269307137
Validation loss = 0.02111520990729332
Validation loss = 0.019959695637226105
Validation loss = 0.01930280402302742
Validation loss = 0.019134294241666794
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020993763580918312
Validation loss = 0.018843336030840874
Validation loss = 0.018612006679177284
Validation loss = 0.01954110711812973
Validation loss = 0.019235989078879356
Validation loss = 0.020586350932717323
Validation loss = 0.020020291209220886
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020020809024572372
Validation loss = 0.019244592636823654
Validation loss = 0.01938173919916153
Validation loss = 0.020468704402446747
Validation loss = 0.019560953602194786
Validation loss = 0.021496716886758804
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019784698262810707
Validation loss = 0.019886383786797523
Validation loss = 0.021630674600601196
Validation loss = 0.01952408440411091
Validation loss = 0.018501175567507744
Validation loss = 0.01807916909456253
Validation loss = 0.0181894414126873
Validation loss = 0.019594436511397362
Validation loss = 0.021145861595869064
Validation loss = 0.018270352855324745
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019913334399461746
Validation loss = 0.01879757083952427
Validation loss = 0.020560063421726227
Validation loss = 0.01983249932527542
Validation loss = 0.018788833171129227
Validation loss = 0.020250190049409866
Validation loss = 0.01935695856809616
Validation loss = 0.01830076240003109
Validation loss = 0.022132448852062225
Validation loss = 0.018632318824529648
Validation loss = 0.01971060037612915
Validation loss = 0.01949894055724144
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1194331983805668
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1193727870510875
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11931243680485339
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1192521475492673
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1191919191919192
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11913175164058556
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11907164480322906
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11901159858799798
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11895161290322581
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11889168765743073
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11883182275931521
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11877201811776547
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11871227364185111
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11865258924082453
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1185929648241206
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1185334003013561
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11847389558232932
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11841445057701956
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11835506519558676
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11829573934837093
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11823647294589178
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11817726589884828
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11811811811811812
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11805902951475739
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.118
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000801 |
| Iteration     | 78        |
| MaximumReturn | -0.000555 |
| MinimumReturn | -0.00112  |
| TotalSamples  | 133280    |
-----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019623955711722374
Validation loss = 0.019665753468871117
Validation loss = 0.0192771814763546
Validation loss = 0.020796267315745354
Validation loss = 0.02003694325685501
Validation loss = 0.0198192335665226
Validation loss = 0.019826453179121017
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01985226385295391
Validation loss = 0.02017289027571678
Validation loss = 0.019038023427128792
Validation loss = 0.019354380667209625
Validation loss = 0.019173234701156616
Validation loss = 0.01991206966340542
Validation loss = 0.01891596056520939
Validation loss = 0.018064480274915695
Validation loss = 0.019572852179408073
Validation loss = 0.019296133890748024
Validation loss = 0.020345721393823624
Validation loss = 0.018403509631752968
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02024635672569275
Validation loss = 0.020513717085123062
Validation loss = 0.019022488966584206
Validation loss = 0.020294342190027237
Validation loss = 0.019222170114517212
Validation loss = 0.019219424575567245
Validation loss = 0.02017388492822647
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019041799008846283
Validation loss = 0.019392667338252068
Validation loss = 0.017794571816921234
Validation loss = 0.018620910122990608
Validation loss = 0.01868998259305954
Validation loss = 0.01850280910730362
Validation loss = 0.019973114132881165
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019618604332208633
Validation loss = 0.01948801428079605
Validation loss = 0.02187011018395424
Validation loss = 0.01893858052790165
Validation loss = 0.018544703722000122
Validation loss = 0.0195685476064682
Validation loss = 0.01876542717218399
Validation loss = 0.019178979098796844
Validation loss = 0.019167643040418625
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11794102948525736
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11788211788211789
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11782326510234647
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11776447105788423
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11770573566084788
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11764705882352941
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11758844045839562
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11752988047808766
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1174713787954206
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11741293532338308
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11735454997513675
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1172962226640159
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11723795330352707
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11717974180734857
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11712158808933003
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11706349206349206
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11700545364402579
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11694747274529237
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11688954928182269
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11683168316831684
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11677387431964374
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11671612265084075
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1166584280771132
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.116600790513834
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11654320987654321
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00982 |
| Iteration     | 79       |
| MaximumReturn | -0.00634 |
| MinimumReturn | -0.0131  |
| TotalSamples  | 134946   |
----------------------------
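Taken together, every iteration in this section repeats the same model-based loop. The skeleton below strings the logged phases in order; every helper is a no-op stand-in for the real component, so it only demonstrates the control flow:

    # Control-flow summary of one outer iteration (itr #70-79 above all repeat it).
    fit_dynamics = update_randomness = lambda *a: None
    train_policy = update_normalization = log_summary = lambda *a: None
    collect_rollouts = lambda *a: []

    def outer_iteration(itr, data, dynamics, policy, env):
        print(f"itr #{itr} | ")
        fit_dynamics(dynamics, data)           # "Fitting dynamics." block
        update_randomness(dynamics)            # "Updating randomness."
        train_policy(policy, dynamics)         # 20 TRPO sampling/update rounds
        paths = collect_rollouts(policy, env)  # 25 on-policy paths, 100 steps each
        data.extend(paths)                     # grow the dataset for the next fit
        update_normalization(data)             # "Updating normalization."
        log_summary(itr, paths)                # the tabular summary block

    for itr in range(70, 80):
        outer_iteration(itr, [], None, None, None)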
