Logging to experiments/invertedPendulum/nov2/IPO01w350e3_seed2431
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.832614541053772
Validation loss = 0.7183802723884583
Validation loss = 0.6757965683937073
Validation loss = 0.6415250897407532
Validation loss = 0.616887092590332
Validation loss = 0.5998833775520325
Validation loss = 0.5877112150192261
Validation loss = 0.5745161175727844
Validation loss = 0.5558178424835205
Validation loss = 0.5451139807701111
Validation loss = 0.547703742980957
Validation loss = 0.5313217043876648
Validation loss = 0.5411882400512695
Validation loss = 0.5250403881072998
Validation loss = 0.5091210007667542
Validation loss = 0.4996107220649719
Validation loss = 0.522310197353363
Validation loss = 0.515442967414856
Validation loss = 0.5024352073669434
Validation loss = 0.5085568428039551
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.8368035554885864
Validation loss = 0.7229803800582886
Validation loss = 0.6615867018699646
Validation loss = 0.650592565536499
Validation loss = 0.6543924808502197
Validation loss = 0.6259573698043823
Validation loss = 0.6063988208770752
Validation loss = 0.5818240642547607
Validation loss = 0.5672434568405151
Validation loss = 0.5527796149253845
Validation loss = 0.5517450571060181
Validation loss = 0.5536501407623291
Validation loss = 0.5329086184501648
Validation loss = 0.5307743549346924
Validation loss = 0.5161373615264893
Validation loss = 0.5268709659576416
Validation loss = 0.5082291960716248
Validation loss = 0.5021781921386719
Validation loss = 0.5162953734397888
Validation loss = 0.48831966519355774
Validation loss = 0.4902314841747284
Validation loss = 0.49532386660575867
Validation loss = 0.4811578392982483
Validation loss = 0.4952259361743927
Validation loss = 0.4830811619758606
Validation loss = 0.4834669232368469
Validation loss = 0.47973164916038513
Validation loss = 0.46833834052085876
Validation loss = 0.45805275440216064
Validation loss = 0.4597785770893097
Validation loss = 0.45048752427101135
Validation loss = 0.45856237411499023
Validation loss = 0.45404675602912903
Validation loss = 0.4534980058670044
Validation loss = 0.4741784632205963
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.855893075466156
Validation loss = 0.7210298776626587
Validation loss = 0.6721415519714355
Validation loss = 0.6467602252960205
Validation loss = 0.6374045014381409
Validation loss = 0.6349275708198547
Validation loss = 0.5985987186431885
Validation loss = 0.5988060235977173
Validation loss = 0.5702519416809082
Validation loss = 0.5574101805686951
Validation loss = 0.5420363545417786
Validation loss = 0.5473631620407104
Validation loss = 0.5464656352996826
Validation loss = 0.5328298807144165
Validation loss = 0.5353483557701111
Validation loss = 0.5151450037956238
Validation loss = 0.5138742327690125
Validation loss = 0.5096986889839172
Validation loss = 0.5213281512260437
Validation loss = 0.5037372708320618
Validation loss = 0.5042340159416199
Validation loss = 0.5019840598106384
Validation loss = 0.4977167844772339
Validation loss = 0.4912048876285553
Validation loss = 0.47737208008766174
Validation loss = 0.46682244539260864
Validation loss = 0.46562495827674866
Validation loss = 0.475241094827652
Validation loss = 0.47854921221733093
Validation loss = 0.5205198526382446
Validation loss = 0.4908033311367035
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.8362381458282471
Validation loss = 0.7230396866798401
Validation loss = 0.6590743064880371
Validation loss = 0.6510726809501648
Validation loss = 0.6492241621017456
Validation loss = 0.6159679889678955
Validation loss = 0.5912294983863831
Validation loss = 0.5804663896560669
Validation loss = 0.5591431260108948
Validation loss = 0.5445622801780701
Validation loss = 0.5489781498908997
Validation loss = 0.530689001083374
Validation loss = 0.5348986983299255
Validation loss = 0.5493595600128174
Validation loss = 0.534356415271759
Validation loss = 0.5208159685134888
Validation loss = 0.519770085811615
Validation loss = 0.5309242010116577
Validation loss = 0.49543413519859314
Validation loss = 0.5001392960548401
Validation loss = 0.513570249080658
Validation loss = 0.5286614894866943
Validation loss = 0.5121917128562927
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.8429416418075562
Validation loss = 0.7122595906257629
Validation loss = 0.7165238857269287
Validation loss = 0.6608697175979614
Validation loss = 0.650745153427124
Validation loss = 0.6205900311470032
Validation loss = 0.6020717024803162
Validation loss = 0.5961819887161255
Validation loss = 0.5704545974731445
Validation loss = 0.5650962591171265
Validation loss = 0.5561050176620483
Validation loss = 0.5554175972938538
Validation loss = 0.5423589944839478
Validation loss = 0.5458806753158569
Validation loss = 0.5422404408454895
Validation loss = 0.5364550948143005
Validation loss = 0.5337620377540588
Validation loss = 0.512275755405426
Validation loss = 0.508181631565094
Validation loss = 0.5218724608421326
Validation loss = 0.48846063017845154
Validation loss = 0.49471575021743774
Validation loss = 0.5033742189407349
Validation loss = 0.5064073801040649
Validation loss = 0.4783793091773987
Validation loss = 0.47722724080085754
Validation loss = 0.4976043403148651
Validation loss = 0.467559278011322
Validation loss = 0.47908875346183777
Validation loss = 0.46693944931030273
Validation loss = 0.46670928597450256
Validation loss = 0.4536263942718506
Validation loss = 0.45389124751091003
Validation loss = 0.4674361050128937
Validation loss = 0.45519375801086426
Validation loss = 0.4647546410560608
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.09    |
| Iteration     | 0        |
| MaximumReturn | -0.111   |
| MinimumReturn | -62.5    |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5476925373077393
Validation loss = 0.4725847840309143
Validation loss = 0.44838574528694153
Validation loss = 0.4359845817089081
Validation loss = 0.4285062253475189
Validation loss = 0.4198756814002991
Validation loss = 0.4255427122116089
Validation loss = 0.42185550928115845
Validation loss = 0.4162086844444275
Validation loss = 0.41675999760627747
Validation loss = 0.41518691182136536
Validation loss = 0.41108444333076477
Validation loss = 0.40841442346572876
Validation loss = 0.4026532769203186
Validation loss = 0.4042486846446991
Validation loss = 0.41293591260910034
Validation loss = 0.40848588943481445
Validation loss = 0.39321979880332947
Validation loss = 0.4043898284435272
Validation loss = 0.4074689745903015
Validation loss = 0.42055460810661316
Validation loss = 0.3949047029018402
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6612230539321899
Validation loss = 0.48395299911499023
Validation loss = 0.4616522789001465
Validation loss = 0.44614291191101074
Validation loss = 0.4422539472579956
Validation loss = 0.4312073290348053
Validation loss = 0.43280890583992004
Validation loss = 0.4255886971950531
Validation loss = 0.41879603266716003
Validation loss = 0.4183661639690399
Validation loss = 0.41727763414382935
Validation loss = 0.4132958650588989
Validation loss = 0.4129539132118225
Validation loss = 0.40957099199295044
Validation loss = 0.4185536503791809
Validation loss = 0.4126521944999695
Validation loss = 0.4012230932712555
Validation loss = 0.4001927971839905
Validation loss = 0.39809340238571167
Validation loss = 0.3955267369747162
Validation loss = 0.37750935554504395
Validation loss = 0.40469279885292053
Validation loss = 0.39047229290008545
Validation loss = 0.38219112157821655
Validation loss = 0.3838875889778137
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6399032473564148
Validation loss = 0.48045939207077026
Validation loss = 0.44726529717445374
Validation loss = 0.43713805079460144
Validation loss = 0.4330335557460785
Validation loss = 0.42324164509773254
Validation loss = 0.4205791652202606
Validation loss = 0.419007807970047
Validation loss = 0.41230425238609314
Validation loss = 0.4126284420490265
Validation loss = 0.40932711958885193
Validation loss = 0.41228264570236206
Validation loss = 0.4018612802028656
Validation loss = 0.4032440185546875
Validation loss = 0.4071733355522156
Validation loss = 0.39984241127967834
Validation loss = 0.3915282189846039
Validation loss = 0.39402100443840027
Validation loss = 0.39153140783309937
Validation loss = 0.38799190521240234
Validation loss = 0.41835951805114746
Validation loss = 0.41460928320884705
Validation loss = 0.4065231382846832
Validation loss = 0.4093049168586731
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5984715223312378
Validation loss = 0.479791522026062
Validation loss = 0.45704185962677
Validation loss = 0.4471638798713684
Validation loss = 0.4388520121574402
Validation loss = 0.43162891268730164
Validation loss = 0.4247972369194031
Validation loss = 0.434764564037323
Validation loss = 0.41130712628364563
Validation loss = 0.42751288414001465
Validation loss = 0.405937135219574
Validation loss = 0.40504783391952515
Validation loss = 0.41383302211761475
Validation loss = 0.3945735692977905
Validation loss = 0.40033432841300964
Validation loss = 0.4025506377220154
Validation loss = 0.4070605933666229
Validation loss = 0.4100002944469452
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6571106314659119
Validation loss = 0.4936220645904541
Validation loss = 0.4730224907398224
Validation loss = 0.4589284360408783
Validation loss = 0.44945818185806274
Validation loss = 0.44550007581710815
Validation loss = 0.4343651831150055
Validation loss = 0.42718979716300964
Validation loss = 0.4177650213241577
Validation loss = 0.41443946957588196
Validation loss = 0.407468318939209
Validation loss = 0.3992730677127838
Validation loss = 0.4197269380092621
Validation loss = 0.41250985860824585
Validation loss = 0.3964872360229492
Validation loss = 0.40361714363098145
Validation loss = 0.40577176213264465
Validation loss = 0.4032294750213623
Validation loss = 0.38704413175582886
Validation loss = 0.38449177145957947
Validation loss = 0.41907382011413574
Validation loss = 0.40349113941192627
Validation loss = 0.38684576749801636
Validation loss = 0.4136693477630615
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.36    |
| Iteration     | 1        |
| MaximumReturn | -0.0735  |
| MinimumReturn | -58.2    |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.463329017162323
Validation loss = 0.4104883670806885
Validation loss = 0.4008329510688782
Validation loss = 0.39820635318756104
Validation loss = 0.4002862572669983
Validation loss = 0.39356744289398193
Validation loss = 0.40375643968582153
Validation loss = 0.4074952006340027
Validation loss = 0.40970227122306824
Validation loss = 0.3994506597518921
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5330163240432739
Validation loss = 0.4223642349243164
Validation loss = 0.4019477665424347
Validation loss = 0.4014541506767273
Validation loss = 0.39385056495666504
Validation loss = 0.39554062485694885
Validation loss = 0.3935728073120117
Validation loss = 0.391643762588501
Validation loss = 0.38962724804878235
Validation loss = 0.39372897148132324
Validation loss = 0.403522253036499
Validation loss = 0.39301159977912903
Validation loss = 0.40316706895828247
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5308933258056641
Validation loss = 0.42202430963516235
Validation loss = 0.4033390283584595
Validation loss = 0.3983049690723419
Validation loss = 0.3980792164802551
Validation loss = 0.39354443550109863
Validation loss = 0.390067458152771
Validation loss = 0.39456862211227417
Validation loss = 0.3942152261734009
Validation loss = 0.4013761878013611
Validation loss = 0.39405208826065063
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5052619576454163
Validation loss = 0.41990119218826294
Validation loss = 0.4085267186164856
Validation loss = 0.40049564838409424
Validation loss = 0.39768657088279724
Validation loss = 0.4023939371109009
Validation loss = 0.3996659219264984
Validation loss = 0.4050714373588562
Validation loss = 0.3987269401550293
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5426641702651978
Validation loss = 0.43360522389411926
Validation loss = 0.41402730345726013
Validation loss = 0.40511900186538696
Validation loss = 0.4032789170742035
Validation loss = 0.3974480628967285
Validation loss = 0.40158554911613464
Validation loss = 0.4013872742652893
Validation loss = 0.3927419185638428
Validation loss = 0.39862895011901855
Validation loss = 0.39330869913101196
Validation loss = 0.40683555603027344
Validation loss = 0.3967789113521576
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.17    |
| Iteration     | 2        |
| MaximumReturn | -0.0413  |
| MinimumReturn | -26.3    |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4120888411998749
Validation loss = 0.37144067883491516
Validation loss = 0.37029966711997986
Validation loss = 0.3742351233959198
Validation loss = 0.37779363989830017
Validation loss = 0.36734485626220703
Validation loss = 0.3789104223251343
Validation loss = 0.3676739037036896
Validation loss = 0.3768174648284912
Validation loss = 0.3731478154659271
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.419945627450943
Validation loss = 0.364463210105896
Validation loss = 0.3709303140640259
Validation loss = 0.3722251355648041
Validation loss = 0.36560091376304626
Validation loss = 0.3706565797328949
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.387845516204834
Validation loss = 0.37647780776023865
Validation loss = 0.365736722946167
Validation loss = 0.38160440325737
Validation loss = 0.3773280382156372
Validation loss = 0.37085995078086853
Validation loss = 0.3719390332698822
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.39874789118766785
Validation loss = 0.37831464409828186
Validation loss = 0.36599740386009216
Validation loss = 0.3646133244037628
Validation loss = 0.37588801980018616
Validation loss = 0.38494786620140076
Validation loss = 0.374268501996994
Validation loss = 0.3668489456176758
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4024693965911865
Validation loss = 0.3699468672275543
Validation loss = 0.3647975027561188
Validation loss = 0.36751365661621094
Validation loss = 0.3734714686870575
Validation loss = 0.3672359883785248
Validation loss = 0.37258008122444153
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.266   |
| Iteration     | 3        |
| MaximumReturn | -0.0295  |
| MinimumReturn | -3.95    |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.37550273537635803
Validation loss = 0.3589950203895569
Validation loss = 0.36415624618530273
Validation loss = 0.3649362325668335
Validation loss = 0.362134724855423
Validation loss = 0.3601224720478058
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.36446765065193176
Validation loss = 0.35769587755203247
Validation loss = 0.35841426253318787
Validation loss = 0.36528417468070984
Validation loss = 0.35707420110702515
Validation loss = 0.36503568291664124
Validation loss = 0.3623921871185303
Validation loss = 0.3612680435180664
Validation loss = 0.3573808968067169
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.367963969707489
Validation loss = 0.3612344563007355
Validation loss = 0.3669736087322235
Validation loss = 0.3633010983467102
Validation loss = 0.3633224368095398
Validation loss = 0.3611680269241333
Validation loss = 0.36885854601860046
Validation loss = 0.3603457510471344
Validation loss = 0.3675236403942108
Validation loss = 0.36623841524124146
Validation loss = 0.3615770936012268
Validation loss = 0.36375993490219116
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.36832594871520996
Validation loss = 0.35660406947135925
Validation loss = 0.36226117610931396
Validation loss = 0.35764411091804504
Validation loss = 0.3607812225818634
Validation loss = 0.3671973943710327
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.37002235651016235
Validation loss = 0.35821300745010376
Validation loss = 0.356479287147522
Validation loss = 0.3686410188674927
Validation loss = 0.36127012968063354
Validation loss = 0.35476231575012207
Validation loss = 0.3718557357788086
Validation loss = 0.3644898533821106
Validation loss = 0.36042866110801697
Validation loss = 0.37026333808898926
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0467  |
| Iteration     | 4        |
| MaximumReturn | -0.0187  |
| MinimumReturn | -0.117   |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.37739458680152893
Validation loss = 0.36840909719467163
Validation loss = 0.361925333738327
Validation loss = 0.3663131296634674
Validation loss = 0.37237223982810974
Validation loss = 0.37001389265060425
Validation loss = 0.36962151527404785
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.37687551975250244
Validation loss = 0.3680984377861023
Validation loss = 0.36785244941711426
Validation loss = 0.3622260093688965
Validation loss = 0.3650173544883728
Validation loss = 0.37052395939826965
Validation loss = 0.36292147636413574
Validation loss = 0.37235918641090393
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.37276750802993774
Validation loss = 0.3655281662940979
Validation loss = 0.3656560778617859
Validation loss = 0.3761776387691498
Validation loss = 0.3699337840080261
Validation loss = 0.36732807755470276
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.37798023223876953
Validation loss = 0.35650309920310974
Validation loss = 0.35515326261520386
Validation loss = 0.3650112748146057
Validation loss = 0.3607577681541443
Validation loss = 0.3649345934391022
Validation loss = 0.3694819509983063
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3897201418876648
Validation loss = 0.36561691761016846
Validation loss = 0.3660896122455597
Validation loss = 0.36735352873802185
Validation loss = 0.36625587940216064
Validation loss = 0.3716280460357666
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.148   |
| Iteration     | 5        |
| MaximumReturn | -0.0194  |
| MinimumReturn | -0.889   |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3995581269264221
Validation loss = 0.3780069351196289
Validation loss = 0.38316506147384644
Validation loss = 0.3843151926994324
Validation loss = 0.38943377137184143
Validation loss = 0.38298386335372925
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3857671022415161
Validation loss = 0.3740386366844177
Validation loss = 0.37905678153038025
Validation loss = 0.38278621435165405
Validation loss = 0.3826929032802582
Validation loss = 0.37893468141555786
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3863750994205475
Validation loss = 0.3798828423023224
Validation loss = 0.3756389021873474
Validation loss = 0.3739086985588074
Validation loss = 0.3865850865840912
Validation loss = 0.3792448043823242
Validation loss = 0.3815948963165283
Validation loss = 0.3844735324382782
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.37471017241477966
Validation loss = 0.3707377314567566
Validation loss = 0.37300196290016174
Validation loss = 0.3736143410205841
Validation loss = 0.3737412691116333
Validation loss = 0.38011905550956726
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3937000334262848
Validation loss = 0.3748966455459595
Validation loss = 0.37396594882011414
Validation loss = 0.38247713446617126
Validation loss = 0.38535791635513306
Validation loss = 0.3777538239955902
Validation loss = 0.3816268742084503
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.611   |
| Iteration     | 6        |
| MaximumReturn | -0.0725  |
| MinimumReturn | -1.68    |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39254602789878845
Validation loss = 0.3883959949016571
Validation loss = 0.38861414790153503
Validation loss = 0.3914550840854645
Validation loss = 0.3939422070980072
Validation loss = 0.38671937584877014
Validation loss = 0.39129385352134705
Validation loss = 0.39202114939689636
Validation loss = 0.3905780017375946
Validation loss = 0.3972473442554474
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.38405469059944153
Validation loss = 0.37863364815711975
Validation loss = 0.3833625316619873
Validation loss = 0.3800787925720215
Validation loss = 0.38711392879486084
Validation loss = 0.383765310049057
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.38320061564445496
Validation loss = 0.38264742493629456
Validation loss = 0.38320934772491455
Validation loss = 0.3835456669330597
Validation loss = 0.3853868544101715
Validation loss = 0.3822164535522461
Validation loss = 0.39093565940856934
Validation loss = 0.3904222548007965
Validation loss = 0.38980627059936523
Validation loss = 0.3950655460357666
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.39035511016845703
Validation loss = 0.38230013847351074
Validation loss = 0.3774476945400238
Validation loss = 0.38374781608581543
Validation loss = 0.38199272751808167
Validation loss = 0.3784351646900177
Validation loss = 0.3838438093662262
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.38902315497398376
Validation loss = 0.387273907661438
Validation loss = 0.3833216726779938
Validation loss = 0.38466131687164307
Validation loss = 0.38300105929374695
Validation loss = 0.3861267566680908
Validation loss = 0.386775404214859
Validation loss = 0.3869664669036865
Validation loss = 0.3907260000705719
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.39    |
| Iteration     | 7        |
| MaximumReturn | -0.0534  |
| MinimumReturn | -25.3    |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.38933685421943665
Validation loss = 0.38899296522140503
Validation loss = 0.3943027853965759
Validation loss = 0.40019962191581726
Validation loss = 0.4006107449531555
Validation loss = 0.39526039361953735
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3848930895328522
Validation loss = 0.3829663097858429
Validation loss = 0.38223451375961304
Validation loss = 0.38623207807540894
Validation loss = 0.388163298368454
Validation loss = 0.38912439346313477
Validation loss = 0.39777374267578125
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.39392706751823425
Validation loss = 0.389567106962204
Validation loss = 0.3910071849822998
Validation loss = 0.3978671729564667
Validation loss = 0.39911553263664246
Validation loss = 0.394409716129303
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3911166489124298
Validation loss = 0.3885258734226227
Validation loss = 0.38405776023864746
Validation loss = 0.3839775621891022
Validation loss = 0.3873865604400635
Validation loss = 0.3895849287509918
Validation loss = 0.3865533769130707
Validation loss = 0.38896268606185913
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.38596540689468384
Validation loss = 0.40013620257377625
Validation loss = 0.39519163966178894
Validation loss = 0.39419788122177124
Validation loss = 0.3899412751197815
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.008547008547008548
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00851063829787234
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.00847457627118644
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008438818565400843
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008403361344537815
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008368200836820083
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008333333333333333
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.008298755186721992
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.01652892561983471
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01646090534979424
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01639344262295082
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0163265306122449
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016260162601626018
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016194331983805668
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016129032258064516
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01606425702811245
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.016
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -73.2    |
| Iteration     | 8        |
| MaximumReturn | -0.025   |
| MinimumReturn | -119     |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39295694231987
Validation loss = 0.3886108994483948
Validation loss = 0.39267802238464355
Validation loss = 0.3969636559486389
Validation loss = 0.397693395614624
Validation loss = 0.3920784294605255
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3839559555053711
Validation loss = 0.38287389278411865
Validation loss = 0.3878360986709595
Validation loss = 0.38665854930877686
Validation loss = 0.38698917627334595
Validation loss = 0.39193862676620483
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.39004087448120117
Validation loss = 0.3929377794265747
Validation loss = 0.39488446712493896
Validation loss = 0.3924512565135956
Validation loss = 0.40378981828689575
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3892327547073364
Validation loss = 0.38430121541023254
Validation loss = 0.3899693489074707
Validation loss = 0.3883759379386902
Validation loss = 0.38582563400268555
Validation loss = 0.38253262639045715
Validation loss = 0.38930743932724
Validation loss = 0.3999941349029541
Validation loss = 0.39326000213623047
Validation loss = 0.3963787853717804
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3876723051071167
Validation loss = 0.38788267970085144
Validation loss = 0.3933252692222595
Validation loss = 0.38985583186149597
Validation loss = 0.3944527506828308
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01593625498007968
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.01984126984126984
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019762845849802372
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01968503937007874
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0196078431372549
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01953125
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019455252918287938
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01937984496124031
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.023166023166023165
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.023076923076923078
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022988505747126436
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022900763358778626
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022813688212927757
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022727272727272728
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022641509433962263
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022556390977443608
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02247191011235955
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022388059701492536
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022304832713754646
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.022222222222222223
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02214022140221402
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.025735294117647058
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.029304029304029304
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.029197080291970802
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02909090909090909
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26.6    |
| Iteration     | 9        |
| MaximumReturn | -0.254   |
| MinimumReturn | -99.4    |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4014514088630676
Validation loss = 0.40315738320350647
Validation loss = 0.40527433156967163
Validation loss = 0.4055866003036499
Validation loss = 0.3998297452926636
Validation loss = 0.4051819443702698
Validation loss = 0.4048713147640228
Validation loss = 0.4133511483669281
Validation loss = 0.40769001841545105
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.38749992847442627
Validation loss = 0.39040812849998474
Validation loss = 0.38508254289627075
Validation loss = 0.38654768466949463
Validation loss = 0.39008885622024536
Validation loss = 0.3936140835285187
Validation loss = 0.39343684911727905
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4062025547027588
Validation loss = 0.4046713411808014
Validation loss = 0.3928379714488983
Validation loss = 0.39492881298065186
Validation loss = 0.40047088265419006
Validation loss = 0.3998490273952484
Validation loss = 0.3947499990463257
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4118351936340332
Validation loss = 0.3969147205352783
Validation loss = 0.40396958589553833
Validation loss = 0.4059920608997345
Validation loss = 0.4027009606361389
Validation loss = 0.3998173177242279
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4022117555141449
Validation loss = 0.38862308859825134
Validation loss = 0.3998337984085083
Validation loss = 0.3954344391822815
Validation loss = 0.3963354527950287
Validation loss = 0.3990277945995331
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.028985507246376812
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02888086642599278
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02877697841726619
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02867383512544803
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02857142857142857
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.03558718861209965
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03546099290780142
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0353356890459364
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.04225352112676056
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.042105263157894736
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.045454545454545456
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.05226480836236934
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052083333333333336
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.05536332179930796
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05517241379310345
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.058419243986254296
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05821917808219178
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05802047781569966
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05782312925170068
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0576271186440678
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057432432432432436
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05723905723905724
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.06040268456375839
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.06354515050167224
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -46.4    |
| Iteration     | 10       |
| MaximumReturn | -1.21    |
| MinimumReturn | -113     |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.394389808177948
Validation loss = 0.39076653122901917
Validation loss = 0.4010388255119324
Validation loss = 0.4005768299102783
Validation loss = 0.39573603868484497
Validation loss = 0.3977743983268738
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3914831578731537
Validation loss = 0.389339417219162
Validation loss = 0.3861052095890045
Validation loss = 0.39243072271347046
Validation loss = 0.3910515308380127
Validation loss = 0.3899860978126526
Validation loss = 0.3821541666984558
Validation loss = 0.3858127295970917
Validation loss = 0.3900200426578522
Validation loss = 0.39147424697875977
Validation loss = 0.38642507791519165
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4198838770389557
Validation loss = 0.3873412013053894
Validation loss = 0.3939836025238037
Validation loss = 0.4023084044456482
Validation loss = 0.4103057384490967
Validation loss = 0.38706234097480774
Validation loss = 0.39612898230552673
Validation loss = 0.39932313561439514
Validation loss = 0.3981361985206604
Validation loss = 0.3982136845588684
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.39249104261398315
Validation loss = 0.4021909832954407
Validation loss = 0.3821183145046234
Validation loss = 0.397438645362854
Validation loss = 0.38796085119247437
Validation loss = 0.40791982412338257
Validation loss = 0.39695116877555847
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.39856013655662537
Validation loss = 0.3913542330265045
Validation loss = 0.38817134499549866
Validation loss = 0.39530640840530396
Validation loss = 0.3924756646156311
Validation loss = 0.39244288206100464
Validation loss = 0.397280752658844
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06312292358803986
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.0695364238410596
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06930693069306931
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06907894736842106
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07213114754098361
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0718954248366013
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.0749185667752443
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07792207792207792
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07766990291262135
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08064516129032258
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08038585209003216
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08333333333333333
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08306709265175719
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08280254777070063
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08571428571428572
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08544303797468354
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08832807570977919
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09119496855345911
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09090909090909091
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09375
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09657320872274143
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09937888198757763
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1021671826625387
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10185185185185185
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10153846153846154
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -75.8    |
| Iteration     | 11       |
| MaximumReturn | -2.96    |
| MinimumReturn | -98      |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.38452577590942383
Validation loss = 0.38343364000320435
Validation loss = 0.3877849578857422
Validation loss = 0.386570006608963
Validation loss = 0.4059911370277405
Validation loss = 0.39602434635162354
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3788570761680603
Validation loss = 0.3719503581523895
Validation loss = 0.3784370422363281
Validation loss = 0.3765808045864105
Validation loss = 0.38221457600593567
Validation loss = 0.38232386112213135
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.40839487314224243
Validation loss = 0.38016170263290405
Validation loss = 0.39274758100509644
Validation loss = 0.3913659155368805
Validation loss = 0.3933028280735016
Validation loss = 0.3956875205039978
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.36941468715667725
Validation loss = 0.3781856298446655
Validation loss = 0.3909829556941986
Validation loss = 0.38487178087234497
Validation loss = 0.3853527307510376
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.38031482696533203
Validation loss = 0.37773433327674866
Validation loss = 0.37550589442253113
Validation loss = 0.37413543462753296
Validation loss = 0.37810367345809937
Validation loss = 0.3741399347782135
Validation loss = 0.38214728236198425
Validation loss = 0.3851533532142639
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10122699386503067
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10091743119266056
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10060975609756098
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10030395136778116
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09969788519637462
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09939759036144578
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0990990990990991
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09880239520958084
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09850746268656717
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09821428571428571
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09792284866468842
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09763313609467456
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09734513274336283
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09705882352941177
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0967741935483871
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09649122807017543
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09620991253644315
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09593023255813954
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09565217391304348
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0953757225433526
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09510086455331412
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09482758620689655
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09455587392550144
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09428571428571429
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.378   |
| Iteration     | 12       |
| MaximumReturn | -0.0447  |
| MinimumReturn | -2.44    |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4034979045391083
Validation loss = 0.4064411222934723
Validation loss = 0.399376779794693
Validation loss = 0.4160395860671997
Validation loss = 0.40861865878105164
Validation loss = 0.4144156575202942
Validation loss = 0.4140881299972534
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.400776207447052
Validation loss = 0.40146008133888245
Validation loss = 0.38988667726516724
Validation loss = 0.3881666362285614
Validation loss = 0.3947168290615082
Validation loss = 0.3961893320083618
Validation loss = 0.3984152674674988
Validation loss = 0.41054418683052063
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3986026644706726
Validation loss = 0.3955821990966797
Validation loss = 0.39809149503707886
Validation loss = 0.39748379588127136
Validation loss = 0.4121304452419281
Validation loss = 0.407361775636673
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.39319801330566406
Validation loss = 0.3939376175403595
Validation loss = 0.3967303931713104
Validation loss = 0.4111810028553009
Validation loss = 0.4049592912197113
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3971494734287262
Validation loss = 0.3888601064682007
Validation loss = 0.38690420985221863
Validation loss = 0.40042704343795776
Validation loss = 0.4012065827846527
Validation loss = 0.394626259803772
Validation loss = 0.40543290972709656
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09686609686609686
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09943181818181818
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09915014164305949
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09887005649717515
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10140845070422536
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10393258426966293
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10644257703081232
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10614525139664804
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10863509749303621
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1111111111111111
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11357340720221606
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1132596685082873
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11570247933884298
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11538461538461539
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1178082191780822
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12021857923497267
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1226158038147139
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.125
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12737127371273713
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12702702702702703
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1293800539083558
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12903225806451613
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.128686327077748
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.13101604278074866
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13066666666666665
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -52.4    |
| Iteration     | 13       |
| MaximumReturn | -0.0967  |
| MinimumReturn | -92.2    |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4110099971294403
Validation loss = 0.4194681942462921
Validation loss = 0.42082738876342773
Validation loss = 0.4272472858428955
Validation loss = 0.4208326041698456
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4065047800540924
Validation loss = 0.40444740653038025
Validation loss = 0.4084444046020508
Validation loss = 0.40521466732025146
Validation loss = 0.4147005081176758
Validation loss = 0.41594573855400085
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4006391763687134
Validation loss = 0.4137934148311615
Validation loss = 0.4033614695072174
Validation loss = 0.4024622440338135
Validation loss = 0.4219212532043457
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4107823371887207
Validation loss = 0.40244269371032715
Validation loss = 0.403756707906723
Validation loss = 0.4122411012649536
Validation loss = 0.4089960753917694
Validation loss = 0.4166456162929535
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.40169063210487366
Validation loss = 0.40074336528778076
Validation loss = 0.417342871427536
Validation loss = 0.4090994894504547
Validation loss = 0.40022408962249756
Validation loss = 0.41122639179229736
Validation loss = 0.40961048007011414
Validation loss = 0.4119250774383545
Validation loss = 0.4166594445705414
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13031914893617022
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.129973474801061
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12962962962962962
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12928759894459102
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12894736842105264
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12860892388451445
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12827225130890052
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1279373368146214
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12760416666666666
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12727272727272726
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12694300518134716
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12661498708010335
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12628865979381443
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12596401028277635
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12564102564102564
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12531969309462915
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12468193384223919
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12436548223350254
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1240506329113924
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12373737373737374
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12342569269521411
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12311557788944724
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12280701754385964
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.125
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.09    |
| Iteration     | 14       |
| MaximumReturn | -0.0353  |
| MinimumReturn | -10.8    |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.42579028010368347
Validation loss = 0.41938433051109314
Validation loss = 0.4256766438484192
Validation loss = 0.4243539571762085
Validation loss = 0.4234568774700165
Validation loss = 0.42141908407211304
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4131391942501068
Validation loss = 0.41017675399780273
Validation loss = 0.4131756126880646
Validation loss = 0.4205254912376404
Validation loss = 0.4164251983165741
Validation loss = 0.4177260100841522
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.41309675574302673
Validation loss = 0.4059043228626251
Validation loss = 0.41255855560302734
Validation loss = 0.41951242089271545
Validation loss = 0.4255933463573456
Validation loss = 0.4247381389141083
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.40242379903793335
Validation loss = 0.4135966897010803
Validation loss = 0.41499561071395874
Validation loss = 0.41472533345222473
Validation loss = 0.42303192615509033
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.41156381368637085
Validation loss = 0.41636568307876587
Validation loss = 0.4164237678050995
Validation loss = 0.41471603512763977
Validation loss = 0.418857216835022
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12468827930174564
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12437810945273632
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12406947890818859
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12623762376237624
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12839506172839507
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.13054187192118227
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.1375921375921376
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13725490196078433
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13691931540342298
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13658536585365855
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1362530413625304
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13592233009708737
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13559322033898305
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13526570048309178
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13493975903614458
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1346153846153846
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1366906474820144
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13636363636363635
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1360381861575179
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1357142857142857
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13539192399049882
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13507109004739337
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.13711583924349882
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13679245283018868
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13647058823529412
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.28    |
| Iteration     | 15       |
| MaximumReturn | -0.0844  |
| MinimumReturn | -69.9    |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4328742027282715
Validation loss = 0.4308198392391205
Validation loss = 0.42753416299819946
Validation loss = 0.43317508697509766
Validation loss = 0.4348890483379364
Validation loss = 0.4351955056190491
Validation loss = 0.44382795691490173
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.41123268008232117
Validation loss = 0.4169418215751648
Validation loss = 0.4181215465068817
Validation loss = 0.4187413156032562
Validation loss = 0.4251317083835602
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4187891185283661
Validation loss = 0.4251483082771301
Validation loss = 0.42226698994636536
Validation loss = 0.4367842972278595
Validation loss = 0.4250665307044983
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.405712366104126
Validation loss = 0.42604827880859375
Validation loss = 0.42975443601608276
Validation loss = 0.421659380197525
Validation loss = 0.42737966775894165
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4206174314022064
Validation loss = 0.41917768120765686
Validation loss = 0.4249020218849182
Validation loss = 0.42529231309890747
Validation loss = 0.43776196241378784
Validation loss = 0.42524459958076477
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13615023474178403
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1358313817330211
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.1425233644859813
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14219114219114218
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.14418604651162792
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14385150812064965
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14351851851851852
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14318706697459585
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14285714285714285
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.14482758620689656
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.14678899082568808
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14645308924485126
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1461187214611872
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1480637813211845
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14772727272727273
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1473922902494331
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14705882352941177
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14672686230248308
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1463963963963964
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.14831460674157304
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.15246636771300448
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15212527964205816
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15178571428571427
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15367483296213807
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15333333333333332
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.1    |
| Iteration     | 16       |
| MaximumReturn | -0.0478  |
| MinimumReturn | -108     |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.43209388852119446
Validation loss = 0.43518683314323425
Validation loss = 0.44085410237312317
Validation loss = 0.43595510721206665
Validation loss = 0.44596827030181885
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4238058924674988
Validation loss = 0.42419755458831787
Validation loss = 0.42568257451057434
Validation loss = 0.42653003334999084
Validation loss = 0.42855721712112427
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4197308421134949
Validation loss = 0.4347185790538788
Validation loss = 0.42847803235054016
Validation loss = 0.4286915957927704
Validation loss = 0.4342789947986603
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.42052772641181946
Validation loss = 0.4182129204273224
Validation loss = 0.42478084564208984
Validation loss = 0.4251284599304199
Validation loss = 0.43117931485176086
Validation loss = 0.43014535307884216
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.424824595451355
Validation loss = 0.42766496539115906
Validation loss = 0.4257674813270569
Validation loss = 0.4262186586856842
Validation loss = 0.43869930505752563
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15299334811529933
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15265486725663716
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.152317880794702
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15198237885462554
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15384615384615385
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15570175438596492
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1575492341356674
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15938864628820962
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16122004357298475
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1608695652173913
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16052060737527116
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16233766233766234
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16414686825053995
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16379310344827586
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16559139784946236
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16523605150214593
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16488222698072805
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.17094017094017094
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17270788912579957
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17446808510638298
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1740976645435244
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17584745762711865
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.17970401691331925
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18143459915611815
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1831578947368421
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -29.6    |
| Iteration     | 17       |
| MaximumReturn | -0.12    |
| MinimumReturn | -67.9    |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.44093388319015503
Validation loss = 0.4442388415336609
Validation loss = 0.44913250207901
Validation loss = 0.44660016894340515
Validation loss = 0.448807030916214
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4242636561393738
Validation loss = 0.4256732761859894
Validation loss = 0.42935922741889954
Validation loss = 0.426462322473526
Validation loss = 0.43490299582481384
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.42688241600990295
Validation loss = 0.428958535194397
Validation loss = 0.4284449815750122
Validation loss = 0.4384322464466095
Validation loss = 0.4424077272415161
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.43634071946144104
Validation loss = 0.4296298623085022
Validation loss = 0.4374886751174927
Validation loss = 0.4383902847766876
Validation loss = 0.43291985988616943
Validation loss = 0.4400408864021301
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.42855843901634216
Validation loss = 0.42430049180984497
Validation loss = 0.43621790409088135
Validation loss = 0.43254920840263367
Validation loss = 0.4354870915412903
Validation loss = 0.4381864666938782
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18487394957983194
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18658280922431866
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18619246861924685
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18789144050104384
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18958333333333333
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1891891891891892
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.1950207468879668
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19668737060041408
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19834710743801653
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1979381443298969
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19958847736625515
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20123203285420946
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2028688524590164
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20449897750511248
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20408163265306123
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20570264765784113
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20528455284552846
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20689655172413793
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20647773279352227
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20606060606060606
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20766129032258066
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20724346076458752
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20682730923694778
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20841683366733466
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.21
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -56.4    |
| Iteration     | 18       |
| MaximumReturn | -0.176   |
| MinimumReturn | -127     |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.444191038608551
Validation loss = 0.44352391362190247
Validation loss = 0.44210314750671387
Validation loss = 0.44812673330307007
Validation loss = 0.45181113481521606
Validation loss = 0.454487144947052
Validation loss = 0.45240262150764465
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4242469072341919
Validation loss = 0.4279234707355499
Validation loss = 0.43647241592407227
Validation loss = 0.4423156678676605
Validation loss = 0.4321715831756592
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4311619699001312
Validation loss = 0.42563098669052124
Validation loss = 0.43407291173934937
Validation loss = 0.4610969126224518
Validation loss = 0.43620479106903076
Validation loss = 0.44024136662483215
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.43234819173812866
Validation loss = 0.43309569358825684
Validation loss = 0.4306800067424774
Validation loss = 0.43808674812316895
Validation loss = 0.43814972043037415
Validation loss = 0.4382222890853882
Validation loss = 0.4460243582725525
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4266313910484314
Validation loss = 0.43849337100982666
Validation loss = 0.42729389667510986
Validation loss = 0.4455983638763428
Validation loss = 0.44127628207206726
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2155688622754491
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2151394422310757
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.21669980119284293
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21626984126984128
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21584158415841584
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21541501976284586
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21499013806706113
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.22244094488188976
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2220039292730845
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22156862745098038
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22113502935420742
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2265625
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22612085769980506
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22568093385214008
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22524271844660193
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22674418604651161
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.23210831721470018
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2335907335907336
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23314065510597304
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2326923076923077
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23416506717850288
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23371647509578544
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2332695984703633
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2366412213740458
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.24
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -69.6    |
| Iteration     | 19       |
| MaximumReturn | -0.215   |
| MinimumReturn | -126     |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4518405795097351
Validation loss = 0.4457484483718872
Validation loss = 0.4477597177028656
Validation loss = 0.45130544900894165
Validation loss = 0.45176923274993896
Validation loss = 0.46833327412605286
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4256856143474579
Validation loss = 0.4367278516292572
Validation loss = 0.43720537424087524
Validation loss = 0.4352743625640869
Validation loss = 0.43706372380256653
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4324394166469574
Validation loss = 0.4336327612400055
Validation loss = 0.4370880722999573
Validation loss = 0.4407120645046234
Validation loss = 0.44910967350006104
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.43338292837142944
Validation loss = 0.43938738107681274
Validation loss = 0.4488970637321472
Validation loss = 0.4508664906024933
Validation loss = 0.44514232873916626
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4347108006477356
Validation loss = 0.4334721565246582
Validation loss = 0.43284696340560913
Validation loss = 0.4424114525318146
Validation loss = 0.436700701713562
Validation loss = 0.4373510777950287
Validation loss = 0.4536041021347046
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23954372623574144
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2409867172675522
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24242424242424243
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24196597353497165
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24150943396226415
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.24670433145009416
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24812030075187969
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24953095684803
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2546816479400749
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.25981308411214954
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2593283582089552
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.26629422718808193
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26579925650557623
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.26716141001855287
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.26851851851851855
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2698706099815157
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.27121771217712176
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27071823204419887
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2702205882352941
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.27155963302752295
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.27289377289377287
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27239488117001825
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2737226277372263
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2750455373406193
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.27636363636363637
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -53.3    |
| Iteration     | 20       |
| MaximumReturn | -0.0638  |
| MinimumReturn | -148     |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.45076364278793335
Validation loss = 0.45188209414482117
Validation loss = 0.45581749081611633
Validation loss = 0.4547266364097595
Validation loss = 0.4532615840435028
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4371451139450073
Validation loss = 0.43451210856437683
Validation loss = 0.43506109714508057
Validation loss = 0.44228595495224
Validation loss = 0.44077521562576294
Validation loss = 0.4404834508895874
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4387771487236023
Validation loss = 0.4323030114173889
Validation loss = 0.4385118782520294
Validation loss = 0.4435982406139374
Validation loss = 0.44393694400787354
Validation loss = 0.4445659816265106
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4442296028137207
Validation loss = 0.4489370882511139
Validation loss = 0.4440270662307739
Validation loss = 0.4456998407840729
Validation loss = 0.4455299377441406
Validation loss = 0.4485803246498108
Validation loss = 0.4539967179298401
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.43704357743263245
Validation loss = 0.44126802682876587
Validation loss = 0.44078001379966736
Validation loss = 0.4401722550392151
Validation loss = 0.44397881627082825
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2776769509981851
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27717391304347827
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.27848101265822783
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2779783393501805
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2828828828828829
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2823741007194245
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2836624775583483
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2849462365591398
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2898032200357782
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.29285714285714287
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.29590017825311943
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29537366548042704
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29484902309058614
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.29609929078014185
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2973451327433628
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3003533568904594
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.30158730158730157
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3028169014084507
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.30404217926186294
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.30701754385964913
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3064798598949212
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.3111888111888112
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3106457242582897
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.31010452961672474
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3130434782608696
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -86.9    |
| Iteration     | 21       |
| MaximumReturn | -29.3    |
| MinimumReturn | -141     |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.44671007990837097
Validation loss = 0.4513578414916992
Validation loss = 0.45562979578971863
Validation loss = 0.45635661482810974
Validation loss = 0.46105149388313293
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4342724680900574
Validation loss = 0.4415876865386963
Validation loss = 0.43810272216796875
Validation loss = 0.4399350881576538
Validation loss = 0.4405543804168701
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4337066411972046
Validation loss = 0.4416535496711731
Validation loss = 0.443276971578598
Validation loss = 0.44136276841163635
Validation loss = 0.45321521162986755
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.44354578852653503
Validation loss = 0.44765397906303406
Validation loss = 0.4663466811180115
Validation loss = 0.45564019680023193
Validation loss = 0.4540860652923584
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.43814125657081604
Validation loss = 0.4431246817111969
Validation loss = 0.4378836154937744
Validation loss = 0.44116342067718506
Validation loss = 0.44316744804382324
Validation loss = 0.44524088501930237
Validation loss = 0.44416993856430054
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3142361111111111
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.31369150779896016
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.314878892733564
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3160621761658031
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.31551724137931036
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.31669535283993117
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3178694158075601
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.31732418524871353
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3167808219178082
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.31794871794871793
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3174061433447099
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.31686541737649065
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3197278911564626
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3225806451612903
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3254237288135593
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.32656514382402707
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3260135135135135
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.33220910623946037
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.33164983164983164
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.33277310924369746
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.33221476510067116
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3316582914572864
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3311036789297659
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.33889816360601
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3416666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -27.9    |
| Iteration     | 22       |
| MaximumReturn | -0.147   |
| MinimumReturn | -97.7    |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.45119374990463257
Validation loss = 0.45973730087280273
Validation loss = 0.46310433745384216
Validation loss = 0.4698314070701599
Validation loss = 0.4602244794368744
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.44538578391075134
Validation loss = 0.4455404281616211
Validation loss = 0.43896952271461487
Validation loss = 0.44172757863998413
Validation loss = 0.4487206041812897
Validation loss = 0.45390525460243225
Validation loss = 0.4517485499382019
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.44515904784202576
Validation loss = 0.45077046751976013
Validation loss = 0.44055309891700745
Validation loss = 0.44495290517807007
Validation loss = 0.45039740204811096
Validation loss = 0.45601290464401245
Validation loss = 0.4593859612941742
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4479326605796814
Validation loss = 0.44750910997390747
Validation loss = 0.45518842339515686
Validation loss = 0.4523959159851074
Validation loss = 0.4567975401878357
Validation loss = 0.4598379135131836
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4469875395298004
Validation loss = 0.4467249810695648
Validation loss = 0.4461720585823059
Validation loss = 0.45031148195266724
Validation loss = 0.4449658989906311
Validation loss = 0.4516976475715637
Validation loss = 0.4565269351005554
Validation loss = 0.45816388726234436
Validation loss = 0.4596633017063141
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3410981697171381
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.34053156146179403
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.33996683250414594
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.3509933774834437
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.35206611570247937
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.35148514851485146
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.35090609555189456
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.35526315789473684
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.35467980295566504
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3540983606557377
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.35515548281505727
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3545751633986928
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3539967373572594
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3550488599348534
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.359349593495935
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.36038961038961037
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.36142625607779577
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.36084142394822005
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.36187399030694667
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.3741935483870968
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37359098228663445
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3729903536977492
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3723916532905297
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.375
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3744
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -41      |
| Iteration     | 23       |
| MaximumReturn | -0.0605  |
| MinimumReturn | -93.4    |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.46381083130836487
Validation loss = 0.45678505301475525
Validation loss = 0.46083369851112366
Validation loss = 0.4694386124610901
Validation loss = 0.4620889127254486
Validation loss = 0.46095985174179077
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.44858771562576294
Validation loss = 0.4430030286312103
Validation loss = 0.45070362091064453
Validation loss = 0.4578920006752014
Validation loss = 0.44943398237228394
Validation loss = 0.463348388671875
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.45203670859336853
Validation loss = 0.44950446486473083
Validation loss = 0.46144357323646545
Validation loss = 0.4520505964756012
Validation loss = 0.45193105936050415
Validation loss = 0.4737250804901123
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.45383161306381226
Validation loss = 0.4610322117805481
Validation loss = 0.45547622442245483
Validation loss = 0.45885905623435974
Validation loss = 0.46189427375793457
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4616616368293762
Validation loss = 0.4581477642059326
Validation loss = 0.4485456049442291
Validation loss = 0.45736822485923767
Validation loss = 0.45837610960006714
Validation loss = 0.46116504073143005
Validation loss = 0.45940548181533813
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3738019169329074
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37320574162679426
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37261146496815284
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37201907790143085
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37142857142857144
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37083993660855785
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.370253164556962
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3696682464454976
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.36908517350157727
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.36850393700787404
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.36792452830188677
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3673469387755102
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3667711598746082
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.36619718309859156
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.365625
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.36505460218408736
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3644859813084112
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.36391912908242613
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.36335403726708076
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3627906976744186
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3622291021671827
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3616692426584235
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3611111111111111
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3605546995377504
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.36
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.07    |
| Iteration     | 24       |
| MaximumReturn | -0.164   |
| MinimumReturn | -7.31    |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.45824503898620605
Validation loss = 0.46833810210227966
Validation loss = 0.46650469303131104
Validation loss = 0.46833282709121704
Validation loss = 0.4665400981903076
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4523847699165344
Validation loss = 0.4499044418334961
Validation loss = 0.45146480202674866
Validation loss = 0.44952479004859924
Validation loss = 0.4618391692638397
Validation loss = 0.4561515152454376
Validation loss = 0.4606591761112213
Validation loss = 0.46434521675109863
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4530448913574219
Validation loss = 0.4513465464115143
Validation loss = 0.46759840846061707
Validation loss = 0.4588466286659241
Validation loss = 0.46293389797210693
Validation loss = 0.4625416398048401
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.45412468910217285
Validation loss = 0.45781776309013367
Validation loss = 0.4609861969947815
Validation loss = 0.4627673625946045
Validation loss = 0.4594702422618866
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4602247476577759
Validation loss = 0.45635369420051575
Validation loss = 0.4596893787384033
Validation loss = 0.46413883566856384
Validation loss = 0.4619249105453491
Validation loss = 0.45957842469215393
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.36098310291858676
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3604294478527607
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.35987748851454826
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.35932721712538224
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.366412213740458
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3673780487804878
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3668188736681887
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3662613981762918
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.36570561456752654
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.36666666666666664
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.367624810892587
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3685800604229607
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.37104072398190047
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.37198795180722893
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.37293233082706767
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37237237237237236
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37181409295352325
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3727544910179641
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.37369207772795215
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3746268656716418
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.37555886736214605
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.37648809523809523
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37592867756315007
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3768545994065282
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.37777777777777777
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -46.3    |
| Iteration     | 25       |
| MaximumReturn | -0.183   |
| MinimumReturn | -126     |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.470981627702713
Validation loss = 0.4651719927787781
Validation loss = 0.4673951268196106
Validation loss = 0.4753960371017456
Validation loss = 0.47606027126312256
Validation loss = 0.4743245840072632
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.45519158244132996
Validation loss = 0.45990613102912903
Validation loss = 0.4688662588596344
Validation loss = 0.46346357464790344
Validation loss = 0.4628250300884247
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4564247131347656
Validation loss = 0.4593190848827362
Validation loss = 0.46320226788520813
Validation loss = 0.4670157730579376
Validation loss = 0.4696297347545624
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4623226225376129
Validation loss = 0.46476876735687256
Validation loss = 0.4616110622882843
Validation loss = 0.4630785286426544
Validation loss = 0.4678023159503937
Validation loss = 0.46987953782081604
Validation loss = 0.46958696842193604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4624526798725128
Validation loss = 0.46571236848831177
Validation loss = 0.4652813971042633
Validation loss = 0.4759974777698517
Validation loss = 0.4765583276748657
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.378698224852071
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.37961595273264404
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.38200589970501475
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.38291605301914583
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.38676470588235295
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3876651982378855
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3885630498533724
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.38799414348462663
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.38742690058479534
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3883211678832117
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3877551020408163
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.388646288209607
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.38953488372093026
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3904208998548621
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.391304347826087
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3921852387843705
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3930635838150289
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3939393939393939
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.39481268011527376
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.39568345323741005
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.39655172413793105
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.39598278335724535
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3954154727793696
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.39628040057224606
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.39714285714285713
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -89.8    |
| Iteration     | 26       |
| MaximumReturn | -0.121   |
| MinimumReturn | -141     |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4763622283935547
Validation loss = 0.4724652171134949
Validation loss = 0.46971550583839417
Validation loss = 0.47262972593307495
Validation loss = 0.4732748866081238
Validation loss = 0.4821324944496155
Validation loss = 0.48374125361442566
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4530293047428131
Validation loss = 0.4541251063346863
Validation loss = 0.46327880024909973
Validation loss = 0.4688847064971924
Validation loss = 0.47088623046875
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.45731121301651
Validation loss = 0.45762699842453003
Validation loss = 0.46412020921707153
Validation loss = 0.47073936462402344
Validation loss = 0.47047528624534607
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.46033889055252075
Validation loss = 0.4676741361618042
Validation loss = 0.4629680812358856
Validation loss = 0.4743340015411377
Validation loss = 0.4689159095287323
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4577397108078003
Validation loss = 0.47007986903190613
Validation loss = 0.47094517946243286
Validation loss = 0.4697335362434387
Validation loss = 0.47049638628959656
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3980028530670471
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.39886039886039887
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.39971550497866287
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4005681818181818
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4014184397163121
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.40226628895184136
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4031117397454031
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4053672316384181
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.40620592383638926
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4070422535211268
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4078762306610408
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.40870786516853935
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.40953716690042075
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4103641456582633
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4125874125874126
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.41480446927374304
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.41562064156206413
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.41643454038997213
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4172461752433936
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.41944444444444445
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4188626907073509
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4196675900277008
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4218533886583679
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.42265193370165743
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.42344827586206896
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -99.2    |
| Iteration     | 27       |
| MaximumReturn | -0.686   |
| MinimumReturn | -146     |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4627341330051422
Validation loss = 0.47048425674438477
Validation loss = 0.4745747148990631
Validation loss = 0.4741307199001312
Validation loss = 0.48316922783851624
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4509643018245697
Validation loss = 0.4604404866695404
Validation loss = 0.4672147333621979
Validation loss = 0.4684689939022064
Validation loss = 0.46638163924217224
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.45827516913414
Validation loss = 0.4609560966491699
Validation loss = 0.4653303325176239
Validation loss = 0.4612802565097809
Validation loss = 0.469193696975708
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4587971270084381
Validation loss = 0.4725836217403412
Validation loss = 0.46836623549461365
Validation loss = 0.470360666513443
Validation loss = 0.469650000333786
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.46640530228614807
Validation loss = 0.4637707769870758
Validation loss = 0.4631028175354004
Validation loss = 0.46903279423713684
Validation loss = 0.46903446316719055
Validation loss = 0.4708220064640045
Validation loss = 0.47309812903404236
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.42424242424242425
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4250343878954608
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4258241758241758
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.42661179698216734
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4273972602739726
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.42818057455540354
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.42896174863387976
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4297407912687585
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.42915531335149865
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.42993197278911566
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.43070652173913043
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4314789687924016
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.43224932249322495
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4330175913396482
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4337837837837838
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.43454790823211875
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4353099730458221
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4347240915208614
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4368279569892473
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.44026845637583895
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4410187667560322
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4404283801874163
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.43983957219251335
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4392523364485981
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.44
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -82.5    |
| Iteration     | 28       |
| MaximumReturn | -0.168   |
| MinimumReturn | -149     |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4678124189376831
Validation loss = 0.4675614833831787
Validation loss = 0.4701313376426697
Validation loss = 0.47478678822517395
Validation loss = 0.474129855632782
Validation loss = 0.47585529088974
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4593600928783417
Validation loss = 0.4609951078891754
Validation loss = 0.4646480083465576
Validation loss = 0.4638324975967407
Validation loss = 0.4671221077442169
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4628522992134094
Validation loss = 0.4652975797653198
Validation loss = 0.4630189538002014
Validation loss = 0.46939098834991455
Validation loss = 0.4701095521450043
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4568813741207123
Validation loss = 0.47642219066619873
Validation loss = 0.4682689905166626
Validation loss = 0.46686476469039917
Validation loss = 0.4676404297351837
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.46857503056526184
Validation loss = 0.46365290880203247
Validation loss = 0.4655614495277405
Validation loss = 0.46698710322380066
Validation loss = 0.4693082869052887
Validation loss = 0.47684481739997864
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4394141145139814
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.43882978723404253
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4395750332005312
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4403183023872679
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4410596026490066
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4417989417989418
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.44121532364597094
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.44063324538258575
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4426877470355731
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.44473684210526315
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.44415243101182655
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4435695538057743
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4442988204456094
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.443717277486911
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.44575163398692813
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4451697127937337
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.44589308996088656
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.44921875
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4486345903771131
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.45064935064935063
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.45136186770428016
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.45077720207253885
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4501940491591203
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4496124031007752
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.44903225806451613
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -93.2    |
| Iteration     | 29       |
| MaximumReturn | -0.63    |
| MinimumReturn | -145     |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4711928069591522
Validation loss = 0.4700138568878174
Validation loss = 0.4719049036502838
Validation loss = 0.4761141836643219
Validation loss = 0.4776204824447632
Validation loss = 0.4844643473625183
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.46169421076774597
Validation loss = 0.4658983647823334
Validation loss = 0.46070653200149536
Validation loss = 0.468567430973053
Validation loss = 0.46832579374313354
Validation loss = 0.4692937135696411
Validation loss = 0.46786877512931824
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4615996181964874
Validation loss = 0.4621838629245758
Validation loss = 0.46804359555244446
Validation loss = 0.4688079059123993
Validation loss = 0.4687643349170685
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.47273674607276917
Validation loss = 0.466642826795578
Validation loss = 0.47009873390197754
Validation loss = 0.474038302898407
Validation loss = 0.47463834285736084
Validation loss = 0.480072945356369
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.468826562166214
Validation loss = 0.4711458086967468
Validation loss = 0.4787977635860443
Validation loss = 0.4703562557697296
Validation loss = 0.4712551236152649
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4497422680412371
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.45045045045045046
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.45115681233933164
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.45186136071887034
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.45256410256410257
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4532650448143406
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4539641943734015
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.454661558109834
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.45663265306122447
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4573248407643312
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.45674300254452926
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.45870393900889456
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4581218274111675
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.458808618504436
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.45949367088607596
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4589127686472819
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4595959595959596
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.45901639344262296
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.45843828715365237
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.46037735849056605
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4610552763819096
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4617314930991217
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46115288220551376
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46057571964956195
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.46125
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -135     |
| Iteration     | 30       |
| MaximumReturn | -56.1    |
| MinimumReturn | -168     |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4698721468448639
Validation loss = 0.4723837971687317
Validation loss = 0.4805428385734558
Validation loss = 0.4794989824295044
Validation loss = 0.4822697639465332
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4621410667896271
Validation loss = 0.46706530451774597
Validation loss = 0.4631401300430298
Validation loss = 0.470960795879364
Validation loss = 0.47043365240097046
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.45699945092201233
Validation loss = 0.4673844873905182
Validation loss = 0.4642049968242645
Validation loss = 0.47146308422088623
Validation loss = 0.47225597500801086
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.47278136014938354
Validation loss = 0.47082123160362244
Validation loss = 0.4703700840473175
Validation loss = 0.4750223159790039
Validation loss = 0.4773311913013458
Validation loss = 0.4752497673034668
Validation loss = 0.47613194584846497
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.464558482170105
Validation loss = 0.46816539764404297
Validation loss = 0.4686373174190521
Validation loss = 0.47614815831184387
Validation loss = 0.47250109910964966
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4606741573033708
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4613466334164589
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.46201743462017436
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4626865671641791
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.46335403726708074
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4652605459057072
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4684014869888476
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.46905940594059403
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4684796044499382
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4691358024691358
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.468557336621455
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46798029556650245
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4698646986469865
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4705159705159705
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47116564417177914
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47181372549019607
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4724602203182375
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4731051344743276
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47374847374847373
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.474390243902439
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47503045066991473
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4744525547445255
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47509113001215064
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47451456310679613
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4763636363636364
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -96.7    |
| Iteration     | 31       |
| MaximumReturn | -0.172   |
| MinimumReturn | -157     |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.47200119495391846
Validation loss = 0.4795111119747162
Validation loss = 0.4752376973628998
Validation loss = 0.48021066188812256
Validation loss = 0.4772765636444092
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.46678340435028076
Validation loss = 0.46446681022644043
Validation loss = 0.46945878863334656
Validation loss = 0.4652359187602997
Validation loss = 0.47563880681991577
Validation loss = 0.4697667956352234
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4615058898925781
Validation loss = 0.46116581559181213
Validation loss = 0.464120477437973
Validation loss = 0.47427475452423096
Validation loss = 0.4708070755004883
Validation loss = 0.4745592772960663
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.46869713068008423
Validation loss = 0.4705243408679962
Validation loss = 0.47092336416244507
Validation loss = 0.47425732016563416
Validation loss = 0.47410622239112854
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4621562361717224
Validation loss = 0.47300758957862854
Validation loss = 0.47172489762306213
Validation loss = 0.48455530405044556
Validation loss = 0.473265677690506
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4782082324455206
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47762998790810157
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47705314009661837
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4764776839565742
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4759036144578313
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4753309265944645
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47475961538461536
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47418967587034816
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.473621103117506
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47305389221556887
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47248803827751196
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47192353643966545
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4713603818615752
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47079856972586415
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47023809523809523
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.469678953626635
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4691211401425178
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46856465005931197
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46800947867298576
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46745562130177515
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.466903073286052
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46635182998819363
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4658018867924528
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4652532391048292
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4647058823529412
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -132     |
| Iteration     | 32       |
| MaximumReturn | -55.3    |
| MinimumReturn | -178     |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4670330584049225
Validation loss = 0.472530335187912
Validation loss = 0.47898969054222107
Validation loss = 0.48049411177635193
Validation loss = 0.47637298703193665
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4643802344799042
Validation loss = 0.46251195669174194
Validation loss = 0.4707978367805481
Validation loss = 0.47051897644996643
Validation loss = 0.47195717692375183
Validation loss = 0.46818891167640686
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.46716785430908203
Validation loss = 0.4698145091533661
Validation loss = 0.47259730100631714
Validation loss = 0.46856406331062317
Validation loss = 0.47190946340560913
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4683191180229187
Validation loss = 0.4742293059825897
Validation loss = 0.4727536737918854
Validation loss = 0.47736746072769165
Validation loss = 0.48045191168785095
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4674183428287506
Validation loss = 0.4663841426372528
Validation loss = 0.47600647807121277
Validation loss = 0.4745922386646271
Validation loss = 0.47607144713401794
Validation loss = 0.47113823890686035
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46415981198589895
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.465962441314554
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.46776084407971863
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.46955503512880564
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46900584795321637
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46845794392523366
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4679113185530922
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.46853146853146854
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4691501746216531
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4686046511627907
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4692218350754936
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46867749419953597
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4681344148319815
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4710648148148148
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4728323699421965
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4722863741339492
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4717416378316032
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47119815668202764
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4718066743383199
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4735632183908046
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47416762342135477
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4736238532110092
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47308132875143183
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47254004576659037
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4742857142857143
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -66.4    |
| Iteration     | 33       |
| MaximumReturn | -0.226   |
| MinimumReturn | -152     |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4769754111766815
Validation loss = 0.47504329681396484
Validation loss = 0.4797337055206299
Validation loss = 0.4797579348087311
Validation loss = 0.47933268547058105
Validation loss = 0.4814830422401428
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4680407643318176
Validation loss = 0.4685698449611664
Validation loss = 0.46956899762153625
Validation loss = 0.4794235825538635
Validation loss = 0.475371390581131
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4664488732814789
Validation loss = 0.4722624123096466
Validation loss = 0.46473732590675354
Validation loss = 0.47514575719833374
Validation loss = 0.4725821912288666
Validation loss = 0.4745144248008728
Validation loss = 0.48201823234558105
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4750700891017914
Validation loss = 0.4697974622249603
Validation loss = 0.48088064789772034
Validation loss = 0.4696832001209259
Validation loss = 0.4757492244243622
Validation loss = 0.48556968569755554
Validation loss = 0.47858914732933044
Validation loss = 0.48588865995407104
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4725634753704071
Validation loss = 0.47406452894210815
Validation loss = 0.47089654207229614
Validation loss = 0.47785717248916626
Validation loss = 0.47149690985679626
Validation loss = 0.4820877015590668
Validation loss = 0.48013439774513245
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4737442922374429
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4732041049030787
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47266514806378135
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.47440273037542663
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4738636363636364
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4733257661748014
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47278911564625853
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4733861834654587
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47285067873303166
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47231638418079097
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4717832957110609
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47125140924464487
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47072072072072074
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.47244094488188976
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47191011235955055
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4713804713804714
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47197309417040356
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47144456886898095
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.470917225950783
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47039106145251397
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46986607142857145
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4693422519509476
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4688195991091314
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4682981090100111
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4677777777777778
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -35.2    |
| Iteration     | 34       |
| MaximumReturn | -0.207   |
| MinimumReturn | -140     |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4768758714199066
Validation loss = 0.47628217935562134
Validation loss = 0.4796698987483978
Validation loss = 0.4823018014431
Validation loss = 0.4875336289405823
Validation loss = 0.4839150011539459
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.47883355617523193
Validation loss = 0.4706379771232605
Validation loss = 0.47660401463508606
Validation loss = 0.4723844826221466
Validation loss = 0.4771251678466797
Validation loss = 0.48646315932273865
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4783698320388794
Validation loss = 0.4820248782634735
Validation loss = 0.47622251510620117
Validation loss = 0.4790842831134796
Validation loss = 0.48652103543281555
Validation loss = 0.48468461632728577
Validation loss = 0.4835393726825714
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.479428231716156
Validation loss = 0.48294055461883545
Validation loss = 0.48288440704345703
Validation loss = 0.47862327098846436
Validation loss = 0.48939916491508484
Validation loss = 0.48293063044548035
Validation loss = 0.48476487398147583
Validation loss = 0.48634740710258484
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4798307716846466
Validation loss = 0.47732189297676086
Validation loss = 0.4832119047641754
Validation loss = 0.48117101192474365
Validation loss = 0.4806079566478729
Validation loss = 0.4839459955692291
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4672586015538291
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46674057649667405
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46622369878183834
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4657079646017699
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46519337016574586
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46467991169977924
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46416758544652703
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46365638766519823
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.46644664466446645
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46593406593406594
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4654226125137212
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4649122807017544
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4644030668127054
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4660831509846827
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46557377049180326
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4650655021834061
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.46673936750272627
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4662309368191721
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4657236126224157
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4652173913043478
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46471226927252984
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.46746203904555317
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.46912242686890576
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4686147186147186
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4702702702702703
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -61.7    |
| Iteration     | 35       |
| MaximumReturn | -0.263   |
| MinimumReturn | -184     |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49515047669410706
Validation loss = 0.47965675592422485
Validation loss = 0.4810084104537964
Validation loss = 0.48333460092544556
Validation loss = 0.48731642961502075
Validation loss = 0.4875740706920624
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4741705358028412
Validation loss = 0.47113075852394104
Validation loss = 0.47263532876968384
Validation loss = 0.4812096953392029
Validation loss = 0.48309090733528137
Validation loss = 0.4833983778953552
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.47549211978912354
Validation loss = 0.4788140058517456
Validation loss = 0.47936227917671204
Validation loss = 0.48252543807029724
Validation loss = 0.48110896348953247
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.48927605152130127
Validation loss = 0.48795726895332336
Validation loss = 0.48908135294914246
Validation loss = 0.4888119697570801
Validation loss = 0.49833327531814575
Validation loss = 0.49666541814804077
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4869929552078247
Validation loss = 0.4776301085948944
Validation loss = 0.4830365777015686
Validation loss = 0.48343655467033386
Validation loss = 0.4925454258918762
Validation loss = 0.49058571457862854
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46976241900647947
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4714131607335491
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.47521551724137934
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.47685683530678147
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47634408602150535
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.47798066595059074
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47746781115879827
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4780278670953912
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47751605995717344
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.47914438502673795
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4797008547008547
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4791889007470651
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47867803837953093
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47816826411075614
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4776595744680851
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4771519659936238
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.47876857749469215
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47932131495228
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.4841101694915254
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4857142857142857
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48520084566596194
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.48680042238648363
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4873417721518987
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.48893572181243417
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.48947368421052634
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.2    |
| Iteration     | 36       |
| MaximumReturn | -0.44    |
| MinimumReturn | -81.1    |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.48398512601852417
Validation loss = 0.48579511046409607
Validation loss = 0.4884348511695862
Validation loss = 0.49292850494384766
Validation loss = 0.49129775166511536
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.47719013690948486
Validation loss = 0.4777284264564514
Validation loss = 0.4794759154319763
Validation loss = 0.4885821044445038
Validation loss = 0.49056553840637207
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.48052388429641724
Validation loss = 0.4799949526786804
Validation loss = 0.4843909740447998
Validation loss = 0.4911167323589325
Validation loss = 0.48720160126686096
Validation loss = 0.4946458637714386
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49126753211021423
Validation loss = 0.4896838068962097
Validation loss = 0.4893932044506073
Validation loss = 0.49427202343940735
Validation loss = 0.49491143226623535
Validation loss = 0.497737854719162
Validation loss = 0.4969962239265442
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.48411449790000916
Validation loss = 0.49012893438339233
Validation loss = 0.4854656159877777
Validation loss = 0.4892972707748413
Validation loss = 0.48887085914611816
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4889589905362776
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.49264705882352944
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.49527806925498424
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4947589098532495
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.49842931937172774
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5010460251046025
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5015673981191222
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5010438413361169
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5015641293013556
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5010416666666667
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5036420395421436
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5031185031185031
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5046728971962616
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5072614107883817
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.5139896373056995
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5134575569358178
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5129265770423992
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5144628099173554
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5139318885448917
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5164948453608248
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.5211122554067971
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5216049382716049
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5210688591983555
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5205338809034907
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.52
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.65    |
| Iteration     | 37       |
| MaximumReturn | -0.265   |
| MinimumReturn | -51.6    |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5021385550498962
Validation loss = 0.4953846335411072
Validation loss = 0.49457448720932007
Validation loss = 0.4948107600212097
Validation loss = 0.5021263360977173
Validation loss = 0.4985303282737732
Validation loss = 0.5042360424995422
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4883192777633667
Validation loss = 0.48404407501220703
Validation loss = 0.48682522773742676
Validation loss = 0.48760029673576355
Validation loss = 0.4904565215110779
Validation loss = 0.4941195845603943
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.48799026012420654
Validation loss = 0.4919118285179138
Validation loss = 0.49871963262557983
Validation loss = 0.4951654076576233
Validation loss = 0.49700337648391724
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49883851408958435
Validation loss = 0.4989667534828186
Validation loss = 0.5039129853248596
Validation loss = 0.4998664855957031
Validation loss = 0.49967747926712036
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.49035555124282837
Validation loss = 0.48872053623199463
Validation loss = 0.48940688371658325
Validation loss = 0.4923701286315918
Validation loss = 0.49614864587783813
Validation loss = 0.4936830997467041
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5215163934426229
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.526100307062436
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.5316973415132924
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5311542390194075
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.5357142857142857
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5372069317023446
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5366598778004074
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5361139369277721
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5376016260162602
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5370558375634518
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.537525354969574
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5379939209726444
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5374493927125507
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.538928210313448
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5424242424242425
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.5469223007063572
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5463709677419355
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5468277945619335
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5503018108651911
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5507537688442211
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5502008032128514
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5496489468405216
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5490981963927856
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5485485485485485
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.552
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.19    |
| Iteration     | 38       |
| MaximumReturn | -0.134   |
| MinimumReturn | -15.4    |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4986153244972229
Validation loss = 0.4981512725353241
Validation loss = 0.4981127679347992
Validation loss = 0.49914881587028503
Validation loss = 0.5037178993225098
Validation loss = 0.5002350807189941
Validation loss = 0.5058102607727051
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.49443379044532776
Validation loss = 0.49567365646362305
Validation loss = 0.4932544529438019
Validation loss = 0.4956187605857849
Validation loss = 0.49068665504455566
Validation loss = 0.4991197884082794
Validation loss = 0.5046656727790833
Validation loss = 0.4992646872997284
Validation loss = 0.5034694671630859
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49270397424697876
Validation loss = 0.4938254952430725
Validation loss = 0.4929659962654114
Validation loss = 0.5028473734855652
Validation loss = 0.49961501359939575
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5023537278175354
Validation loss = 0.5069530010223389
Validation loss = 0.5041680335998535
Validation loss = 0.5020409226417542
Validation loss = 0.5033583045005798
Validation loss = 0.5092462301254272
Validation loss = 0.5037775039672852
Validation loss = 0.5096532106399536
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.49908071756362915
Validation loss = 0.494391530752182
Validation loss = 0.49606555700302124
Validation loss = 0.4941214919090271
Validation loss = 0.4955631494522095
Validation loss = 0.5024083852767944
Validation loss = 0.5055912733078003
Validation loss = 0.502009928226471
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5514485514485514
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5548902195608783
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5543369890329013
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5537848605577689
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.554228855721393
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5556660039761432
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5580933465739821
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5575396825396826
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5569871159563925
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5574257425742575
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5568743818001978
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5573122529644269
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5567620927936822
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5562130177514792
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5596059113300492
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5600393700787402
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5614552605703048
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5618860510805501
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5613346418056918
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5617647058823529
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5621939275220372
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5626223091976517
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5620723362658846
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5654296875
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5648780487804878
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26.1    |
| Iteration     | 39       |
| MaximumReturn | -0.156   |
| MinimumReturn | -119     |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5118556618690491
Validation loss = 0.5017414093017578
Validation loss = 0.504271388053894
Validation loss = 0.5050198435783386
Validation loss = 0.5102641582489014
Validation loss = 0.5103869438171387
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5065751075744629
Validation loss = 0.4950335919857025
Validation loss = 0.49923011660575867
Validation loss = 0.5151274800300598
Validation loss = 0.5031072497367859
Validation loss = 0.5021945238113403
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4956377148628235
Validation loss = 0.49740251898765564
Validation loss = 0.5005789995193481
Validation loss = 0.4999021589756012
Validation loss = 0.5074052810668945
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5103905200958252
Validation loss = 0.5040799379348755
Validation loss = 0.5047997236251831
Validation loss = 0.5048177242279053
Validation loss = 0.5062259435653687
Validation loss = 0.5107672214508057
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5072684288024902
Validation loss = 0.4983181655406952
Validation loss = 0.4997248351573944
Validation loss = 0.5004618763923645
Validation loss = 0.5065062046051025
Validation loss = 0.5041998624801636
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5662768031189084
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5657254138266796
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5651750972762646
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.564625850340136
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5660194174757281
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.565470417070805
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.564922480620155
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5663117134559535
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5667311411992263
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5661835748792271
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5656370656370656
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5650916104146577
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5664739884393064
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5659287776708374
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5673076923076923
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5686839577329491
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5700575815738963
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5695110258868649
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5689655172413793
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5684210526315789
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5678776290630975
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5682903533906399
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5677480916030534
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.567206863679695
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5685714285714286
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -21.1    |
| Iteration     | 40       |
| MaximumReturn | -0.152   |
| MinimumReturn | -129     |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5066115856170654
Validation loss = 0.5068966746330261
Validation loss = 0.5087613463401794
Validation loss = 0.5089800953865051
Validation loss = 0.5105915069580078
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5040522813796997
Validation loss = 0.5068604350090027
Validation loss = 0.5061648488044739
Validation loss = 0.5096678733825684
Validation loss = 0.5124972462654114
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5247994065284729
Validation loss = 0.5043032765388489
Validation loss = 0.5033413767814636
Validation loss = 0.5075674057006836
Validation loss = 0.5063358545303345
Validation loss = 0.5139235854148865
Validation loss = 0.5123400092124939
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5099337100982666
Validation loss = 0.511396586894989
Validation loss = 0.5066929459571838
Validation loss = 0.5210922360420227
Validation loss = 0.5194928050041199
Validation loss = 0.5184217691421509
Validation loss = 0.5143340826034546
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4997429847717285
Validation loss = 0.5109912157058716
Validation loss = 0.5068625211715698
Validation loss = 0.5056384205818176
Validation loss = 0.511459231376648
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5689819219790676
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5684410646387833
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5679012345679012
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5683111954459203
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5677725118483412
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.5748106060606061
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5761589403973509
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5765595463137996
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5779036827195467
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5792452830188679
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.586239396795476
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5856873822975518
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5851364063969896
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5883458646616542
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5915492957746479
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5928705440900562
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5923149015932521
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5917602996254682
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5949485500467727
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5962616822429907
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5957049486461251
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5970149253731343
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5964585274930102
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5977653631284916
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.598139534883721
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.44    |
| Iteration     | 41       |
| MaximumReturn | -0.159   |
| MinimumReturn | -60      |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5105530023574829
Validation loss = 0.5183019638061523
Validation loss = 0.5146244168281555
Validation loss = 0.5116075873374939
Validation loss = 0.5113800168037415
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5106757283210754
Validation loss = 0.5067245960235596
Validation loss = 0.5033988356590271
Validation loss = 0.5130610466003418
Validation loss = 0.5075575113296509
Validation loss = 0.5109646320343018
Validation loss = 0.5151689648628235
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5113893747329712
Validation loss = 0.5108376741409302
Validation loss = 0.5093072652816772
Validation loss = 0.509624183177948
Validation loss = 0.5130696892738342
Validation loss = 0.5122120380401611
Validation loss = 0.5143043994903564
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5132061243057251
Validation loss = 0.5166735053062439
Validation loss = 0.521594762802124
Validation loss = 0.515065610408783
Validation loss = 0.5187819600105286
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5086445808410645
Validation loss = 0.5031371712684631
Validation loss = 0.5112670063972473
Validation loss = 0.5107178092002869
Validation loss = 0.5074697136878967
Validation loss = 0.5103748440742493
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5985130111524164
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5979572887650882
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5983302411873841
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6024096385542169
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6027777777777777
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6049953746530989
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6044362292051756
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.605724838411819
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6060885608856088
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6073732718894009
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6095764272559853
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6117755289788408
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6112132352941176
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.610651974288338
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6100917431192661
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6095325389550871
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6089743589743589
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6084172003659652
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6087751371115173
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6100456621004566
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6113138686131386
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6125797629899726
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6120218579234973
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6114649681528662
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.610909090909091
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -65.3    |
| Iteration     | 42       |
| MaximumReturn | -0.136   |
| MinimumReturn | -149     |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5109357237815857
Validation loss = 0.5119922161102295
Validation loss = 0.5185163021087646
Validation loss = 0.519403874874115
Validation loss = 0.5179730653762817
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5136685371398926
Validation loss = 0.5211073160171509
Validation loss = 0.5127989053726196
Validation loss = 0.5141355395317078
Validation loss = 0.5189985036849976
Validation loss = 0.5214518904685974
Validation loss = 0.5180899500846863
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.509307324886322
Validation loss = 0.5105577707290649
Validation loss = 0.5124136209487915
Validation loss = 0.5326438546180725
Validation loss = 0.5247825980186462
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5168383121490479
Validation loss = 0.5225858092308044
Validation loss = 0.516926646232605
Validation loss = 0.5213221311569214
Validation loss = 0.5180978775024414
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5134211182594299
Validation loss = 0.5138849020004272
Validation loss = 0.5072841048240662
Validation loss = 0.5156657099723816
Validation loss = 0.5164361000061035
Validation loss = 0.5129134654998779
Validation loss = 0.5182873606681824
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6130790190735694
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6152450090744102
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.614687216681777
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6168478260869565
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6199095022624435
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6238698010849909
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.6296296296296297
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6326714801444043
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6339044183949504
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6378378378378379
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6381638163816382
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6393884892086331
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6388140161725068
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6391382405745063
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6412556053811659
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6406810035842294
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6410026857654432
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6431127012522362
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.645218945487042
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6464285714285715
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6458519179304193
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6452762923351159
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 11
average number of affinization = 0.6544968833481746
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6548042704626335
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6551111111111111
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.62    |
| Iteration     | 43       |
| MaximumReturn | -0.0968  |
| MinimumReturn | -59.9    |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5135714411735535
Validation loss = 0.5170115828514099
Validation loss = 0.5136133432388306
Validation loss = 0.5154647827148438
Validation loss = 0.5189557075500488
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5187671780586243
Validation loss = 0.5144822001457214
Validation loss = 0.5174347758293152
Validation loss = 0.5207710862159729
Validation loss = 0.5164377093315125
Validation loss = 0.5216929912567139
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5131739377975464
Validation loss = 0.5133535861968994
Validation loss = 0.5121886134147644
Validation loss = 0.521632194519043
Validation loss = 0.5219137072563171
Validation loss = 0.5187768340110779
Validation loss = 0.5258640050888062
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5203235149383545
Validation loss = 0.5204388499259949
Validation loss = 0.5199384093284607
Validation loss = 0.5264006853103638
Validation loss = 0.5207459926605225
Validation loss = 0.5249332785606384
Validation loss = 0.5236433744430542
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5152677893638611
Validation loss = 0.5166423916816711
Validation loss = 0.5206727385520935
Validation loss = 0.5137555599212646
Validation loss = 0.5177117586135864
Validation loss = 0.5223138928413391
Validation loss = 0.5261762738227844
Validation loss = 0.5239816308021545
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6580817051509769
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6574977817213842
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6595744680851063
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6598759964570416
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6619469026548672
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6622458001768347
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6625441696113075
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.6681376875551632
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6701940035273368
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6696035242290749
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6725352112676056
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6745822339489885
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6775043936731108
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6777875329236172
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6771929824561403
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6783523225241017
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6786339754816112
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6780402449693789
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6809440559440559
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6812227074235808
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6823734729493892
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6843940714908456
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6881533101045296
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6910356832027851
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.691304347826087
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -42.5    |
| Iteration     | 44       |
| MaximumReturn | -0.157   |
| MinimumReturn | -111     |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5204544067382812
Validation loss = 0.5184102058410645
Validation loss = 0.5207670331001282
Validation loss = 0.524817168712616
Validation loss = 0.5243092775344849
Validation loss = 0.5261461138725281
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5207323431968689
Validation loss = 0.521221399307251
Validation loss = 0.5210729241371155
Validation loss = 0.5206562280654907
Validation loss = 0.5186704397201538
Validation loss = 0.5253435969352722
Validation loss = 0.5222073793411255
Validation loss = 0.5299435257911682
Validation loss = 0.5307872891426086
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.525180459022522
Validation loss = 0.5199211239814758
Validation loss = 0.519494891166687
Validation loss = 0.524816632270813
Validation loss = 0.5194109082221985
Validation loss = 0.5294415354728699
Validation loss = 0.5331051349639893
Validation loss = 0.5314469337463379
Validation loss = 0.5265800952911377
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5215045809745789
Validation loss = 0.5203248858451843
Validation loss = 0.5246725678443909
Validation loss = 0.524394690990448
Validation loss = 0.5319809913635254
Validation loss = 0.52854323387146
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.518944263458252
Validation loss = 0.5215583443641663
Validation loss = 0.5235098004341125
Validation loss = 0.5216377377510071
Validation loss = 0.5277411937713623
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6950477845351868
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6961805555555556
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6973113616652211
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7001733102253033
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7038961038961039
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7050173010380623
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7070008643042351
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7089810017271158
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7100949094046591
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7120689655172414
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 11
average number of affinization = 0.7209302325581395
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7246127366609294
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7248495270851246
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.7302405498281787
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7296137339055794
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7324185248713551
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7343616109682948
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7337328767123288
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.739093242087254
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 11
average number of affinization = 0.7478632478632479
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7480785653287788
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7474402730375427
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7510656436487638
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.7563884156729132
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7565957446808511
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7       |
| Iteration     | 45       |
| MaximumReturn | -0.157   |
| MinimumReturn | -47.5    |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5203418731689453
Validation loss = 0.5249913334846497
Validation loss = 0.5239024758338928
Validation loss = 0.52370285987854
Validation loss = 0.5327016115188599
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5366767048835754
Validation loss = 0.5236600637435913
Validation loss = 0.5299654006958008
Validation loss = 0.5347698330879211
Validation loss = 0.5325031280517578
Validation loss = 0.5343488454818726
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5354478359222412
Validation loss = 0.528489887714386
Validation loss = 0.5283969044685364
Validation loss = 0.5271335244178772
Validation loss = 0.5337422490119934
Validation loss = 0.5319964289665222
Validation loss = 0.5326784253120422
Validation loss = 0.536982536315918
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5264917612075806
Validation loss = 0.5243932604789734
Validation loss = 0.5221412777900696
Validation loss = 0.5317568182945251
Validation loss = 0.5319761037826538
Validation loss = 0.5337278842926025
Validation loss = 0.5353305339813232
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5264214277267456
Validation loss = 0.5218178033828735
Validation loss = 0.5193981528282166
Validation loss = 0.5257603526115417
Validation loss = 0.5257708430290222
Validation loss = 0.5317497253417969
Validation loss = 0.5308975577354431
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7585034013605442
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7604078164825828
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.764855687606112
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7667514843087362
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.7728813559322034
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7747671464860287
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.7800338409475466
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 12
average number of affinization = 0.7895181741335587
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.7947635135135135
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.7991561181434599
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8001686340640809
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.802021903959562
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8013468013468014
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8023549201009251
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8058823529411765
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 9
average number of affinization = 0.8127623845507976
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8162751677852349
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.818105616093881
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8174204355108877
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 9
average number of affinization = 0.8242677824267782
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8277591973244147
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8295739348370927
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8330550918196995
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8373644703919934
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8408333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.8    |
| Iteration     | 46       |
| MaximumReturn | -0.228   |
| MinimumReturn | -83.6    |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5303279161453247
Validation loss = 0.5270183682441711
Validation loss = 0.523213267326355
Validation loss = 0.5237535238265991
Validation loss = 0.5259757041931152
Validation loss = 0.5318286418914795
Validation loss = 0.5314397811889648
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5261168479919434
Validation loss = 0.5292161703109741
Validation loss = 0.5291298031806946
Validation loss = 0.5288858413696289
Validation loss = 0.5334410667419434
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5317143201828003
Validation loss = 0.5272533893585205
Validation loss = 0.5335562825202942
Validation loss = 0.5326655507087708
Validation loss = 0.534138023853302
Validation loss = 0.5347975492477417
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5301673412322998
Validation loss = 0.5296724438667297
Validation loss = 0.5307918787002563
Validation loss = 0.536497950553894
Validation loss = 0.5348080396652222
Validation loss = 0.5365883111953735
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5263348817825317
Validation loss = 0.5207832455635071
Validation loss = 0.5217024683952332
Validation loss = 0.5331248044967651
Validation loss = 0.5265945792198181
Validation loss = 0.5286121964454651
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8451290591174022
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8477537437603994
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 10
average number of affinization = 0.8553615960099751
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8546511627906976
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8589211618257261
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8615257048092869
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8641259320629661
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 13
average number of affinization = 0.8741721854304636
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8775847808105872
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 10
average number of affinization = 0.8851239669421488
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8868703550784476
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8910891089108911
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8928276999175597
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 9
average number of affinization = 0.899505766062603
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.9037037037037037
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9046052631578947
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.903861955628595
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9064039408866995
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.9105824446267432
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9098360655737705
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9131859131859131
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9157119476268413
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9190515126737531
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9223856209150327
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9240816326530612
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -49.8    |
| Iteration     | 47       |
| MaximumReturn | -0.614   |
| MinimumReturn | -129     |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5249390602111816
Validation loss = 0.5246112942695618
Validation loss = 0.5221349000930786
Validation loss = 0.5252588987350464
Validation loss = 0.523337185382843
Validation loss = 0.5276656150817871
Validation loss = 0.5314309597015381
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5197429656982422
Validation loss = 0.5230520367622375
Validation loss = 0.5253145098686218
Validation loss = 0.5279861688613892
Validation loss = 0.5280049443244934
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5256780385971069
Validation loss = 0.5249599814414978
Validation loss = 0.5275720357894897
Validation loss = 0.5270131826400757
Validation loss = 0.5299059152603149
Validation loss = 0.5364501476287842
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5283671617507935
Validation loss = 0.5280174612998962
Validation loss = 0.525769054889679
Validation loss = 0.526971697807312
Validation loss = 0.5281841158866882
Validation loss = 0.5287203788757324
Validation loss = 0.5304561853408813
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5238901972770691
Validation loss = 0.5214621424674988
Validation loss = 0.5244759321212769
Validation loss = 0.5203077793121338
Validation loss = 0.5313863754272461
Validation loss = 0.525119423866272
Validation loss = 0.5259487628936768
Validation loss = 0.5334590673446655
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.9298531810766721
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9315403422982885
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 14
average number of affinization = 0.9421824104234527
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.9471114727420668
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9504065040650407
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9496344435418359
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.950487012987013
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9521492295214923
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9513776337115073
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9538461538461539
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 12
average number of affinization = 0.9627831715210357
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9636216653193209
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9636510500807755
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9661016949152542
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9685483870967742
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 11
average number of affinization = 0.9766317485898469
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 14
average number of affinization = 0.9871175523349437
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9871279163314561
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9895498392282959
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9919678714859438
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9943820224719101
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9975942261427426
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.0072115384615385
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0088070456365092
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.012
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -40.3    |
| Iteration     | 48       |
| MaximumReturn | -0.642   |
| MinimumReturn | -146     |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5285773873329163
Validation loss = 0.5262768268585205
Validation loss = 0.5287186503410339
Validation loss = 0.5297358632087708
Validation loss = 0.5283042192459106
Validation loss = 0.5372893214225769
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5302150845527649
Validation loss = 0.5252626538276672
Validation loss = 0.5275449156761169
Validation loss = 0.5313588380813599
Validation loss = 0.5317477583885193
Validation loss = 0.5302861928939819
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5319370031356812
Validation loss = 0.5289890766143799
Validation loss = 0.5305072665214539
Validation loss = 0.5342087745666504
Validation loss = 0.5345126390457153
Validation loss = 0.535843551158905
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5240549445152283
Validation loss = 0.5277196168899536
Validation loss = 0.5319674611091614
Validation loss = 0.5299448370933533
Validation loss = 0.5344305634498596
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5273370146751404
Validation loss = 0.5251260995864868
Validation loss = 0.532471239566803
Validation loss = 0.5299424529075623
Validation loss = 0.5284813046455383
Validation loss = 0.5364308953285217
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0135891286970424
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.0199680511182108
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.0247406225059856
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.029505582137161
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0326693227091635
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.035828025477707
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0381861575178997
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0405405405405406
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.04527402700556
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0492063492063493
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.0539254559873117
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.057052297939778
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0601741884402216
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0601265822784811
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0640316205533598
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0639810426540284
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.069455406471981
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0733438485804416
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0724980299448386
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0763779527559054
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.081038552321007
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0849056603773586
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.0903377847604085
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.0949764521193093
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0980392156862746
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -68.1    |
| Iteration     | 49       |
| MaximumReturn | -0.675   |
| MinimumReturn | -156     |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5229983329772949
Validation loss = 0.5264900326728821
Validation loss = 0.5276499390602112
Validation loss = 0.5298975706100464
Validation loss = 0.5255776643753052
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5241569876670837
Validation loss = 0.5207993388175964
Validation loss = 0.5378829836845398
Validation loss = 0.5282446146011353
Validation loss = 0.529212474822998
Validation loss = 0.5272022485733032
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5291791558265686
Validation loss = 0.5282585620880127
Validation loss = 0.5239540934562683
Validation loss = 0.5301606059074402
Validation loss = 0.5296441316604614
Validation loss = 0.5304824113845825
Validation loss = 0.5258961915969849
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5234103798866272
Validation loss = 0.5253280401229858
Validation loss = 0.5323659181594849
Validation loss = 0.53077632188797
Validation loss = 0.5278796553611755
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5227413773536682
Validation loss = 0.5274056196212769
Validation loss = 0.5251530408859253
Validation loss = 0.5265232920646667
Validation loss = 0.5284347534179688
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1010971786833856
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.105716523101018
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.107981220657277
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1102423768569194
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.11953125
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.1233411397345823
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.1294851794071763
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1309431021044427
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1339563862928348
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.1416342412451361
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1423017107309488
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.1484071484071483
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 17
average number of affinization = 1.1607142857142858
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.160589604344453
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1635658914728682
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1665375677769172
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 15
average number of affinization = 1.1772445820433437
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1778808971384378
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.1846986089644513
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.1907335907335908
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.191358024691358
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.1958365458750964
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.2049306625577811
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.2063125481139338
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.2130769230769232
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.6    |
| Iteration     | 50       |
| MaximumReturn | -0.72    |
| MinimumReturn | -74.1    |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5280443429946899
Validation loss = 0.5259571075439453
Validation loss = 0.5273897647857666
Validation loss = 0.5277677774429321
Validation loss = 0.5288230180740356
Validation loss = 0.5371562838554382
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5242252945899963
Validation loss = 0.5238597393035889
Validation loss = 0.532169759273529
Validation loss = 0.5249444246292114
Validation loss = 0.5282809734344482
Validation loss = 0.5292664170265198
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5312660932540894
Validation loss = 0.532803475856781
Validation loss = 0.5292659401893616
Validation loss = 0.5314827561378479
Validation loss = 0.5309247970581055
Validation loss = 0.5376359224319458
Validation loss = 0.5400261282920837
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.529342532157898
Validation loss = 0.5317777991294861
Validation loss = 0.5285815000534058
Validation loss = 0.5287559628486633
Validation loss = 0.5309194922447205
Validation loss = 0.5303494334220886
Validation loss = 0.5376286506652832
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5258526802062988
Validation loss = 0.5244830250740051
Validation loss = 0.5264002084732056
Validation loss = 0.5283458828926086
Validation loss = 0.5277650356292725
Validation loss = 0.5307497382164001
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2152190622598003
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.2150537634408602
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.2164236377590176
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.2177914110429449
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.2222222222222223
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.2289433384379786
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.2302983932670237
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.2362385321100917
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.2406417112299466
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2427480916030533
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.2463768115942029
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2484756097560976
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.2536176694592536
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.2587519025875191
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2608365019011407
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.2606382978723405
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.2619589977220957
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.2663125948406677
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.2653525398028809
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.2704545454545455
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.2763058289174867
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.2806354009077157
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.2842025699168556
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.2870090634441087
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2890566037735849
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.03    |
| Iteration     | 51       |
| MaximumReturn | -0.396   |
| MinimumReturn | -5.99    |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5252577066421509
Validation loss = 0.533763587474823
Validation loss = 0.5302063226699829
Validation loss = 0.5338647365570068
Validation loss = 0.5329709053039551
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.533366858959198
Validation loss = 0.5315924286842346
Validation loss = 0.5336526036262512
Validation loss = 0.5357539057731628
Validation loss = 0.5342460870742798
Validation loss = 0.5364811420440674
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5339087843894958
Validation loss = 0.5378907322883606
Validation loss = 0.5358076691627502
Validation loss = 0.5413259863853455
Validation loss = 0.5422925353050232
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5333318114280701
Validation loss = 0.5295360088348389
Validation loss = 0.5348364114761353
Validation loss = 0.5374745726585388
Validation loss = 0.5376761555671692
Validation loss = 0.5378485321998596
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5325409173965454
Validation loss = 0.5326303839683533
Validation loss = 0.5298869609832764
Validation loss = 0.5329800844192505
Validation loss = 0.5418214201927185
Validation loss = 0.539861261844635
Validation loss = 0.5359259247779846
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.294871794871795
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.2961567445365485
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.2974397590361446
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.3017306245297215
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.3082706766917294
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.3110443275732533
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.3190690690690692
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.3248312078019504
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.3275862068965518
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3288389513108614
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.3285928143712575
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.3275991024682123
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.3340807174887892
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3353248693054518
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.337313432835821
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3385533184190903
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.3375558867362145
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.3417721518987342
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.3415178571428572
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.3449814126394053
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3462109955423478
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.3504083147735708
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.3501483679525224
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.3498888065233505
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3511111111111112
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.12    |
| Iteration     | 52       |
| MaximumReturn | -0.22    |
| MinimumReturn | -13.3    |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5323045253753662
Validation loss = 0.5305498242378235
Validation loss = 0.5361206531524658
Validation loss = 0.5335503220558167
Validation loss = 0.5352305769920349
Validation loss = 0.5366976857185364
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.536197304725647
Validation loss = 0.5336636304855347
Validation loss = 0.5327355861663818
Validation loss = 0.5340977311134338
Validation loss = 0.5342221260070801
Validation loss = 0.5343528985977173
Validation loss = 0.5370907187461853
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5399976372718811
Validation loss = 0.5344981551170349
Validation loss = 0.5457322597503662
Validation loss = 0.5368715524673462
Validation loss = 0.5404483079910278
Validation loss = 0.5452319383621216
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5384124517440796
Validation loss = 0.5344739556312561
Validation loss = 0.5384292006492615
Validation loss = 0.5344803929328918
Validation loss = 0.5388681888580322
Validation loss = 0.5409765243530273
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.533139705657959
Validation loss = 0.5417838096618652
Validation loss = 0.537526547908783
Validation loss = 0.5379513502120972
Validation loss = 0.5337997674942017
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.3560325684678016
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.3616863905325445
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.3680709534368072
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.3751846381093058
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.3756457564575646
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.376843657817109
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.3795136330140014
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.385861561119293
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.390728476821192
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.3955882352941176
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3967670830271859
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.3986784140969164
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.4013206162876009
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.4076246334310851
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.4102564102564104
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.411420204978038
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.4184345281638624
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.4261695906432748
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.433162892622352
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.4386861313868613
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.4398249452954048
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.4402332361516035
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.4450109249817917
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 14
average number of affinization = 1.4541484716157205
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.456
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19      |
| Iteration     | 53       |
| MaximumReturn | -0.12    |
| MinimumReturn | -71.3    |
| TotalSamples  | 91630    |
----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5348461866378784
Validation loss = 0.5327718257904053
Validation loss = 0.5370924472808838
Validation loss = 0.5351260900497437
Validation loss = 0.5357627868652344
Validation loss = 0.538938581943512
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5396613478660583
Validation loss = 0.528104841709137
Validation loss = 0.5374361872673035
Validation loss = 0.5406749248504639
Validation loss = 0.5391884446144104
Validation loss = 0.5398628115653992
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5361946225166321
Validation loss = 0.5367565155029297
Validation loss = 0.5428333282470703
Validation loss = 0.539331316947937
Validation loss = 0.5406152606010437
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5413484573364258
Validation loss = 0.5362393856048584
Validation loss = 0.5386080145835876
Validation loss = 0.538716733455658
Validation loss = 0.540716826915741
Validation loss = 0.5425164699554443
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5356433987617493
Validation loss = 0.5362982153892517
Validation loss = 0.5392636656761169
Validation loss = 0.5366542935371399
Validation loss = 0.5373142957687378
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.4607558139534884
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.4625998547567176
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.4644412191582004
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.4677302393038434
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.4666666666666666
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.4699493120926865
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.4739507959479017
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.4757772957339117
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.4761560693641618
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.4779783393501804
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.4841269841269842
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.487382840663302
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.4863112391930835
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.4852411807055435
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.4870503597122302
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.4895758447160317
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.492816091954023
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.4938980617372577
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.494261119081779
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.496057347670251
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.497134670487106
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.4974946313528992
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.5017869907076484
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.5007142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -76.7    |
| Iteration     | 54       |
| MaximumReturn | -0.389   |
| MinimumReturn | -163     |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5360317230224609
Validation loss = 0.5342320203781128
Validation loss = 0.5386683940887451
Validation loss = 0.5414135456085205
Validation loss = 0.5423067808151245
Validation loss = 0.5408057570457458
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5368656516075134
Validation loss = 0.534821629524231
Validation loss = 0.5373314619064331
Validation loss = 0.5375341176986694
Validation loss = 0.5384136438369751
Validation loss = 0.5394042134284973
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5370421409606934
Validation loss = 0.5395085215568542
Validation loss = 0.5422195196151733
Validation loss = 0.5419298410415649
Validation loss = 0.5412486791610718
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5376267433166504
Validation loss = 0.541239321231842
Validation loss = 0.5404232144355774
Validation loss = 0.5411058068275452
Validation loss = 0.5408757925033569
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5361363291740417
Validation loss = 0.5394890904426575
Validation loss = 0.5330885648727417
Validation loss = 0.5351098775863647
Validation loss = 0.5337514281272888
Validation loss = 0.5361490845680237
Validation loss = 0.5419016480445862
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5032119914346895
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.505706134094151
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.5096222380612971
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5121082621082622
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.5124555160142348
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.5120910384068278
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5145700071073205
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5170454545454546
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.518097941802697
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.5212765957446808
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.524450744153083
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.5318696883852692
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5343241330502477
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.533946251768034
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.5342756183745583
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.5388418079096045
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.5398729710656316
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.540197461212976
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.5454545454545454
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.5471830985915493
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.553131597466573
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5555555555555556
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.5586788475052706
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.5575842696629214
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.5564912280701755
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -39.4    |
| Iteration     | 55       |
| MaximumReturn | -0.192   |
| MinimumReturn | -129     |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5337872505187988
Validation loss = 0.5315571427345276
Validation loss = 0.5412493944168091
Validation loss = 0.534504771232605
Validation loss = 0.5383860468864441
Validation loss = 0.5409668684005737
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5336173176765442
Validation loss = 0.5321394205093384
Validation loss = 0.5384312272071838
Validation loss = 0.5371389985084534
Validation loss = 0.5454699993133545
Validation loss = 0.5312257409095764
Validation loss = 0.5397241711616516
Validation loss = 0.5389803647994995
Validation loss = 0.5392171144485474
Validation loss = 0.5383209586143494
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5356343984603882
Validation loss = 0.5374495387077332
Validation loss = 0.5390207171440125
Validation loss = 0.5407689213752747
Validation loss = 0.540152907371521
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5379939675331116
Validation loss = 0.5382702946662903
Validation loss = 0.5393368005752563
Validation loss = 0.5402214527130127
Validation loss = 0.5370759963989258
Validation loss = 0.5411406755447388
Validation loss = 0.5411933064460754
Validation loss = 0.5475362539291382
Validation loss = 0.5445687770843506
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5408750176429749
Validation loss = 0.5361976623535156
Validation loss = 0.5378583073616028
Validation loss = 0.5401256084442139
Validation loss = 0.5381057262420654
Validation loss = 0.5412295460700989
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.5610098176718092
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.5620182200420463
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.5658263305322129
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.5675297410776767
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.572027972027972
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.5751222921034242
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.5782122905027933
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5806001395673412
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.5857740585774058
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.5902439024390245
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.5961002785515321
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.5956854558107167
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5980528511821974
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.605281445448228
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.6076388888888888
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.6120749479528105
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.617891816920943
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.6195426195426195
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.621191135734072
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.6235294117647059
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.6251728907330567
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.627505183137526
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.6312154696132597
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.6342305037957212
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.6379310344827587
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.32    |
| Iteration     | 56       |
| MaximumReturn | -0.56    |
| MinimumReturn | -58.7    |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5345678925514221
Validation loss = 0.541978120803833
Validation loss = 0.5406860709190369
Validation loss = 0.5430386066436768
Validation loss = 0.5421795845031738
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5489535331726074
Validation loss = 0.5360986590385437
Validation loss = 0.5440134406089783
Validation loss = 0.5414060950279236
Validation loss = 0.5415573716163635
Validation loss = 0.5437719821929932
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.541678249835968
Validation loss = 0.5386250019073486
Validation loss = 0.5401901602745056
Validation loss = 0.5366042256355286
Validation loss = 0.5417460203170776
Validation loss = 0.539929986000061
Validation loss = 0.5431256890296936
Validation loss = 0.5455338954925537
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5481662750244141
Validation loss = 0.5437998175621033
Validation loss = 0.5435351729393005
Validation loss = 0.5450279712677002
Validation loss = 0.5465192794799805
Validation loss = 0.548072874546051
Validation loss = 0.5480528473854065
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5403016209602356
Validation loss = 0.5390533804893494
Validation loss = 0.5363757014274597
Validation loss = 0.5415074229240417
Validation loss = 0.5407879948616028
Validation loss = 0.5407876968383789
Validation loss = 0.5463941693305969
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.6395589248793936
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.643939393939394
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.6490020646937371
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.6506189821182944
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.6508591065292095
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.6517857142857142
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.6527110501029512
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.6570644718792866
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.660726525017135
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.6609589041095891
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.6673511293634498
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.6675786593707251
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.669172932330827
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.6721311475409837
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.6791808873720135
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.6855388813096863
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.687116564417178
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.6900544959128065
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.6895847515316542
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.692517006802721
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.698844323589395
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.6997282608695652
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.701968771215207
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.7042062415196744
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.7030508474576271
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.53    |
| Iteration     | 57       |
| MaximumReturn | -0.217   |
| MinimumReturn | -92.3    |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5414850115776062
Validation loss = 0.5332968235015869
Validation loss = 0.5413877367973328
Validation loss = 0.5423603057861328
Validation loss = 0.5396121144294739
Validation loss = 0.5397312045097351
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.543603777885437
Validation loss = 0.5395409464836121
Validation loss = 0.5463408827781677
Validation loss = 0.5443946719169617
Validation loss = 0.542896568775177
Validation loss = 0.5444101691246033
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.543425977230072
Validation loss = 0.5423120260238647
Validation loss = 0.5463660359382629
Validation loss = 0.5428268313407898
Validation loss = 0.5491254925727844
Validation loss = 0.5459157824516296
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5481842756271362
Validation loss = 0.5469574928283691
Validation loss = 0.54544597864151
Validation loss = 0.5515609979629517
Validation loss = 0.5559549927711487
Validation loss = 0.5524071455001831
Validation loss = 0.5499307513237
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5434655547142029
Validation loss = 0.5455403327941895
Validation loss = 0.5403804183006287
Validation loss = 0.5412803888320923
Validation loss = 0.5466721057891846
Validation loss = 0.5479749441146851
Validation loss = 0.5478826761245728
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.703929539295393
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.7054840893703453
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.705683355886333
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.7065584854631508
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.7074324324324324
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.7076299797434167
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.7125506072874495
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.7140930546190154
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.7176549865229112
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.7171717171717171
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7200538358008075
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.722259583053127
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7251344086021505
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.7259905977165884
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.7308724832214766
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.7303822937625755
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.7352546916890081
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.740120562625586
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.7436412315930387
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.7451505016722408
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.7466577540106951
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.7508350033400133
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.7510013351134845
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.752501667778519
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.7546666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -37.4    |
| Iteration     | 58       |
| MaximumReturn | -0.223   |
| MinimumReturn | -96.1    |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5480393767356873
Validation loss = 0.543199360370636
Validation loss = 0.5396127104759216
Validation loss = 0.5410820841789246
Validation loss = 0.5429962277412415
Validation loss = 0.5476887226104736
Validation loss = 0.5457952618598938
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5426244139671326
Validation loss = 0.542777955532074
Validation loss = 0.5440235137939453
Validation loss = 0.5435965061187744
Validation loss = 0.5408990979194641
Validation loss = 0.5481277704238892
Validation loss = 0.5444103479385376
Validation loss = 0.5463929176330566
Validation loss = 0.548530101776123
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5457689166069031
Validation loss = 0.5411324501037598
Validation loss = 0.5413689613342285
Validation loss = 0.5464637875556946
Validation loss = 0.5475099682807922
Validation loss = 0.551120936870575
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5512633919715881
Validation loss = 0.5481365323066711
Validation loss = 0.5565146803855896
Validation loss = 0.5494149923324585
Validation loss = 0.5502580404281616
Validation loss = 0.5612805485725403
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5383579134941101
Validation loss = 0.5459973812103271
Validation loss = 0.5416473150253296
Validation loss = 0.5436356067657471
Validation loss = 0.5463691353797913
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.7568287808127914
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.75965379494008
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.7644710578842315
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.766622340425532
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.7654485049833888
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.7669322709163346
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7697412076974122
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.7745358090185677
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.7753479125248508
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.7768211920529802
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.7782925215089345
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.7824074074074074
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7851949768671513
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7879788639365919
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7907590759075909
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.79155672823219
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7943309162821357
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.7944664031620554
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.7978933508887427
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.8019736842105263
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.8040762656147271
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.8061760840998686
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.8128693368351938
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.8156167979002624
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.819672131147541
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -27.5    |
| Iteration     | 59       |
| MaximumReturn | -0.168   |
| MinimumReturn | -117     |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5420551896095276
Validation loss = 0.543102502822876
Validation loss = 0.5415058135986328
Validation loss = 0.5481881499290466
Validation loss = 0.5473814606666565
Validation loss = 0.5433807969093323
Validation loss = 0.5487082004547119
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5481356978416443
Validation loss = 0.5419589281082153
Validation loss = 0.5444323420524597
Validation loss = 0.5479674339294434
Validation loss = 0.5469158887863159
Validation loss = 0.548912525177002
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5451287627220154
Validation loss = 0.5407516956329346
Validation loss = 0.5443475842475891
Validation loss = 0.5507732629776001
Validation loss = 0.5486950874328613
Validation loss = 0.5495482087135315
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5498833656311035
Validation loss = 0.5514971017837524
Validation loss = 0.5488746762275696
Validation loss = 0.5510886907577515
Validation loss = 0.5510866641998291
Validation loss = 0.5512123703956604
Validation loss = 0.548819899559021
Validation loss = 0.555090069770813
Validation loss = 0.5566271543502808
Validation loss = 0.562898576259613
Validation loss = 0.5546497106552124
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5419402718544006
Validation loss = 0.5428439974784851
Validation loss = 0.5465354323387146
Validation loss = 0.542008638381958
Validation loss = 0.5430915355682373
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.8230668414154654
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.8264571054354943
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.8278795811518325
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.827992151733159
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.8326797385620914
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.8347485303723057
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.835509138381201
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.8382257012393999
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.8402868318122556
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.8423452768729642
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.8463541666666667
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.8464541314248537
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.8459037711313393
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.8460038986354776
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.8454545454545455
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.8462037637897468
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.8488975356679638
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.850939727802981
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.8536269430051813
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.8563106796116504
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.8622250970245795
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.8668390433096316
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.8714470284237725
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.8741123305358296
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.876774193548387
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -30.9    |
| Iteration     | 60       |
| MaximumReturn | -0.0761  |
| MinimumReturn | -110     |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.542619526386261
Validation loss = 0.5462047457695007
Validation loss = 0.5450504422187805
Validation loss = 0.5452737808227539
Validation loss = 0.5460259318351746
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5505773425102234
Validation loss = 0.5469415187835693
Validation loss = 0.5487916469573975
Validation loss = 0.5473448634147644
Validation loss = 0.5551117062568665
Validation loss = 0.5525673031806946
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5474562644958496
Validation loss = 0.5405911207199097
Validation loss = 0.5486140251159668
Validation loss = 0.5478976368904114
Validation loss = 0.5517465472221375
Validation loss = 0.5501253604888916
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5502657890319824
Validation loss = 0.549749493598938
Validation loss = 0.5545473098754883
Validation loss = 0.5550041794776917
Validation loss = 0.5502879023551941
Validation loss = 0.5551083087921143
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5493782758712769
Validation loss = 0.5396775603294373
Validation loss = 0.546151876449585
Validation loss = 0.546608030796051
Validation loss = 0.5489051938056946
Validation loss = 0.5464270114898682
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.8768536428110896
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.8827319587628866
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.8860270444301352
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.886100386100386
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.8919614147909969
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.8939588688946016
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.897238278741169
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.8998716302952503
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.905708787684413
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.9083333333333334
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.9147982062780269
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.9180537772087067
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.9219449776071658
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 15
average number of affinization = 1.930306905370844
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.9329073482428114
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.9361430395913155
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.9387364390555202
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.9438775510204083
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.9509241555130656
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.951592356687898
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.9535327816677275
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.9586513994910941
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.9612205975842338
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.9644218551461246
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 16
average number of affinization = 1.9733333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -24.8    |
| Iteration     | 61       |
| MaximumReturn | -0.228   |
| MinimumReturn | -104     |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5445569753646851
Validation loss = 0.5429879426956177
Validation loss = 0.5445137619972229
Validation loss = 0.5481741428375244
Validation loss = 0.5481871962547302
Validation loss = 0.5489223003387451
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5468143820762634
Validation loss = 0.5507000684738159
Validation loss = 0.5458359122276306
Validation loss = 0.5451016426086426
Validation loss = 0.5482811331748962
Validation loss = 0.5457946062088013
Validation loss = 0.550532341003418
Validation loss = 0.5545127391815186
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5452402830123901
Validation loss = 0.5498446226119995
Validation loss = 0.5490049123764038
Validation loss = 0.5444889664649963
Validation loss = 0.548518180847168
Validation loss = 0.5553470253944397
Validation loss = 0.5531488656997681
Validation loss = 0.5532524585723877
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.554945707321167
Validation loss = 0.5497559905052185
Validation loss = 0.5496987104415894
Validation loss = 0.5512303709983826
Validation loss = 0.5523492097854614
Validation loss = 0.5519992113113403
Validation loss = 0.5565683841705322
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5445948243141174
Validation loss = 0.5462153553962708
Validation loss = 0.5473169088363647
Validation loss = 0.5476189255714417
Validation loss = 0.548351526260376
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.975253807106599
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.9778059606848446
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.9816223067173637
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.986700443318556
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.9886075949367088
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.9879822896900696
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.988621997471555
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.9905243209096652
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.9968434343434343
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.9993690851735015
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.0031525851197984
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.003780718336484
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.006926952141058
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.0088105726872247
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.010062893081761
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.011313639220616
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.01570351758794
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 2.01443816698054
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.0163111668757843
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.0163009404388714
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.0181704260651627
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.0194113963681906
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.0200250312891113
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.020012507817386
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.020625
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -92.2    |
| Iteration     | 62       |
| MaximumReturn | -0.214   |
| MinimumReturn | -159     |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5524943470954895
Validation loss = 0.5522427558898926
Validation loss = 0.5473293662071228
Validation loss = 0.545448899269104
Validation loss = 0.5524705648422241
Validation loss = 0.5495818853378296
Validation loss = 0.550387442111969
Validation loss = 0.549976646900177
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5515879392623901
Validation loss = 0.5490597486495972
Validation loss = 0.5504025220870972
Validation loss = 0.5496773719787598
Validation loss = 0.548616886138916
Validation loss = 0.5535256862640381
Validation loss = 0.5519777536392212
Validation loss = 0.5563396215438843
Validation loss = 0.5568017959594727
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5509823560714722
Validation loss = 0.5534368753433228
Validation loss = 0.5516408085823059
Validation loss = 0.5550732612609863
Validation loss = 0.552013099193573
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.550997257232666
Validation loss = 0.5544536113739014
Validation loss = 0.5549939274787903
Validation loss = 0.5521162748336792
Validation loss = 0.5561250448226929
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5461729764938354
Validation loss = 0.5467055439949036
Validation loss = 0.5495652556419373
Validation loss = 0.5428321957588196
Validation loss = 0.5536717176437378
Validation loss = 0.5520637035369873
Validation loss = 0.5464913249015808
Validation loss = 0.5567442178726196
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.0212367270455966
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.023096129837703
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.0262008733624453
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.0355361596009973
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.0367601246105917
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.0386052303860525
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.0385812072184195
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.038557213930348
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.042883778744562
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.045341614906832
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.047796399751707
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.050248138957816
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.0520768753874767
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.0557620817843865
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.058204334365325
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.061262376237624
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 2.059987631416203
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.060568603213844
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.0654725138974674
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.0691358024691358
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.070326958667489
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.0721331689272504
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.0745532963647566
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.0794334975369457
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.083076923076923
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -49.6    |
| Iteration     | 63       |
| MaximumReturn | -0.293   |
| MinimumReturn | -119     |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5488045811653137
Validation loss = 0.5534651875495911
Validation loss = 0.5473266243934631
Validation loss = 0.5466226935386658
Validation loss = 0.5494479537010193
Validation loss = 0.5520607829093933
Validation loss = 0.5515801310539246
Validation loss = 0.5521603226661682
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5483067035675049
Validation loss = 0.5480265617370605
Validation loss = 0.5516142845153809
Validation loss = 0.5512003898620605
Validation loss = 0.5513756275177002
Validation loss = 0.5533452033996582
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5441668629646301
Validation loss = 0.5466949939727783
Validation loss = 0.546942949295044
Validation loss = 0.5483114719390869
Validation loss = 0.5504326820373535
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5523538589477539
Validation loss = 0.5493238568305969
Validation loss = 0.5527090430259705
Validation loss = 0.5522028803825378
Validation loss = 0.551381528377533
Validation loss = 0.5558228492736816
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5515970587730408
Validation loss = 0.5491334795951843
Validation loss = 0.5479021668434143
Validation loss = 0.5484012365341187
Validation loss = 0.5432782173156738
Validation loss = 0.5473704934120178
Validation loss = 0.5530562996864319
Validation loss = 0.5524666905403137
Validation loss = 0.5485029816627502
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.0867158671586714
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.087277197295636
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.0896805896805897
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 2.089011663597299
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.092638036809816
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.0944206008583692
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 2.093137254901961
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.096142069810165
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.099143206854345
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.0996941896024466
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.1026894865525674
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.1038485033598047
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.1086691086691087
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.11104331909701
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.1128048780487805
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.113954905545399
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.115712545676005
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.118685331710286
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.1234793187347933
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.123404255319149
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.12636695018226
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.129933211900425
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.1304611650485437
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 2.1297756215888417
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 2.1284848484848484
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -52.5    |
| Iteration     | 64       |
| MaximumReturn | -0.498   |
| MinimumReturn | -167     |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.547617495059967
Validation loss = 0.5457039475440979
Validation loss = 0.5498645305633545
Validation loss = 0.5475239753723145
Validation loss = 0.5488486289978027
Validation loss = 0.5500191450119019
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5505049824714661
Validation loss = 0.5501267313957214
Validation loss = 0.5495370030403137
Validation loss = 0.5508086681365967
Validation loss = 0.5511339902877808
Validation loss = 0.5510548949241638
Validation loss = 0.551653265953064
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5529617667198181
Validation loss = 0.5450039505958557
Validation loss = 0.5474604368209839
Validation loss = 0.5513002872467041
Validation loss = 0.5470045208930969
Validation loss = 0.5480441451072693
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5557277798652649
Validation loss = 0.5518255233764648
Validation loss = 0.557172954082489
Validation loss = 0.5497254133224487
Validation loss = 0.5532636642456055
Validation loss = 0.5539748668670654
Validation loss = 0.5557431578636169
Validation loss = 0.5531624555587769
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5466152429580688
Validation loss = 0.549672544002533
Validation loss = 0.5484875440597534
Validation loss = 0.5472719073295593
Validation loss = 0.5437957644462585
Validation loss = 0.5505546927452087
Validation loss = 0.5518820881843567
Validation loss = 0.5558786392211914
Validation loss = 0.5500995516777039
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.1368867353119323
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.141041162227603
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.147005444646098
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.1469165659008462
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.148640483383686
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.151570048309179
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.1575135787567894
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.158624849215923
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.165159734779988
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.1674698795180722
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.176399759181216
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.1793020457280385
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.1809981960312688
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.180889423076923
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.180780780780781
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.1830732292917165
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.1835632873425315
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.184652278177458
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.1875374475733973
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.1880239520958082
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.1956912028725313
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.202751196172249
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.203227734608488
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.208482676224612
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.2155223880597017
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -34.4    |
| Iteration     | 65       |
| MaximumReturn | -0.289   |
| MinimumReturn | -96.2    |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5461082458496094
Validation loss = 0.5442706942558289
Validation loss = 0.5461329221725464
Validation loss = 0.5459128022193909
Validation loss = 0.5480673313140869
Validation loss = 0.5468946099281311
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5463258028030396
Validation loss = 0.551630973815918
Validation loss = 0.5481136441230774
Validation loss = 0.5485247373580933
Validation loss = 0.5524225234985352
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5497636795043945
Validation loss = 0.5430591106414795
Validation loss = 0.5422868132591248
Validation loss = 0.5458170175552368
Validation loss = 0.5474177002906799
Validation loss = 0.5526065826416016
Validation loss = 0.5524761080741882
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5487285852432251
Validation loss = 0.5532819628715515
Validation loss = 0.5477467775344849
Validation loss = 0.5510327219963074
Validation loss = 0.5511403679847717
Validation loss = 0.5563015937805176
Validation loss = 0.5548397898674011
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5479596853256226
Validation loss = 0.545861005783081
Validation loss = 0.545123279094696
Validation loss = 0.5470268130302429
Validation loss = 0.548740029335022
Validation loss = 0.5497987866401672
Validation loss = 0.5555986762046814
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.217780429594272
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.221824686940966
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.2252681764004767
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.2310899344848125
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.2375
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.2391433670434266
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.243162901307967
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 18
average number of affinization = 2.2525252525252526
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.255938242280285
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.2617210682492583
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.2663107947805456
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.2744516893894486
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.2760663507109005
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.2818235642391946
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.283431952662722
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.2885866351271438
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.2937352245862885
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.297105729474306
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.3034238488783942
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.3073746312684364
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.3095518867924527
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.311726576311137
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.3150765606595995
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.3178340200117717
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.3223529411764705
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -40.7    |
| Iteration     | 66       |
| MaximumReturn | -0.384   |
| MinimumReturn | -88.9    |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5481800436973572
Validation loss = 0.5485553741455078
Validation loss = 0.5497452616691589
Validation loss = 0.549996018409729
Validation loss = 0.5485008358955383
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5488081574440002
Validation loss = 0.5424489378929138
Validation loss = 0.5510180592536926
Validation loss = 0.5468252897262573
Validation loss = 0.5494019985198975
Validation loss = 0.5494718551635742
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5494361519813538
Validation loss = 0.5491457581520081
Validation loss = 0.5465481877326965
Validation loss = 0.551505982875824
Validation loss = 0.5487282872200012
Validation loss = 0.5528585314750671
Validation loss = 0.5530996918678284
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5479375720024109
Validation loss = 0.5519968271255493
Validation loss = 0.5496240854263306
Validation loss = 0.5511912107467651
Validation loss = 0.5582267642021179
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5492066144943237
Validation loss = 0.5561586022377014
Validation loss = 0.5546171069145203
Validation loss = 0.5474169850349426
Validation loss = 0.5540407299995422
Validation loss = 0.5503193736076355
Validation loss = 0.5527083277702332
Validation loss = 0.552064836025238
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.326278659611993
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.329024676850764
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.3364650616559013
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.3386150234741785
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.3390029325513195
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.3417350527549825
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.3438781487990625
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.3495316159250588
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.3563487419543594
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.3567251461988303
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.35885447106955
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.3633177570093458
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.366024518388792
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.371061843640607
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.3720116618075804
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.3776223776223775
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.380314502038439
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.3876600698486614
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.390343222803956
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.3924418604651163
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.3974433468913423
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.400116144018583
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.4062681369704007
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.410092807424594
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.409855072463768
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.5    |
| Iteration     | 67       |
| MaximumReturn | -0.306   |
| MinimumReturn | -80      |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.546180009841919
Validation loss = 0.5466507077217102
Validation loss = 0.543581485748291
Validation loss = 0.5476781129837036
Validation loss = 0.5477206707000732
Validation loss = 0.5555232763290405
Validation loss = 0.5487337708473206
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5527288317680359
Validation loss = 0.5467281341552734
Validation loss = 0.5471869111061096
Validation loss = 0.5480113625526428
Validation loss = 0.5485019087791443
Validation loss = 0.5483555793762207
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5542519092559814
Validation loss = 0.5502707958221436
Validation loss = 0.5515241622924805
Validation loss = 0.5506302714347839
Validation loss = 0.5509102940559387
Validation loss = 0.5529674887657166
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5535565614700317
Validation loss = 0.5501342415809631
Validation loss = 0.5504549741744995
Validation loss = 0.5499354004859924
Validation loss = 0.5521442890167236
Validation loss = 0.5521253347396851
Validation loss = 0.5585078001022339
Validation loss = 0.557427167892456
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5464054346084595
Validation loss = 0.5493189096450806
Validation loss = 0.5517499446868896
Validation loss = 0.5532656908035278
Validation loss = 0.5540413856506348
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.4119351100811124
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.4140127388535033
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.4207175925925926
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.4256795835743206
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.4265895953757224
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.4280762564991334
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.4324480369515014
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.433929601846509
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.434256055363322
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.4340057636887606
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.4372119815668203
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.4386873920552676
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.4390103567318757
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.4445083381253596
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.4459770114942527
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.4462952326249283
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.447187141216992
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.4497991967871484
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.4518348623853212
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.456160458452722
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.457617411225659
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.459072696050372
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.4610983981693364
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.4614065180102918
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.4645714285714284
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.4    |
| Iteration     | 68       |
| MaximumReturn | -0.118   |
| MinimumReturn | -60.2    |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5526388883590698
Validation loss = 0.5479851961135864
Validation loss = 0.546573281288147
Validation loss = 0.5499008893966675
Validation loss = 0.5505390763282776
Validation loss = 0.5540161728858948
Validation loss = 0.5494204759597778
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5477540493011475
Validation loss = 0.5480054020881653
Validation loss = 0.5480535626411438
Validation loss = 0.5514295101165771
Validation loss = 0.5469843149185181
Validation loss = 0.5455418229103088
Validation loss = 0.546074628829956
Validation loss = 0.5517255067825317
Validation loss = 0.5585120916366577
Validation loss = 0.5566102266311646
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5536141395568848
Validation loss = 0.5434885621070862
Validation loss = 0.5467471480369568
Validation loss = 0.5519790053367615
Validation loss = 0.551240861415863
Validation loss = 0.5505775213241577
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5571537613868713
Validation loss = 0.5537082552909851
Validation loss = 0.5542587041854858
Validation loss = 0.5503265261650085
Validation loss = 0.5521668195724487
Validation loss = 0.5544745326042175
Validation loss = 0.5609419345855713
Validation loss = 0.5546935796737671
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5490151643753052
Validation loss = 0.5540465712547302
Validation loss = 0.5493714809417725
Validation loss = 0.5501248836517334
Validation loss = 0.5557442307472229
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.464877213021131
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.4680365296803655
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.468910439247005
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.4686431014823262
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.4706552706552705
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.4772209567198176
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.4786568013659647
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.4800910125142206
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.4820920977828314
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.484659090909091
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.4894946053378764
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.4909194097616343
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.4912081678956324
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.493197278911565
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.495184135977337
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.4960362400906004
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.497453310696095
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.4994343891402715
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.500847936687394
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.5050847457627117
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.5098814229249014
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.510158013544018
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.5104342921601805
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.5186020293122886
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.51943661971831
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -16.3    |
| Iteration     | 69       |
| MaximumReturn | -0.198   |
| MinimumReturn | -160     |
| TotalSamples  | 118286   |
----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5533066391944885
Validation loss = 0.5500720143318176
Validation loss = 0.5505826473236084
Validation loss = 0.5476893186569214
Validation loss = 0.5510829091072083
Validation loss = 0.5576398968696594
Validation loss = 0.553519606590271
Validation loss = 0.553410530090332
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5591884255409241
Validation loss = 0.553115725517273
Validation loss = 0.5518388152122498
Validation loss = 0.5540192127227783
Validation loss = 0.5563786625862122
Validation loss = 0.5541359186172485
Validation loss = 0.5525621175765991
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.548207700252533
Validation loss = 0.5498062372207642
Validation loss = 0.5485939383506775
Validation loss = 0.5498581528663635
Validation loss = 0.5504570603370667
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5528079271316528
Validation loss = 0.5507570505142212
Validation loss = 0.5539045333862305
Validation loss = 0.5533967018127441
Validation loss = 0.5541183352470398
Validation loss = 0.5537112355232239
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5512652397155762
Validation loss = 0.5479075908660889
Validation loss = 0.5475226044654846
Validation loss = 0.5499313473701477
Validation loss = 0.5502249002456665
Validation loss = 0.5512714982032776
Validation loss = 0.5526968240737915
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.526463963963964
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.529544175576815
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.534308211473566
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.536256323777403
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.5387640449438202
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.541268950028074
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.542648709315376
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.550196298373528
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.552690582959641
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.5585434173669466
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.562709966405375
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.562954672635702
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.562639821029083
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.5645612073784236
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.566480446927374
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.5739810161920715
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.5786830357142856
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.5833798103736756
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.584726867335563
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.5860724233983285
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.59075723830735
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.593767390094602
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.5956618464961068
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.602556976097832
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.6055555555555556
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.57    |
| Iteration     | 70       |
| MaximumReturn | -0.273   |
| MinimumReturn | -50.7    |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5520155429840088
Validation loss = 0.5554551482200623
Validation loss = 0.5487819910049438
Validation loss = 0.5480382442474365
Validation loss = 0.5518366694450378
Validation loss = 0.5556594133377075
Validation loss = 0.5524108409881592
Validation loss = 0.5527997612953186
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5483250021934509
Validation loss = 0.5522735118865967
Validation loss = 0.551330029964447
Validation loss = 0.5566270351409912
Validation loss = 0.5523687601089478
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5544344186782837
Validation loss = 0.5525134801864624
Validation loss = 0.548836350440979
Validation loss = 0.5503318905830383
Validation loss = 0.5504038333892822
Validation loss = 0.552043616771698
Validation loss = 0.5513210892677307
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5540398955345154
Validation loss = 0.5509700775146484
Validation loss = 0.5521119832992554
Validation loss = 0.5562953352928162
Validation loss = 0.5643867254257202
Validation loss = 0.5567967891693115
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5534346699714661
Validation loss = 0.551459014415741
Validation loss = 0.5556544065475464
Validation loss = 0.5504064559936523
Validation loss = 0.5475079417228699
Validation loss = 0.5536545515060425
Validation loss = 0.5556774139404297
Validation loss = 0.5532865524291992
Validation loss = 0.552623987197876
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.6107717934480843
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.6132075471698113
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.6161952301719356
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.618070953436807
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.6204986149584486
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.6234772978959024
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.6281128942999445
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.6327433628318584
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.63681592039801
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.639226519337017
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.6421866372170073
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.643487858719647
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.6497517926089356
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.652701212789416
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.652341597796143
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.6569383259911894
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.658778205833792
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.6611661166116614
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.669048927982408
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.6747252747252745
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.6803953871499178
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.6844127332601535
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.689522764673615
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.6891447368421053
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.690958904109589
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -33.8    |
| Iteration     | 71       |
| MaximumReturn | -0.145   |
| MinimumReturn | -96.6    |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5527907013893127
Validation loss = 0.5548096895217896
Validation loss = 0.5509080290794373
Validation loss = 0.5556334853172302
Validation loss = 0.5581991672515869
Validation loss = 0.5538929104804993
Validation loss = 0.5525338649749756
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5549772381782532
Validation loss = 0.5512194037437439
Validation loss = 0.5486589074134827
Validation loss = 0.5526697039604187
Validation loss = 0.5512880086898804
Validation loss = 0.5531923770904541
Validation loss = 0.5538476705551147
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.550373911857605
Validation loss = 0.5540010929107666
Validation loss = 0.5511519312858582
Validation loss = 0.552484929561615
Validation loss = 0.5513061881065369
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.556817889213562
Validation loss = 0.5542113184928894
Validation loss = 0.5536091923713684
Validation loss = 0.5554946660995483
Validation loss = 0.5624672770500183
Validation loss = 0.5552566051483154
Validation loss = 0.5617397427558899
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.552061915397644
Validation loss = 0.5507001876831055
Validation loss = 0.548496425151825
Validation loss = 0.552240788936615
Validation loss = 0.5543099641799927
Validation loss = 0.5579031705856323
Validation loss = 0.5539411306381226
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.6916757940854326
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.6934865900383143
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.6936542669584247
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.6960087479496995
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.69672131147541
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.6968869470234846
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.699781659388646
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.699945444626296
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.703380588876772
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.7046321525885557
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.704248366013072
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.7098530212302667
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.7127312295973884
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.7161500815660684
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.7168478260869566
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.717544812601847
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.7176981541802387
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.723277265328269
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.725054229934924
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.727371273712737
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.727518959913326
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.731997834325934
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.734848484848485
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.738236884802596
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.7416216216216216
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -34.3    |
| Iteration     | 72       |
| MaximumReturn | -0.295   |
| MinimumReturn | -93.7    |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.556328296661377
Validation loss = 0.5532137155532837
Validation loss = 0.5520859956741333
Validation loss = 0.5501874685287476
Validation loss = 0.5524463057518005
Validation loss = 0.5573928952217102
Validation loss = 0.5531746745109558
Validation loss = 0.5540004372596741
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5505149364471436
Validation loss = 0.5566736459732056
Validation loss = 0.5546266436576843
Validation loss = 0.5552043318748474
Validation loss = 0.5509617924690247
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5528321266174316
Validation loss = 0.5503089427947998
Validation loss = 0.5526110529899597
Validation loss = 0.5501547455787659
Validation loss = 0.5512411594390869
Validation loss = 0.5535062551498413
Validation loss = 0.5593569874763489
Validation loss = 0.5582385659217834
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.552889883518219
Validation loss = 0.5643624663352966
Validation loss = 0.5515836477279663
Validation loss = 0.5566269755363464
Validation loss = 0.5593040585517883
Validation loss = 0.556729793548584
Validation loss = 0.5599311590194702
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5525727272033691
Validation loss = 0.5589039921760559
Validation loss = 0.5608953833580017
Validation loss = 0.5552853941917419
Validation loss = 0.548961877822876
Validation loss = 0.5556716918945312
Validation loss = 0.5552765130996704
Validation loss = 0.5552878975868225
Validation loss = 0.5637298226356506
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.7423014586709886
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.7456803455723544
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.750134916351862
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.7513484358144553
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.7536388140161727
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.753771551724138
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.7555196553581043
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.7572658772874057
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.762775685852609
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.763440860215054
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.766792047286405
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.7701396348012888
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.770799785292539
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.7709227467811157
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.773190348525469
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.7738478027867095
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.7739689341189075
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.777301927194861
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 19
average number of affinization = 2.7859818084537187
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.788235294117647
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.7899518973810795
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.7911324786324787
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.793379604911906
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.797758804695838
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.8016
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -40.3    |
| Iteration     | 73       |
| MaximumReturn | -0.191   |
| MinimumReturn | -172     |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5552619695663452
Validation loss = 0.5472179651260376
Validation loss = 0.5553152561187744
Validation loss = 0.5540107488632202
Validation loss = 0.5543668866157532
Validation loss = 0.554111123085022
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5506699085235596
Validation loss = 0.5518326163291931
Validation loss = 0.5507077574729919
Validation loss = 0.5562955141067505
Validation loss = 0.5528454184532166
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5536608695983887
Validation loss = 0.5492500066757202
Validation loss = 0.5540044903755188
Validation loss = 0.5525217652320862
Validation loss = 0.5513237714767456
Validation loss = 0.5563062429428101
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5572506785392761
Validation loss = 0.5557309985160828
Validation loss = 0.5554239153862
Validation loss = 0.5533071756362915
Validation loss = 0.5571563839912415
Validation loss = 0.561514139175415
Validation loss = 0.5619083046913147
Validation loss = 0.5548030138015747
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5555880069732666
Validation loss = 0.550686240196228
Validation loss = 0.5525421500205994
Validation loss = 0.5520361065864563
Validation loss = 0.5548165440559387
Validation loss = 0.5557202696800232
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.803304904051173
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.805007991475759
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.807241746538871
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.8116019159127195
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.8164893617021276
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.8171185539606594
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.8214665249734323
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.8268720127456186
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.8274946921443735
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.830238726790451
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.8324496288441146
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.836248012718601
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.8395127118644066
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.842244573848597
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.8444444444444446
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.847699629825489
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.854651162790698
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.859482303222398
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.862196409714889
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.865963060686016
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.869198312236287
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.8692672641012122
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.873551106427819
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.877303844128489
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.880526315789474
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -16.8    |
| Iteration     | 74       |
| MaximumReturn | -0.151   |
| MinimumReturn | -57.1    |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5524693727493286
Validation loss = 0.5551212430000305
Validation loss = 0.5545887351036072
Validation loss = 0.5566328763961792
Validation loss = 0.5535277128219604
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5515837073326111
Validation loss = 0.5513830780982971
Validation loss = 0.5514838099479675
Validation loss = 0.5557342171669006
Validation loss = 0.5536536574363708
Validation loss = 0.5561762452125549
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5548093914985657
Validation loss = 0.5525147914886475
Validation loss = 0.5536757111549377
Validation loss = 0.5540921092033386
Validation loss = 0.5541146993637085
Validation loss = 0.5551638007164001
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5551419258117676
Validation loss = 0.5563737750053406
Validation loss = 0.5557554364204407
Validation loss = 0.5575333833694458
Validation loss = 0.5563482642173767
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.555912971496582
Validation loss = 0.5581411719322205
Validation loss = 0.5547976493835449
Validation loss = 0.5555724501609802
Validation loss = 0.5532289147377014
Validation loss = 0.5553057789802551
Validation loss = 0.5558236241340637
Validation loss = 0.5555611848831177
Validation loss = 0.559822142124176
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 18
average number of affinization = 2.8884797475013153
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.8906414300736065
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.8928008407777193
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.8949579831932772
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.898687664041995
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.899265477439664
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.902464604090194
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.9051362683438153
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.9104243059193293
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.9146596858638745
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 20
average number of affinization = 2.923600209314495
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.928870292887029
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.931521170935703
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.93730407523511
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.938903394255875
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.943110647181628
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.948356807511737
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.950990615224192
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.9577905158936946
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.9609375
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.9666840187402395
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.969302809573361
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.9755590223608945
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.9760914760914763
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.9802597402597404
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -13.1    |
| Iteration     | 75       |
| MaximumReturn | -0.166   |
| MinimumReturn | -76.1    |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5521788597106934
Validation loss = 0.5498323440551758
Validation loss = 0.5548257827758789
Validation loss = 0.5523191690444946
Validation loss = 0.5559178590774536
Validation loss = 0.5533015727996826
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5537254810333252
Validation loss = 0.5499693751335144
Validation loss = 0.5535993576049805
Validation loss = 0.5547053813934326
Validation loss = 0.5543354749679565
Validation loss = 0.5506239533424377
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5536526441574097
Validation loss = 0.558476448059082
Validation loss = 0.5467756986618042
Validation loss = 0.5497009754180908
Validation loss = 0.5579773783683777
Validation loss = 0.5553112626075745
Validation loss = 0.5593210458755493
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5567048788070679
Validation loss = 0.5592910051345825
Validation loss = 0.5608057379722595
Validation loss = 0.5558081865310669
Validation loss = 0.5590636730194092
Validation loss = 0.5589522123336792
Validation loss = 0.5598149299621582
Validation loss = 0.5609699487686157
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.556199848651886
Validation loss = 0.5541254281997681
Validation loss = 0.5607355237007141
Validation loss = 0.5562729835510254
Validation loss = 0.5592775344848633
Validation loss = 0.556252121925354
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.9823468328141227
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.9844317592112093
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.9885892116182573
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.991187143597719
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.994818652849741
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.99585706887623
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.998447204968944
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.0025866528711846
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.0067218200620474
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.014987080103359
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 21
average number of affinization = 3.0242768595041323
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.028394424367579
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.0309597523219813
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.03455389375967
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.042783505154639
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.048428645028336
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 21
average number of affinization = 3.057672502574665
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.0663921770458056
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.0694444444444446
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.0719794344473006
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.079136690647482
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.0873138161273754
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.089322381930185
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.097998973832735
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.101025641025641
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.37    |
| Iteration     | 76       |
| MaximumReturn | -0.147   |
| MinimumReturn | -64.4    |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5566635131835938
Validation loss = 0.5635566115379333
Validation loss = 0.5542277097702026
Validation loss = 0.5553857684135437
Validation loss = 0.558316171169281
Validation loss = 0.560939610004425
Validation loss = 0.5609227418899536
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5541921854019165
Validation loss = 0.5537311434745789
Validation loss = 0.5500242114067078
Validation loss = 0.553999125957489
Validation loss = 0.5548198223114014
Validation loss = 0.5552316904067993
Validation loss = 0.5548242926597595
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5531964302062988
Validation loss = 0.5517740249633789
Validation loss = 0.5574970245361328
Validation loss = 0.5543816089630127
Validation loss = 0.5546636581420898
Validation loss = 0.5548880696296692
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5641025900840759
Validation loss = 0.562825620174408
Validation loss = 0.5575017929077148
Validation loss = 0.5590899586677551
Validation loss = 0.5602530837059021
Validation loss = 0.5628912448883057
Validation loss = 0.5575425028800964
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5623857975006104
Validation loss = 0.5555080771446228
Validation loss = 0.5599179267883301
Validation loss = 0.5586265325546265
Validation loss = 0.5606354475021362
Validation loss = 0.5580480098724365
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.1040492055356226
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.1086065573770494
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.1116231438812085
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.115660184237462
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.118670076726343
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 3.1186094069529653
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.1200817577925397
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.1225740551583248
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.1276161306789176
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.13265306122449
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 3.1320754716981134
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.13302752293578
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.1349974528782476
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.1425661914460283
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.146564885496183
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.1485249237029502
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.1535332994407725
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.154979674796748
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.1579481970543424
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.1624365482233503
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.1664129883307965
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.1713995943204867
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.1738469336036492
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.1752786220871325
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.1777215189873416
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.949   |
| Iteration     | 77       |
| MaximumReturn | -0.184   |
| MinimumReturn | -3.13    |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5581082105636597
Validation loss = 0.55508953332901
Validation loss = 0.5573623180389404
Validation loss = 0.5542833209037781
Validation loss = 0.5529378652572632
Validation loss = 0.5574635863304138
Validation loss = 0.5566661953926086
Validation loss = 0.5607197284698486
Validation loss = 0.5593448281288147
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5541173815727234
Validation loss = 0.5534350872039795
Validation loss = 0.5526584386825562
Validation loss = 0.5584937930107117
Validation loss = 0.5550677180290222
Validation loss = 0.5601868033409119
Validation loss = 0.5581072568893433
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5582877397537231
Validation loss = 0.5543326735496521
Validation loss = 0.5527603030204773
Validation loss = 0.5554035902023315
Validation loss = 0.5562690496444702
Validation loss = 0.560555100440979
Validation loss = 0.5569287538528442
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5611015558242798
Validation loss = 0.5560687780380249
Validation loss = 0.5599492192268372
Validation loss = 0.5600782036781311
Validation loss = 0.5640008449554443
Validation loss = 0.5673656463623047
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5567610263824463
Validation loss = 0.5557206273078918
Validation loss = 0.5617808699607849
Validation loss = 0.5597160458564758
Validation loss = 0.5591059923171997
Validation loss = 0.5570088624954224
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.1837044534412957
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.189681335356601
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.192618806875632
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 4
average number of affinization = 3.1930267812026276
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.195959595959596
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.2024230186774356
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.2058526740665996
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.2128088754412505
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.2172379032258065
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 3.216624685138539
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.2220543806646527
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.225968797181681
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.227364185110664
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.233785822021116
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.2371859296482413
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.2405826217980915
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.2444779116465865
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.247365780230808
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.2487462387161483
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.254135338345865
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.2600200400801604
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.2643965948923386
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 3.2627627627627627
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.268134067033517
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.2705
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -16.7    |
| Iteration     | 78       |
| MaximumReturn | -0.188   |
| MinimumReturn | -108     |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5589316487312317
Validation loss = 0.5575217604637146
Validation loss = 0.560587465763092
Validation loss = 0.5553097128868103
Validation loss = 0.557334840297699
Validation loss = 0.5622525811195374
Validation loss = 0.5619972348213196
Validation loss = 0.5601826906204224
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5570690035820007
Validation loss = 0.5535112023353577
Validation loss = 0.557887077331543
Validation loss = 0.5538342595100403
Validation loss = 0.5615813136100769
Validation loss = 0.5599769353866577
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5633190274238586
Validation loss = 0.5580558776855469
Validation loss = 0.5540685653686523
Validation loss = 0.557499885559082
Validation loss = 0.5623271465301514
Validation loss = 0.5583851933479309
Validation loss = 0.5627894401550293
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5618818402290344
Validation loss = 0.5568848252296448
Validation loss = 0.5591443777084351
Validation loss = 0.5604211688041687
Validation loss = 0.5578992366790771
Validation loss = 0.5584336519241333
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5563283562660217
Validation loss = 0.5571892261505127
Validation loss = 0.5604627728462219
Validation loss = 0.5628061294555664
Validation loss = 0.5588719844818115
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.2773613193403297
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.28021978021978
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.285571642536196
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.2879241516966067
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.290274314214464
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.2981056829511464
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.3044344793223717
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.307768924302789
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.3091090094574414
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.3129353233830847
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.3137742416708105
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.320079522862823
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.321410829607551
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.322740814299901
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 3.3225806451612905
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.327876984126984
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.330193356470005
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.3354806739345886
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 28
average number of affinization = 3.347696879643388
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.3504950495049504
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.354280059376546
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.361028684470821
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 21
average number of affinization = 3.369747899159664
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.375494071146245
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.38320987654321
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.3    |
| Iteration     | 79       |
| MaximumReturn | -0.245   |
| MinimumReturn | -117     |
| TotalSamples  | 134946   |
----------------------------
