Logging to experiments/invertedPendulum/nov2/IPO01w350e3_seed3214
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7427970170974731
Validation loss = 0.6238535046577454
Validation loss = 0.582811176776886
Validation loss = 0.5657731890678406
Validation loss = 0.5420613288879395
Validation loss = 0.5415967106819153
Validation loss = 0.526361882686615
Validation loss = 0.5224274396896362
Validation loss = 0.520880401134491
Validation loss = 0.5050879716873169
Validation loss = 0.49598565697669983
Validation loss = 0.49383851885795593
Validation loss = 0.4923574924468994
Validation loss = 0.48241302371025085
Validation loss = 0.48384350538253784
Validation loss = 0.48192664980888367
Validation loss = 0.4894208014011383
Validation loss = 0.47824010252952576
Validation loss = 0.4765744209289551
Validation loss = 0.4605194628238678
Validation loss = 0.4657009243965149
Validation loss = 0.47016313672065735
Validation loss = 0.459960401058197
Validation loss = 0.46191972494125366
Validation loss = 0.48825138807296753
Validation loss = 0.4692845940589905
Validation loss = 0.46058306097984314
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7326635122299194
Validation loss = 0.6188936233520508
Validation loss = 0.5668202638626099
Validation loss = 0.5454614758491516
Validation loss = 0.542344331741333
Validation loss = 0.5303007960319519
Validation loss = 0.5165689587593079
Validation loss = 0.5027784705162048
Validation loss = 0.5002089142799377
Validation loss = 0.5062774419784546
Validation loss = 0.4922094941139221
Validation loss = 0.48460787534713745
Validation loss = 0.4819474518299103
Validation loss = 0.47638052701950073
Validation loss = 0.4838899075984955
Validation loss = 0.4689549505710602
Validation loss = 0.4733896553516388
Validation loss = 0.46524545550346375
Validation loss = 0.46433278918266296
Validation loss = 0.46103790402412415
Validation loss = 0.4508846700191498
Validation loss = 0.448340505361557
Validation loss = 0.44199642539024353
Validation loss = 0.44115597009658813
Validation loss = 0.4414514899253845
Validation loss = 0.4465813636779785
Validation loss = 0.4505319893360138
Validation loss = 0.439583420753479
Validation loss = 0.43659910559654236
Validation loss = 0.4365341365337372
Validation loss = 0.43302038311958313
Validation loss = 0.4259174168109894
Validation loss = 0.43525439500808716
Validation loss = 0.4426818788051605
Validation loss = 0.4268474578857422
Validation loss = 0.4348544180393219
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7430105209350586
Validation loss = 0.6043571829795837
Validation loss = 0.5684316158294678
Validation loss = 0.5531761050224304
Validation loss = 0.5443450212478638
Validation loss = 0.5152246952056885
Validation loss = 0.5133718848228455
Validation loss = 0.5067971348762512
Validation loss = 0.5014562606811523
Validation loss = 0.5011985898017883
Validation loss = 0.49556756019592285
Validation loss = 0.49532660841941833
Validation loss = 0.4864600896835327
Validation loss = 0.4906361997127533
Validation loss = 0.47699639201164246
Validation loss = 0.47733569145202637
Validation loss = 0.4847988784313202
Validation loss = 0.4728817641735077
Validation loss = 0.4745737612247467
Validation loss = 0.47556623816490173
Validation loss = 0.463661253452301
Validation loss = 0.46471133828163147
Validation loss = 0.4589190185070038
Validation loss = 0.44968369603157043
Validation loss = 0.45164379477500916
Validation loss = 0.45100638270378113
Validation loss = 0.45132818818092346
Validation loss = 0.46539872884750366
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7280536890029907
Validation loss = 0.6123934388160706
Validation loss = 0.586410641670227
Validation loss = 0.572753369808197
Validation loss = 0.5443710684776306
Validation loss = 0.5403220653533936
Validation loss = 0.5287886261940002
Validation loss = 0.5225464105606079
Validation loss = 0.5126233696937561
Validation loss = 0.5021174550056458
Validation loss = 0.5039066076278687
Validation loss = 0.49882248044013977
Validation loss = 0.5009272694587708
Validation loss = 0.489043265581131
Validation loss = 0.48619362711906433
Validation loss = 0.4767884314060211
Validation loss = 0.4756603538990021
Validation loss = 0.46880048513412476
Validation loss = 0.46873390674591064
Validation loss = 0.46467703580856323
Validation loss = 0.4608221650123596
Validation loss = 0.4520077705383301
Validation loss = 0.4583287239074707
Validation loss = 0.45223402976989746
Validation loss = 0.44940879940986633
Validation loss = 0.4453863799571991
Validation loss = 0.44863826036453247
Validation loss = 0.4581313729286194
Validation loss = 0.43897518515586853
Validation loss = 0.4427705407142639
Validation loss = 0.4508380889892578
Validation loss = 0.4393988251686096
Validation loss = 0.4374133050441742
Validation loss = 0.43438076972961426
Validation loss = 0.43613654375076294
Validation loss = 0.4491331875324249
Validation loss = 0.4376806318759918
Validation loss = 0.43269187211990356
Validation loss = 0.4297269880771637
Validation loss = 0.4511135220527649
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7231229543685913
Validation loss = 0.6247551441192627
Validation loss = 0.5665438175201416
Validation loss = 0.5405266284942627
Validation loss = 0.5396912097930908
Validation loss = 0.518600344657898
Validation loss = 0.5132609605789185
Validation loss = 0.5039635300636292
Validation loss = 0.4923870861530304
Validation loss = 0.49847841262817383
Validation loss = 0.48996758460998535
Validation loss = 0.48646897077560425
Validation loss = 0.4872545301914215
Validation loss = 0.48151957988739014
Validation loss = 0.4779510200023651
Validation loss = 0.4770902395248413
Validation loss = 0.4654659330844879
Validation loss = 0.470825731754303
Validation loss = 0.4693579375743866
Validation loss = 0.4643929898738861
Validation loss = 0.46044838428497314
Validation loss = 0.4545153081417084
Validation loss = 0.45623117685317993
Validation loss = 0.4434923231601715
Validation loss = 0.4592705965042114
Validation loss = 0.44179415702819824
Validation loss = 0.44450095295906067
Validation loss = 0.4580364227294922
Validation loss = 0.43739062547683716
Validation loss = 0.4523773491382599
Validation loss = 0.44179531931877136
Validation loss = 0.4473445415496826
Validation loss = 0.4495941698551178
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.038461538461538464
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.037037037037037035
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03571428571428571
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.034482758620689655
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03333333333333333
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03225806451612903
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03125
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.030303030303030304
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.029411764705882353
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.02857142857142857
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.05555555555555555
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08108108108108109
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07894736842105263
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07692307692307693
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0975609756097561
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09523809523809523
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09302325581395349
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11363636363636363
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.13333333333333333
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15217391304347827
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14893617021276595
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14583333333333334
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14285714285714285
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.1    |
| Iteration     | 0        |
| MaximumReturn | -0.116   |
| MinimumReturn | -41.1    |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5612372756004333
Validation loss = 0.49501699209213257
Validation loss = 0.466315895318985
Validation loss = 0.4562818109989166
Validation loss = 0.45265993475914
Validation loss = 0.4456024765968323
Validation loss = 0.44540026783943176
Validation loss = 0.4396290183067322
Validation loss = 0.43758276104927063
Validation loss = 0.43207958340644836
Validation loss = 0.43101006746292114
Validation loss = 0.435857355594635
Validation loss = 0.43105608224868774
Validation loss = 0.4286651909351349
Validation loss = 0.42071276903152466
Validation loss = 0.4239654242992401
Validation loss = 0.4168394207954407
Validation loss = 0.4223741590976715
Validation loss = 0.4257218539714813
Validation loss = 0.4322308301925659
Validation loss = 0.4185401201248169
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6468586325645447
Validation loss = 0.4887983202934265
Validation loss = 0.4735073745250702
Validation loss = 0.46214181184768677
Validation loss = 0.45467236638069153
Validation loss = 0.44656142592430115
Validation loss = 0.4397326111793518
Validation loss = 0.43176156282424927
Validation loss = 0.43934983015060425
Validation loss = 0.42596435546875
Validation loss = 0.42781898379325867
Validation loss = 0.42608770728111267
Validation loss = 0.4208270013332367
Validation loss = 0.4285659193992615
Validation loss = 0.44508153200149536
Validation loss = 0.423179566860199
Validation loss = 0.4168878197669983
Validation loss = 0.4179353713989258
Validation loss = 0.4203546345233917
Validation loss = 0.4095587134361267
Validation loss = 0.42317742109298706
Validation loss = 0.41212207078933716
Validation loss = 0.41548043489456177
Validation loss = 0.41526442766189575
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5807209610939026
Validation loss = 0.49063652753829956
Validation loss = 0.46731001138687134
Validation loss = 0.45966899394989014
Validation loss = 0.45269152522087097
Validation loss = 0.44320374727249146
Validation loss = 0.4433032274246216
Validation loss = 0.4441532492637634
Validation loss = 0.439607709646225
Validation loss = 0.43526995182037354
Validation loss = 0.432125985622406
Validation loss = 0.43720543384552
Validation loss = 0.4332997500896454
Validation loss = 0.43002989888191223
Validation loss = 0.4181117117404938
Validation loss = 0.4361568093299866
Validation loss = 0.42454394698143005
Validation loss = 0.43039441108703613
Validation loss = 0.4196610152721405
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6269542574882507
Validation loss = 0.4927232563495636
Validation loss = 0.4724092185497284
Validation loss = 0.46378904581069946
Validation loss = 0.4526045620441437
Validation loss = 0.447359174489975
Validation loss = 0.4396112859249115
Validation loss = 0.4371855556964874
Validation loss = 0.43270280957221985
Validation loss = 0.4326547682285309
Validation loss = 0.43032315373420715
Validation loss = 0.4329838156700134
Validation loss = 0.424820214509964
Validation loss = 0.4333818852901459
Validation loss = 0.4258895218372345
Validation loss = 0.4222103953361511
Validation loss = 0.4265870451927185
Validation loss = 0.4239415228366852
Validation loss = 0.42195436358451843
Validation loss = 0.41475221514701843
Validation loss = 0.4152427315711975
Validation loss = 0.42055124044418335
Validation loss = 0.4112008213996887
Validation loss = 0.41637203097343445
Validation loss = 0.43678367137908936
Validation loss = 0.4108642637729645
Validation loss = 0.4156505763530731
Validation loss = 0.41178447008132935
Validation loss = 0.4115017354488373
Validation loss = 0.41464975476264954
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6174260377883911
Validation loss = 0.489532470703125
Validation loss = 0.4636131525039673
Validation loss = 0.451262891292572
Validation loss = 0.44682368636131287
Validation loss = 0.4358294606208801
Validation loss = 0.430276483297348
Validation loss = 0.42987364530563354
Validation loss = 0.4335757791996002
Validation loss = 0.43439781665802
Validation loss = 0.42503753304481506
Validation loss = 0.42004913091659546
Validation loss = 0.41655775904655457
Validation loss = 0.42311224341392517
Validation loss = 0.4163847863674164
Validation loss = 0.4151296019554138
Validation loss = 0.415073424577713
Validation loss = 0.41800758242607117
Validation loss = 0.4240356981754303
Validation loss = 0.41641801595687866
Validation loss = 0.4208841323852539
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13725490196078433
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1346153846153846
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1320754716981132
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12962962962962962
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12727272727272726
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12280701754385964
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1206896551724138
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11864406779661017
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11666666666666667
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11475409836065574
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11290322580645161
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1111111111111111
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.109375
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1076923076923077
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10606060606060606
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1044776119402985
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10294117647058823
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10144927536231885
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09859154929577464
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09722222222222222
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0958904109589041
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0945945945945946
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.2     |
| Iteration     | 1        |
| MaximumReturn | -0.0485  |
| MinimumReturn | -38.2    |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.45901989936828613
Validation loss = 0.4158535599708557
Validation loss = 0.4135799705982208
Validation loss = 0.4123985767364502
Validation loss = 0.40822941064834595
Validation loss = 0.4166017174720764
Validation loss = 0.41634368896484375
Validation loss = 0.40755677223205566
Validation loss = 0.4146866500377655
Validation loss = 0.4214029908180237
Validation loss = 0.4143062233924866
Validation loss = 0.414190411567688
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4797116816043854
Validation loss = 0.415573388338089
Validation loss = 0.41026878356933594
Validation loss = 0.41635531187057495
Validation loss = 0.4116320013999939
Validation loss = 0.4079863429069519
Validation loss = 0.412077933549881
Validation loss = 0.4067150354385376
Validation loss = 0.4081003665924072
Validation loss = 0.40876656770706177
Validation loss = 0.41045981645584106
Validation loss = 0.4070814251899719
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4388098120689392
Validation loss = 0.4130362272262573
Validation loss = 0.41523027420043945
Validation loss = 0.40701615810394287
Validation loss = 0.40553760528564453
Validation loss = 0.41102588176727295
Validation loss = 0.4069369435310364
Validation loss = 0.4099065065383911
Validation loss = 0.4300142824649811
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4756915867328644
Validation loss = 0.42258644104003906
Validation loss = 0.41378116607666016
Validation loss = 0.4248591661453247
Validation loss = 0.4156024754047394
Validation loss = 0.41033419966697693
Validation loss = 0.42310547828674316
Validation loss = 0.4179174304008484
Validation loss = 0.4092335104942322
Validation loss = 0.4179046154022217
Validation loss = 0.43668487668037415
Validation loss = 0.419319212436676
Validation loss = 0.42205023765563965
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.46535640954971313
Validation loss = 0.417008638381958
Validation loss = 0.4108957052230835
Validation loss = 0.41871726512908936
Validation loss = 0.4062502086162567
Validation loss = 0.40782541036605835
Validation loss = 0.4073868691921234
Validation loss = 0.40589791536331177
Validation loss = 0.3991577625274658
Validation loss = 0.4031612277030945
Validation loss = 0.4111420512199402
Validation loss = 0.40343865752220154
Validation loss = 0.40686261653900146
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09210526315789473
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09090909090909091
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10256410256410256
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10126582278481013
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1111111111111111
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12195121951219512
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.13253012048192772
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13095238095238096
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1411764705882353
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13953488372093023
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13793103448275862
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13636363636363635
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1348314606741573
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13333333333333333
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13186813186813187
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13043478260869565
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12903225806451613
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1276595744680851
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12631578947368421
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12371134020618557
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12244897959183673
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12121212121212122
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.74    |
| Iteration     | 2        |
| MaximumReturn | -0.03    |
| MinimumReturn | -17.6    |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.41233325004577637
Validation loss = 0.3831728994846344
Validation loss = 0.3947416841983795
Validation loss = 0.3884022533893585
Validation loss = 0.3871886432170868
Validation loss = 0.3818023204803467
Validation loss = 0.38552799820899963
Validation loss = 0.39482495188713074
Validation loss = 0.3861841857433319
Validation loss = 0.3912983238697052
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4167693853378296
Validation loss = 0.3874841630458832
Validation loss = 0.37995076179504395
Validation loss = 0.3850167989730835
Validation loss = 0.3794805109500885
Validation loss = 0.375361829996109
Validation loss = 0.38430091738700867
Validation loss = 0.3866334855556488
Validation loss = 0.37909433245658875
Validation loss = 0.38809022307395935
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.40764865279197693
Validation loss = 0.390026718378067
Validation loss = 0.3865635395050049
Validation loss = 0.3853691518306732
Validation loss = 0.38424691557884216
Validation loss = 0.3874720335006714
Validation loss = 0.3901941776275635
Validation loss = 0.3806331157684326
Validation loss = 0.38830408453941345
Validation loss = 0.3857475221157074
Validation loss = 0.38761675357818604
Validation loss = 0.3926466405391693
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.42523130774497986
Validation loss = 0.38763391971588135
Validation loss = 0.3933769166469574
Validation loss = 0.38333311676979065
Validation loss = 0.3910585343837738
Validation loss = 0.39304015040397644
Validation loss = 0.39089104533195496
Validation loss = 0.38697192072868347
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4202112853527069
Validation loss = 0.38181209564208984
Validation loss = 0.38041791319847107
Validation loss = 0.37677252292633057
Validation loss = 0.3811957538127899
Validation loss = 0.3858112096786499
Validation loss = 0.3850417137145996
Validation loss = 0.3887280523777008
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1188118811881188
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11764705882352941
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11650485436893204
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11538461538461539
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11428571428571428
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11320754716981132
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11214953271028037
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1111111111111111
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11009174311926606
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10909090909090909
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10810810810810811
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10714285714285714
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10619469026548672
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10526315789473684
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10434782608695652
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10344827586206896
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10256410256410256
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1016949152542373
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10084033613445378
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09917355371900827
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09836065573770492
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0975609756097561
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0967741935483871
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.096
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.131   |
| Iteration     | 3        |
| MaximumReturn | -0.0225  |
| MinimumReturn | -0.592   |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4198715090751648
Validation loss = 0.3797729015350342
Validation loss = 0.38466787338256836
Validation loss = 0.3796480894088745
Validation loss = 0.38781535625457764
Validation loss = 0.38390129804611206
Validation loss = 0.3886198401451111
Validation loss = 0.3848590850830078
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.38412362337112427
Validation loss = 0.3800293505191803
Validation loss = 0.37941592931747437
Validation loss = 0.3796216547489166
Validation loss = 0.38247233629226685
Validation loss = 0.38834285736083984
Validation loss = 0.3792502284049988
Validation loss = 0.379375159740448
Validation loss = 0.38182416558265686
Validation loss = 0.38086703419685364
Validation loss = 0.37703895568847656
Validation loss = 0.38010600209236145
Validation loss = 0.37776702642440796
Validation loss = 0.39034849405288696
Validation loss = 0.38669848442077637
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.393709659576416
Validation loss = 0.3766256868839264
Validation loss = 0.3793328106403351
Validation loss = 0.37992385029792786
Validation loss = 0.37692686915397644
Validation loss = 0.37907135486602783
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.39800089597702026
Validation loss = 0.3793315887451172
Validation loss = 0.3828667998313904
Validation loss = 0.3880016803741455
Validation loss = 0.3843422532081604
Validation loss = 0.38752835988998413
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.39840927720069885
Validation loss = 0.375931978225708
Validation loss = 0.38128310441970825
Validation loss = 0.3791359066963196
Validation loss = 0.3789823055267334
Validation loss = 0.3786221444606781
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09523809523809523
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09448818897637795
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1015625
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10077519379844961
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09923664122137404
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09848484848484848
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09774436090225563
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09701492537313433
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0962962962962963
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09558823529411764
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0948905109489051
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09420289855072464
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10071942446043165
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10714285714285714
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11347517730496454
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11971830985915492
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11888111888111888
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.125
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1310344827586207
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13013698630136986
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1292517006802721
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12837837837837837
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12751677852348994
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12666666666666668
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.78    |
| Iteration     | 4        |
| MaximumReturn | -0.0212  |
| MinimumReturn | -41.7    |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3878338932991028
Validation loss = 0.3850034475326538
Validation loss = 0.3769739270210266
Validation loss = 0.3815469443798065
Validation loss = 0.38006189465522766
Validation loss = 0.38388824462890625
Validation loss = 0.3748418092727661
Validation loss = 0.3833155035972595
Validation loss = 0.37859123945236206
Validation loss = 0.38214924931526184
Validation loss = 0.3864735960960388
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.38022345304489136
Validation loss = 0.3833988904953003
Validation loss = 0.3801918029785156
Validation loss = 0.3785930871963501
Validation loss = 0.37158623337745667
Validation loss = 0.3861255943775177
Validation loss = 0.3815470337867737
Validation loss = 0.37891721725463867
Validation loss = 0.37935227155685425
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3883002996444702
Validation loss = 0.3738373816013336
Validation loss = 0.3713753819465637
Validation loss = 0.3715435862541199
Validation loss = 0.37581855058670044
Validation loss = 0.3760371506214142
Validation loss = 0.37309902906417847
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3890426754951477
Validation loss = 0.38756662607192993
Validation loss = 0.3857019543647766
Validation loss = 0.38036057353019714
Validation loss = 0.38441964983940125
Validation loss = 0.37617403268814087
Validation loss = 0.37944838404655457
Validation loss = 0.37922900915145874
Validation loss = 0.3832349181175232
Validation loss = 0.38656312227249146
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.38203510642051697
Validation loss = 0.3736330568790436
Validation loss = 0.3804839849472046
Validation loss = 0.38031548261642456
Validation loss = 0.3698282539844513
Validation loss = 0.37081751227378845
Validation loss = 0.3693384826183319
Validation loss = 0.37204593420028687
Validation loss = 0.3685740530490875
Validation loss = 0.37549877166748047
Validation loss = 0.3783034682273865
Validation loss = 0.3827303647994995
Validation loss = 0.38380759954452515
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.13245033112582782
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.13815789473684212
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1437908496732026
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.14935064935064934
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15483870967741936
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16025641025641027
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16560509554140126
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16455696202531644
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16981132075471697
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16875
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17391304347826086
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17901234567901234
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18404907975460122
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18292682926829268
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18181818181818182
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18674698795180722
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19161676646706588
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19047619047619047
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1893491124260355
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19411764705882353
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19883040935672514
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20348837209302326
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20809248554913296
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20689655172413793
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2057142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -16.2    |
| Iteration     | 5        |
| MaximumReturn | -0.0378  |
| MinimumReturn | -48      |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39286017417907715
Validation loss = 0.3865061402320862
Validation loss = 0.3938509523868561
Validation loss = 0.38246482610702515
Validation loss = 0.39278608560562134
Validation loss = 0.3860626816749573
Validation loss = 0.3929827809333801
Validation loss = 0.3878762722015381
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3864133656024933
Validation loss = 0.3784777522087097
Validation loss = 0.3832435607910156
Validation loss = 0.39733853936195374
Validation loss = 0.3776048719882965
Validation loss = 0.38615259528160095
Validation loss = 0.3889205753803253
Validation loss = 0.3848927617073059
Validation loss = 0.3910689949989319
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3824028968811035
Validation loss = 0.37206006050109863
Validation loss = 0.3781278431415558
Validation loss = 0.3706279695034027
Validation loss = 0.3764737844467163
Validation loss = 0.37712904810905457
Validation loss = 0.37458306550979614
Validation loss = 0.3752343952655792
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.39047443866729736
Validation loss = 0.38221946358680725
Validation loss = 0.38864150643348694
Validation loss = 0.3858475983142853
Validation loss = 0.39000242948532104
Validation loss = 0.38911280035972595
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.38591670989990234
Validation loss = 0.3882749080657959
Validation loss = 0.3828114867210388
Validation loss = 0.3820268213748932
Validation loss = 0.38367319107055664
Validation loss = 0.3806057572364807
Validation loss = 0.38749033212661743
Validation loss = 0.3793954849243164
Validation loss = 0.3838599920272827
Validation loss = 0.3869186043739319
Validation loss = 0.38936540484428406
Validation loss = 0.39697185158729553
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20454545454545456
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2033898305084746
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20224719101123595
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2011173184357542
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19889502762430938
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1978021978021978
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19672131147540983
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1956521739130435
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1945945945945946
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1935483870967742
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1925133689839572
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19148936170212766
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19047619047619047
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18947368421052632
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18848167539267016
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1875
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18652849740932642
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18556701030927836
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18461538461538463
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1836734693877551
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18274111675126903
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18181818181818182
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18090452261306533
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.128   |
| Iteration     | 6        |
| MaximumReturn | -0.0478  |
| MinimumReturn | -0.488   |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3749968707561493
Validation loss = 0.3789997100830078
Validation loss = 0.3751577138900757
Validation loss = 0.390065461397171
Validation loss = 0.37929537892341614
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.37595483660697937
Validation loss = 0.3840571939945221
Validation loss = 0.3785182535648346
Validation loss = 0.38205695152282715
Validation loss = 0.3796166181564331
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.37387630343437195
Validation loss = 0.3715703785419464
Validation loss = 0.3766072988510132
Validation loss = 0.3661079406738281
Validation loss = 0.37215375900268555
Validation loss = 0.37944331765174866
Validation loss = 0.3709665536880493
Validation loss = 0.3744192123413086
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3863222897052765
Validation loss = 0.3753570318222046
Validation loss = 0.37260186672210693
Validation loss = 0.3749842643737793
Validation loss = 0.37307605147361755
Validation loss = 0.38373076915740967
Validation loss = 0.3764728009700775
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.38501474261283875
Validation loss = 0.37206268310546875
Validation loss = 0.37618955969810486
Validation loss = 0.37961408495903015
Validation loss = 0.38168835639953613
Validation loss = 0.37842118740081787
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18407960199004975
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18316831683168316
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18719211822660098
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18627450980392157
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18536585365853658
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18446601941747573
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18357487922705315
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18269230769230768
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18660287081339713
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18571428571428572
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1848341232227488
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18396226415094338
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18309859154929578
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1822429906542056
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1813953488372093
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18055555555555555
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17972350230414746
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1834862385321101
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1872146118721461
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18636363636363637
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18552036199095023
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18468468468468469
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18385650224215247
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18303571428571427
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18666666666666668
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.29    |
| Iteration     | 7        |
| MaximumReturn | -0.0301  |
| MinimumReturn | -28.5    |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.37982410192489624
Validation loss = 0.37342384457588196
Validation loss = 0.3743816018104553
Validation loss = 0.3783753514289856
Validation loss = 0.37276336550712585
Validation loss = 0.37783581018447876
Validation loss = 0.3743758797645569
Validation loss = 0.3813757002353668
Validation loss = 0.38425180315971375
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.37408336997032166
Validation loss = 0.37071457505226135
Validation loss = 0.3750026822090149
Validation loss = 0.3782508671283722
Validation loss = 0.3815588355064392
Validation loss = 0.3773822784423828
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3698326647281647
Validation loss = 0.36701369285583496
Validation loss = 0.36651352047920227
Validation loss = 0.3672287166118622
Validation loss = 0.3771332800388336
Validation loss = 0.3684796988964081
Validation loss = 0.37452906370162964
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3702786862850189
Validation loss = 0.3728647232055664
Validation loss = 0.3710768520832062
Validation loss = 0.37621909379959106
Validation loss = 0.3772223889827728
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.37166714668273926
Validation loss = 0.3722374141216278
Validation loss = 0.37280696630477905
Validation loss = 0.3727269470691681
Validation loss = 0.3777134418487549
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18584070796460178
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18502202643171806
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18421052631578946
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18340611353711792
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1826086956521739
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18181818181818182
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1810344827586207
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18025751072961374
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1794871794871795
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17872340425531916
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17796610169491525
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17721518987341772
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17647058823529413
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17573221757322174
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.175
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17427385892116182
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17355371900826447
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1728395061728395
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1721311475409836
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17142857142857143
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17073170731707318
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1700404858299595
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1693548387096774
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1686746987951807
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.168
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0927  |
| Iteration     | 8        |
| MaximumReturn | -0.0318  |
| MinimumReturn | -0.239   |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.37935152649879456
Validation loss = 0.38032782077789307
Validation loss = 0.3794577121734619
Validation loss = 0.3787570595741272
Validation loss = 0.38070616126060486
Validation loss = 0.38939356803894043
Validation loss = 0.3787693977355957
Validation loss = 0.38306254148483276
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.37408626079559326
Validation loss = 0.3773077726364136
Validation loss = 0.3787846565246582
Validation loss = 0.37667784094810486
Validation loss = 0.38001930713653564
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.37016528844833374
Validation loss = 0.3690744638442993
Validation loss = 0.37191715836524963
Validation loss = 0.37824708223342896
Validation loss = 0.37242692708969116
Validation loss = 0.3701086640357971
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3801538944244385
Validation loss = 0.3761815130710602
Validation loss = 0.38948142528533936
Validation loss = 0.3767967224121094
Validation loss = 0.3742271661758423
Validation loss = 0.37874338030815125
Validation loss = 0.3744988739490509
Validation loss = 0.3740507960319519
Validation loss = 0.37642377614974976
Validation loss = 0.37780529260635376
Validation loss = 0.37989628314971924
Validation loss = 0.37572771310806274
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.37653717398643494
Validation loss = 0.37620651721954346
Validation loss = 0.37327224016189575
Validation loss = 0.3734610378742218
Validation loss = 0.3744398355484009
Validation loss = 0.3723251223564148
Validation loss = 0.37866467237472534
Validation loss = 0.3743838667869568
Validation loss = 0.3805873394012451
Validation loss = 0.37956905364990234
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17131474103585656
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1746031746031746
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17786561264822134
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17716535433070865
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17647058823529413
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17578125
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17898832684824903
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1821705426356589
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18146718146718147
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18076923076923077
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18007662835249041
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17938931297709923
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18250950570342206
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18181818181818182
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18490566037735848
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18421052631578946
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18726591760299627
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1865671641791045
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1895910780669145
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1925925925925926
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19557195571955718
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19852941176470587
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1978021978021978
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20072992700729927
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20363636363636364
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.5    |
| Iteration     | 9        |
| MaximumReturn | -0.061   |
| MinimumReturn | -95.9    |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.382922887802124
Validation loss = 0.3770015835762024
Validation loss = 0.3874291181564331
Validation loss = 0.3841899633407593
Validation loss = 0.3819422423839569
Validation loss = 0.3805476427078247
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3689820170402527
Validation loss = 0.3703571557998657
Validation loss = 0.37735992670059204
Validation loss = 0.36862966418266296
Validation loss = 0.37168389558792114
Validation loss = 0.3765954077243805
Validation loss = 0.3777819871902466
Validation loss = 0.37664559483528137
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3681564927101135
Validation loss = 0.37049779295921326
Validation loss = 0.3734423816204071
Validation loss = 0.36698830127716064
Validation loss = 0.3684690296649933
Validation loss = 0.36767107248306274
Validation loss = 0.37176433205604553
Validation loss = 0.37114018201828003
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3767491579055786
Validation loss = 0.37756696343421936
Validation loss = 0.3723868429660797
Validation loss = 0.3786565065383911
Validation loss = 0.37861981987953186
Validation loss = 0.3790794610977173
Validation loss = 0.3810347318649292
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.37584438920021057
Validation loss = 0.37455224990844727
Validation loss = 0.3851225972175598
Validation loss = 0.37639594078063965
Validation loss = 0.37522655725479126
Validation loss = 0.3758614957332611
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2028985507246377
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20216606498194944
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20503597122302158
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20430107526881722
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20357142857142857
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20284697508896798
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20212765957446807
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20141342756183744
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2007042253521127
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1993006993006993
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1986062717770035
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19791666666666666
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1972318339100346
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19655172413793104
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1958762886597938
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1952054794520548
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1945392491467577
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19387755102040816
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19322033898305085
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19256756756756757
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1919191919191919
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1912751677852349
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19063545150501673
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.114   |
| Iteration     | 10       |
| MaximumReturn | -0.0225  |
| MinimumReturn | -0.448   |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.38739532232284546
Validation loss = 0.390686571598053
Validation loss = 0.38434022665023804
Validation loss = 0.3822941780090332
Validation loss = 0.3782877027988434
Validation loss = 0.38610348105430603
Validation loss = 0.39584583044052124
Validation loss = 0.38677653670310974
Validation loss = 0.3894450068473816
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.38194942474365234
Validation loss = 0.3797237277030945
Validation loss = 0.3781232237815857
Validation loss = 0.37656888365745544
Validation loss = 0.3812434673309326
Validation loss = 0.374420166015625
Validation loss = 0.37889742851257324
Validation loss = 0.38075417280197144
Validation loss = 0.3828878104686737
Validation loss = 0.3853411078453064
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3777563273906708
Validation loss = 0.3745284378528595
Validation loss = 0.37463706731796265
Validation loss = 0.37580445408821106
Validation loss = 0.37590163946151733
Validation loss = 0.3748275935649872
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.38305848836898804
Validation loss = 0.37781548500061035
Validation loss = 0.37678414583206177
Validation loss = 0.38515904545783997
Validation loss = 0.3776501715183258
Validation loss = 0.3830210864543915
Validation loss = 0.38366156816482544
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3774634599685669
Validation loss = 0.37979090213775635
Validation loss = 0.3785071074962616
Validation loss = 0.3802999258041382
Validation loss = 0.38097113370895386
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1893687707641196
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18874172185430463
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18811881188118812
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1875
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18688524590163935
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18627450980392157
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18566775244299674
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18506493506493507
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18446601941747573
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18387096774193548
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1832797427652733
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18269230769230768
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18210862619808307
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18152866242038215
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18412698412698414
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18354430379746836
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1829652996845426
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18238993710691823
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18181818181818182
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18125
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1806853582554517
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18012422360248448
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17956656346749225
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17901234567901234
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17846153846153845
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.1     |
| Iteration     | 11       |
| MaximumReturn | -0.0441  |
| MinimumReturn | -0.378   |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39614954590797424
Validation loss = 0.3896130323410034
Validation loss = 0.3953952193260193
Validation loss = 0.3926054835319519
Validation loss = 0.3976583480834961
Validation loss = 0.3990798592567444
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.38463348150253296
Validation loss = 0.38488471508026123
Validation loss = 0.38358965516090393
Validation loss = 0.3868570327758789
Validation loss = 0.3826196491718292
Validation loss = 0.3907983899116516
Validation loss = 0.38781148195266724
Validation loss = 0.3889003396034241
Validation loss = 0.3899332880973816
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.37585878372192383
Validation loss = 0.3739493191242218
Validation loss = 0.37620267271995544
Validation loss = 0.38022151589393616
Validation loss = 0.38407906889915466
Validation loss = 0.37757331132888794
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3830856382846832
Validation loss = 0.3823789060115814
Validation loss = 0.3776501417160034
Validation loss = 0.38768237829208374
Validation loss = 0.3888337016105652
Validation loss = 0.38890787959098816
Validation loss = 0.3856067657470703
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.37773945927619934
Validation loss = 0.3807903826236725
Validation loss = 0.3800000548362732
Validation loss = 0.3800834119319916
Validation loss = 0.3835674226284027
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17791411042944785
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17737003058103976
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17682926829268292
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1762917933130699
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17575757575757575
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17522658610271905
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1746987951807229
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17717717717717718
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17664670658682635
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1761194029850746
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17559523809523808
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17507418397626112
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17751479289940827
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17699115044247787
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17647058823529413
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17595307917888564
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17543859649122806
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1749271137026239
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1744186046511628
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17391304347826086
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17341040462427745
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1729106628242075
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1724137931034483
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17191977077363896
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17142857142857143
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.08    |
| Iteration     | 12       |
| MaximumReturn | -0.0679  |
| MinimumReturn | -15.3    |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3911789655685425
Validation loss = 0.40276554226875305
Validation loss = 0.3978477418422699
Validation loss = 0.401936799287796
Validation loss = 0.39996060729026794
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.38848885893821716
Validation loss = 0.38693735003471375
Validation loss = 0.3949434459209442
Validation loss = 0.38735267519950867
Validation loss = 0.3901804983615875
Validation loss = 0.389818012714386
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.37981241941452026
Validation loss = 0.38096103072166443
Validation loss = 0.3814758062362671
Validation loss = 0.38374367356300354
Validation loss = 0.38189542293548584
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.39976128935813904
Validation loss = 0.383148193359375
Validation loss = 0.38258102536201477
Validation loss = 0.3906114399433136
Validation loss = 0.3918864130973816
Validation loss = 0.39114415645599365
Validation loss = 0.3905688226222992
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3772370517253876
Validation loss = 0.3766806423664093
Validation loss = 0.3852597177028656
Validation loss = 0.38325145840644836
Validation loss = 0.3836410343647003
Validation loss = 0.38812944293022156
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17094017094017094
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17045454545454544
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17280453257790368
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17231638418079095
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17183098591549295
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17134831460674158
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17086834733893558
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17039106145251395
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17270194986072424
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17222222222222222
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1745152354570637
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17403314917127072
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17355371900826447
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17307692307692307
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1726027397260274
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1721311475409836
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17166212534059946
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17119565217391305
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17344173441734417
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17297297297297298
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1725067385444744
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17204301075268819
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17158176943699732
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1711229946524064
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17066666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.937   |
| Iteration     | 13       |
| MaximumReturn | -0.0237  |
| MinimumReturn | -12.1    |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4025534391403198
Validation loss = 0.39729705452919006
Validation loss = 0.3994457721710205
Validation loss = 0.40200042724609375
Validation loss = 0.40134358406066895
Validation loss = 0.40500593185424805
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3944530189037323
Validation loss = 0.3887268006801605
Validation loss = 0.3894056975841522
Validation loss = 0.3924189507961273
Validation loss = 0.3988420069217682
Validation loss = 0.3953390121459961
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3815670907497406
Validation loss = 0.38351181149482727
Validation loss = 0.383992999792099
Validation loss = 0.38511791825294495
Validation loss = 0.39388135075569153
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.39251959323883057
Validation loss = 0.396138072013855
Validation loss = 0.3924407660961151
Validation loss = 0.40361738204956055
Validation loss = 0.3901067078113556
Validation loss = 0.3975922167301178
Validation loss = 0.4003276526927948
Validation loss = 0.40663352608680725
Validation loss = 0.4029351472854614
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.38266921043395996
Validation loss = 0.38145771622657776
Validation loss = 0.3865021765232086
Validation loss = 0.38853946328163147
Validation loss = 0.39325228333473206
Validation loss = 0.39202919602394104
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1702127659574468
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16976127320954906
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.1746031746031746
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1741424802110818
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1736842105263158
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1732283464566929
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17277486910994763
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.1801566579634465
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1796875
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18181818181818182
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18134715025906736
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18087855297157623
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18041237113402062
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17994858611825193
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1794871794871795
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1815856777493606
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18112244897959184
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1806615776081425
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1802030456852792
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17974683544303796
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17929292929292928
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17884130982367757
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17839195979899497
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17794486215538846
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1775
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.22    |
| Iteration     | 14       |
| MaximumReturn | -0.0447  |
| MinimumReturn | -45.8    |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39866936206817627
Validation loss = 0.4012088477611542
Validation loss = 0.4011983573436737
Validation loss = 0.40472501516342163
Validation loss = 0.40865084528923035
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.39253732562065125
Validation loss = 0.39721110463142395
Validation loss = 0.3973906636238098
Validation loss = 0.3953830897808075
Validation loss = 0.40223249793052673
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3888484537601471
Validation loss = 0.3832997679710388
Validation loss = 0.38560670614242554
Validation loss = 0.3858701288700104
Validation loss = 0.390876829624176
Validation loss = 0.38949424028396606
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4038983881473541
Validation loss = 0.3983156383037567
Validation loss = 0.40385901927948
Validation loss = 0.406019389629364
Validation loss = 0.4034506678581238
Validation loss = 0.40680795907974243
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3861185312271118
Validation loss = 0.38506439328193665
Validation loss = 0.3958401083946228
Validation loss = 0.39140450954437256
Validation loss = 0.3938903510570526
Validation loss = 0.39605462551116943
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1770573566084788
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17661691542288557
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1761786600496278
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17574257425742573
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17530864197530865
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1748768472906404
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17444717444717445
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17401960784313725
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17359413202933985
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.17804878048780487
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17761557177615572
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17718446601941748
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17675544794188863
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17632850241545894
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17590361445783131
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17548076923076922
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1750599520383693
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17464114832535885
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17422434367541767
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1738095238095238
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17339667458432304
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17298578199052134
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17257683215130024
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1721698113207547
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17176470588235293
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.5     |
| Iteration     | 15       |
| MaximumReturn | -0.0518  |
| MinimumReturn | -34.3    |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.40055543184280396
Validation loss = 0.4022528827190399
Validation loss = 0.40809541940689087
Validation loss = 0.40742114186286926
Validation loss = 0.41259267926216125
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.39655211567878723
Validation loss = 0.39656156301498413
Validation loss = 0.39693233370780945
Validation loss = 0.39575281739234924
Validation loss = 0.4032396972179413
Validation loss = 0.40097230672836304
Validation loss = 0.40879663825035095
Validation loss = 0.407665491104126
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.38349685072898865
Validation loss = 0.38944360613822937
Validation loss = 0.3885008692741394
Validation loss = 0.39540567994117737
Validation loss = 0.3919842541217804
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.40079209208488464
Validation loss = 0.40389344096183777
Validation loss = 0.4027915298938751
Validation loss = 0.4095746874809265
Validation loss = 0.4034160077571869
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.39012226462364197
Validation loss = 0.39280083775520325
Validation loss = 0.3894818425178528
Validation loss = 0.3949699401855469
Validation loss = 0.39571815729141235
Validation loss = 0.397173136472702
Validation loss = 0.4002806544303894
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17136150234741784
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17096018735362997
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1705607476635514
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17016317016317017
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1697674418604651
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16937354988399073
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16898148148148148
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16859122401847576
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16820276497695852
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.167816091954023
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16743119266055045
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16704805491990846
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16666666666666666
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1662870159453303
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16590909090909092
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1655328798185941
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16515837104072398
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16478555304740405
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16441441441441443
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16404494382022472
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16591928251121077
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16554809843400448
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16517857142857142
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16481069042316257
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16444444444444445
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.13    |
| Iteration     | 16       |
| MaximumReturn | -0.0452  |
| MinimumReturn | -0.382   |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4046986401081085
Validation loss = 0.4061334431171417
Validation loss = 0.41064709424972534
Validation loss = 0.4180682897567749
Validation loss = 0.41322141885757446
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3994441032409668
Validation loss = 0.405317485332489
Validation loss = 0.40722718834877014
Validation loss = 0.404755562543869
Validation loss = 0.4129024147987366
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3894580900669098
Validation loss = 0.39146357774734497
Validation loss = 0.39138877391815186
Validation loss = 0.3936656713485718
Validation loss = 0.3957604169845581
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4028339684009552
Validation loss = 0.4026661813259125
Validation loss = 0.4104599952697754
Validation loss = 0.4132140278816223
Validation loss = 0.41732025146484375
Validation loss = 0.41404685378074646
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4004739820957184
Validation loss = 0.40362459421157837
Validation loss = 0.40190762281417847
Validation loss = 0.4022662937641144
Validation loss = 0.40736180543899536
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.164079822616408
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16371681415929204
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16335540838852097
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16299559471365638
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16263736263736264
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16228070175438597
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16192560175054704
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1615720524017467
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16122004357298475
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1608695652173913
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16052060737527116
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16017316017316016
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15982721382289417
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15948275862068967
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15913978494623657
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15879828326180256
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15845824411134904
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1581196581196581
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15778251599147122
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1574468085106383
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15711252653927812
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15677966101694915
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15644820295983086
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15611814345991562
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15578947368421053
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.142   |
| Iteration     | 17       |
| MaximumReturn | -0.0619  |
| MinimumReturn | -0.247   |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.412956565618515
Validation loss = 0.4159082770347595
Validation loss = 0.4172070622444153
Validation loss = 0.4175284802913666
Validation loss = 0.4202064275741577
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4102446138858795
Validation loss = 0.41048020124435425
Validation loss = 0.4135337471961975
Validation loss = 0.41581055521965027
Validation loss = 0.41768309473991394
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3958379626274109
Validation loss = 0.39919909834861755
Validation loss = 0.39560747146606445
Validation loss = 0.3998657166957855
Validation loss = 0.39918169379234314
Validation loss = 0.40734508633613586
Validation loss = 0.4023212790489197
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.41205766797065735
Validation loss = 0.4158196449279785
Validation loss = 0.4115220010280609
Validation loss = 0.41689199209213257
Validation loss = 0.4235234260559082
Validation loss = 0.42251718044281006
Validation loss = 0.42281559109687805
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.39758822321891785
Validation loss = 0.4091096818447113
Validation loss = 0.40442076325416565
Validation loss = 0.40567511320114136
Validation loss = 0.40684306621551514
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15546218487394958
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15723270440251572
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15690376569037656
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15657620041753653
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15625
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15592515592515593
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15767634854771784
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15734989648033126
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15702479338842976
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15670103092783505
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15637860082304528
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15811088295687886
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15778688524590165
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1574642126789366
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15714285714285714
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15682281059063136
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1565040650406504
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15618661257606492
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15587044534412955
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15757575757575756
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.16129032258064516
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16096579476861167
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1606425702811245
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16032064128256512
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.06    |
| Iteration     | 18       |
| MaximumReturn | -0.0611  |
| MinimumReturn | -25.6    |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4160725176334381
Validation loss = 0.41581258177757263
Validation loss = 0.42054328322410583
Validation loss = 0.421099454164505
Validation loss = 0.4247249960899353
Validation loss = 0.4333916902542114
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.410052627325058
Validation loss = 0.4140169322490692
Validation loss = 0.4130052328109741
Validation loss = 0.41607534885406494
Validation loss = 0.42172709107398987
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3990214169025421
Validation loss = 0.40755507349967957
Validation loss = 0.40915998816490173
Validation loss = 0.404757022857666
Validation loss = 0.41335806250572205
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4208533763885498
Validation loss = 0.418864369392395
Validation loss = 0.4238360524177551
Validation loss = 0.4229414463043213
Validation loss = 0.4289824366569519
Validation loss = 0.4239260256290436
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.40423035621643066
Validation loss = 0.4082624316215515
Validation loss = 0.40890616178512573
Validation loss = 0.40929359197616577
Validation loss = 0.41467952728271484
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1596806387225549
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1593625498007968
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15904572564612326
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16071428571428573
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1603960396039604
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1600790513833992
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15976331360946747
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16141732283464566
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16306483300589392
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1627450980392157
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1643835616438356
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1640625
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16569200779727095
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16536964980544747
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1650485436893204
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16472868217054262
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1644100580270793
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1640926640926641
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16377649325626203
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16346153846153846
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16314779270633398
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.16666666666666666
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16634799235181644
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16603053435114504
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1657142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.16    |
| Iteration     | 19       |
| MaximumReturn | -0.0464  |
| MinimumReturn | -66.3    |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.41842931509017944
Validation loss = 0.4201309382915497
Validation loss = 0.43350905179977417
Validation loss = 0.4237726926803589
Validation loss = 0.4313535988330841
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4159264862537384
Validation loss = 0.418610155582428
Validation loss = 0.4221665859222412
Validation loss = 0.42649561166763306
Validation loss = 0.42833590507507324
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4075642228126526
Validation loss = 0.4108600318431854
Validation loss = 0.4166651666164398
Validation loss = 0.40969979763031006
Validation loss = 0.41438594460487366
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4237283170223236
Validation loss = 0.42248740792274475
Validation loss = 0.42465662956237793
Validation loss = 0.4437085688114166
Validation loss = 0.43724989891052246
Validation loss = 0.44144803285598755
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.40946516394615173
Validation loss = 0.4173092246055603
Validation loss = 0.4208791255950928
Validation loss = 0.4227798879146576
Validation loss = 0.4188770055770874
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16539923954372623
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1650853889943074
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16477272727272727
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16446124763705103
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1660377358490566
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1657250470809793
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16541353383458646
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1651031894934334
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1647940074906367
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16448598130841122
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16417910447761194
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16387337057728119
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16356877323420074
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16326530612244897
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16296296296296298
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16266173752310537
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16420664206642066
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16390423572744015
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.16911764705882354
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1688073394495413
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1684981684981685
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16819012797074953
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1678832116788321
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16757741347905283
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16727272727272727
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.471   |
| Iteration     | 20       |
| MaximumReturn | -0.106   |
| MinimumReturn | -3.77    |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.43215370178222656
Validation loss = 0.4331870377063751
Validation loss = 0.441528856754303
Validation loss = 0.4393782913684845
Validation loss = 0.43904468417167664
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.42100128531455994
Validation loss = 0.42426466941833496
Validation loss = 0.4263726472854614
Validation loss = 0.43011385202407837
Validation loss = 0.43336158990859985
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4166407585144043
Validation loss = 0.41290903091430664
Validation loss = 0.410562127828598
Validation loss = 0.41865554451942444
Validation loss = 0.4198199510574341
Validation loss = 0.42524534463882446
Validation loss = 0.43127119541168213
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4424718916416168
Validation loss = 0.4348241984844208
Validation loss = 0.4377140700817108
Validation loss = 0.4368210732936859
Validation loss = 0.4373999238014221
Validation loss = 0.4473244249820709
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.41875553131103516
Validation loss = 0.42058369517326355
Validation loss = 0.4224473237991333
Validation loss = 0.42018190026283264
Validation loss = 0.42611172795295715
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16696914700544466
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16666666666666666
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16636528028933092
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16606498194945848
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16576576576576577
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16546762589928057
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1651705565529623
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16487455197132617
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16636851520572452
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16607142857142856
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1657754010695187
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.1708185053380783
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1705150976909414
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1702127659574468
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16991150442477876
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1696113074204947
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1693121693121693
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16901408450704225
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1687170474516696
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16842105263157894
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1681260945709282
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16783216783216784
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16753926701570682
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1672473867595819
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16695652173913045
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.886   |
| Iteration     | 21       |
| MaximumReturn | -0.0307  |
| MinimumReturn | -11.8    |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4362300634384155
Validation loss = 0.43582212924957275
Validation loss = 0.4407196044921875
Validation loss = 0.43983855843544006
Validation loss = 0.4437282383441925
Validation loss = 0.4445967972278595
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4347982406616211
Validation loss = 0.42738717794418335
Validation loss = 0.43196362257003784
Validation loss = 0.4309059977531433
Validation loss = 0.4355809986591339
Validation loss = 0.4398970901966095
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4258298873901367
Validation loss = 0.4308205246925354
Validation loss = 0.433919757604599
Validation loss = 0.42799779772758484
Validation loss = 0.43357762694358826
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4410165846347809
Validation loss = 0.44071900844573975
Validation loss = 0.4588196873664856
Validation loss = 0.45143866539001465
Validation loss = 0.45022252202033997
Validation loss = 0.45412343740463257
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4226880371570587
Validation loss = 0.42406678199768066
Validation loss = 0.4219787120819092
Validation loss = 0.42990580201148987
Validation loss = 0.4296416640281677
Validation loss = 0.43360635638237
Validation loss = 0.4429868757724762
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16666666666666666
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1663778162911612
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16608996539792387
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16580310880829016
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16551724137931034
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16523235800344235
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16494845360824742
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1646655231560892
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1643835616438356
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1641025641025641
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16382252559726962
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1635434412265758
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16326530612244897
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16298811544991512
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16271186440677965
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16243654822335024
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16216216216216217
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16188870151770657
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16161616161616163
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16134453781512606
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1610738255033557
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16080402010050251
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1605351170568562
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16026711185308848
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.254   |
| Iteration     | 22       |
| MaximumReturn | -0.0764  |
| MinimumReturn | -0.892   |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4459143579006195
Validation loss = 0.44951701164245605
Validation loss = 0.4496501386165619
Validation loss = 0.4574516713619232
Validation loss = 0.4568006992340088
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4392422139644623
Validation loss = 0.4425501823425293
Validation loss = 0.4447513222694397
Validation loss = 0.45228061079978943
Validation loss = 0.4478274881839752
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4336507320404053
Validation loss = 0.4386819303035736
Validation loss = 0.4369569420814514
Validation loss = 0.43964657187461853
Validation loss = 0.44126468896865845
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4465406835079193
Validation loss = 0.4587187170982361
Validation loss = 0.4546542167663574
Validation loss = 0.4569229483604431
Validation loss = 0.4604280889034271
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4375407099723816
Validation loss = 0.4358249604701996
Validation loss = 0.4401496350765228
Validation loss = 0.4469049572944641
Validation loss = 0.44474801421165466
Validation loss = 0.4507318139076233
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15973377703826955
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15946843853820597
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15920398009950248
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15894039735099338
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15867768595041323
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15841584158415842
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15815485996705106
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15789473684210525
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15763546798029557
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15737704918032788
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15711947626841244
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1568627450980392
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1566068515497553
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1563517915309446
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15609756097560976
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15584415584415584
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15559157212317667
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1553398058252427
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15508885298869143
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15483870967741936
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15458937198067632
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15434083601286175
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15409309791332262
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15384615384615385
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1536
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.261   |
| Iteration     | 23       |
| MaximumReturn | -0.0774  |
| MinimumReturn | -1.11    |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4489344656467438
Validation loss = 0.4512732923030853
Validation loss = 0.4542105197906494
Validation loss = 0.4525997042655945
Validation loss = 0.46114176511764526
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4448127746582031
Validation loss = 0.4470067620277405
Validation loss = 0.45171722769737244
Validation loss = 0.4540931284427643
Validation loss = 0.4538451135158539
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4377250671386719
Validation loss = 0.4384731352329254
Validation loss = 0.4401276111602783
Validation loss = 0.4469999670982361
Validation loss = 0.4501218795776367
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.45712652802467346
Validation loss = 0.4652037024497986
Validation loss = 0.45866626501083374
Validation loss = 0.47301775217056274
Validation loss = 0.46587657928466797
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4452691674232483
Validation loss = 0.44695883989334106
Validation loss = 0.4507187008857727
Validation loss = 0.45183438062667847
Validation loss = 0.45253896713256836
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15335463258785942
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15311004784688995
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15286624203821655
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15262321144674085
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15396825396825398
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1537242472266244
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15348101265822786
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15323854660347552
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1529968454258675
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15275590551181104
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15251572327044025
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15384615384615385
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1536050156739812
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15336463223787167
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.153125
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15288611544461778
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1526479750778816
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.15707620528771385
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15683229813664595
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15658914728682172
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1563467492260062
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1561051004636785
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1574074074074074
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15716486902927582
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15692307692307692
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.24    |
| Iteration     | 24       |
| MaximumReturn | -0.117   |
| MinimumReturn | -20.7    |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4556354880332947
Validation loss = 0.45290350914001465
Validation loss = 0.4576447010040283
Validation loss = 0.4636521637439728
Validation loss = 0.46017715334892273
Validation loss = 0.46857210993766785
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.45113182067871094
Validation loss = 0.45693066716194153
Validation loss = 0.457397997379303
Validation loss = 0.4630533754825592
Validation loss = 0.46512743830680847
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.44573694467544556
Validation loss = 0.45119789242744446
Validation loss = 0.44703537225723267
Validation loss = 0.45400142669677734
Validation loss = 0.4634856581687927
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4683573246002197
Validation loss = 0.46395206451416016
Validation loss = 0.46258944272994995
Validation loss = 0.4693591594696045
Validation loss = 0.4706549346446991
Validation loss = 0.47275620698928833
Validation loss = 0.4757034182548523
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4541216492652893
Validation loss = 0.4467260241508484
Validation loss = 0.45853960514068604
Validation loss = 0.45509302616119385
Validation loss = 0.46039894223213196
Validation loss = 0.46505942940711975
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15668202764976957
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15644171779141106
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1562021439509954
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1559633027522936
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15572519083969466
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15548780487804878
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1552511415525114
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15501519756838905
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15477996965098634
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15454545454545454
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15431164901664146
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1540785498489426
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15384615384615385
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1536144578313253
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15338345864661654
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15315315315315314
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15292353823088456
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15269461077844312
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15246636771300448
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15223880597014924
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15201192250372578
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15178571428571427
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1515601783060921
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1513353115727003
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1511111111111111
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.629   |
| Iteration     | 25       |
| MaximumReturn | -0.14    |
| MinimumReturn | -1.84    |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.46759557723999023
Validation loss = 0.4669859707355499
Validation loss = 0.4644259512424469
Validation loss = 0.4700917899608612
Validation loss = 0.4853915870189667
Validation loss = 0.47605839371681213
Validation loss = 0.47509682178497314
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4586523175239563
Validation loss = 0.46214959025382996
Validation loss = 0.461412250995636
Validation loss = 0.46478042006492615
Validation loss = 0.4653340280056
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.45962104201316833
Validation loss = 0.4518662393093109
Validation loss = 0.4508194923400879
Validation loss = 0.458670437335968
Validation loss = 0.460335373878479
Validation loss = 0.46674785017967224
Validation loss = 0.46871888637542725
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.47064048051834106
Validation loss = 0.46977493166923523
Validation loss = 0.48789167404174805
Validation loss = 0.4834217131137848
Validation loss = 0.478352814912796
Validation loss = 0.4861048460006714
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.460849404335022
Validation loss = 0.45790061354637146
Validation loss = 0.45743560791015625
Validation loss = 0.4645709693431854
Validation loss = 0.47081658244132996
Validation loss = 0.4720904529094696
Validation loss = 0.4731083810329437
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15088757396449703
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15066469719350073
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1504424778761062
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15022091310751104
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14977973568281938
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1495601173020528
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1493411420204978
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.15204678362573099
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15328467153284672
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15306122448979592
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15283842794759825
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15261627906976744
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15239477503628446
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1536231884057971
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15340086830680175
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1531791907514451
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15295815295815296
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15273775216138327
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1539568345323741
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15373563218390804
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15351506456241032
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15329512893982808
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1530758226037196
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15285714285714286
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.567   |
| Iteration     | 26       |
| MaximumReturn | -0.126   |
| MinimumReturn | -1.89    |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4798526465892792
Validation loss = 0.48305097222328186
Validation loss = 0.48078975081443787
Validation loss = 0.4753887355327606
Validation loss = 0.48636454343795776
Validation loss = 0.4822492301464081
Validation loss = 0.48661842942237854
Validation loss = 0.48497241735458374
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4632640480995178
Validation loss = 0.46712028980255127
Validation loss = 0.46919703483581543
Validation loss = 0.4703316390514374
Validation loss = 0.47499626874923706
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.46317175030708313
Validation loss = 0.4659841060638428
Validation loss = 0.46779221296310425
Validation loss = 0.4752724766731262
Validation loss = 0.47592854499816895
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.48275527358055115
Validation loss = 0.4854452311992645
Validation loss = 0.48542627692222595
Validation loss = 0.4936293363571167
Validation loss = 0.49083366990089417
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.47142016887664795
Validation loss = 0.4689263105392456
Validation loss = 0.476656049489975
Validation loss = 0.4826669692993164
Validation loss = 0.4899291694164276
Validation loss = 0.49224692583084106
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15263908701854492
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15242165242165243
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15220483641536273
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15198863636363635
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15177304964539007
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15155807365439095
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15134370579915135
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15112994350282485
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15091678420310295
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15070422535211267
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15049226441631505
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1502808988764045
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15007012622720897
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14985994397759103
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14965034965034965
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1494413407821229
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1492329149232915
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.149025069637883
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14881780250347706
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1486111111111111
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14840499306518723
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1481994459833795
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1479944674965422
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1477900552486188
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14758620689655172
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.67    |
| Iteration     | 27       |
| MaximumReturn | -0.0506  |
| MinimumReturn | -60.5    |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4903357923030853
Validation loss = 0.4868655204772949
Validation loss = 0.49162042140960693
Validation loss = 0.4936344623565674
Validation loss = 0.4974386394023895
Validation loss = 0.4976961612701416
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.48263323307037354
Validation loss = 0.4696022570133209
Validation loss = 0.47926631569862366
Validation loss = 0.48335757851600647
Validation loss = 0.4832221567630768
Validation loss = 0.48089146614074707
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4757598340511322
Validation loss = 0.4797368049621582
Validation loss = 0.47749194502830505
Validation loss = 0.4819856584072113
Validation loss = 0.4879538118839264
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.48942771553993225
Validation loss = 0.4907342195510864
Validation loss = 0.4914863109588623
Validation loss = 0.4907503128051758
Validation loss = 0.4981210231781006
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4881185293197632
Validation loss = 0.4818912744522095
Validation loss = 0.4887579381465912
Validation loss = 0.49091991782188416
Validation loss = 0.491375207901001
Validation loss = 0.4924803674221039
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.15013774104683195
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1499312242090784
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14972527472527472
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.149519890260631
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14931506849315068
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1491108071135431
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1489071038251366
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.15143246930422918
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15122615803814715
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1510204081632653
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15081521739130435
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15061058344640435
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15176151761517614
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15155615696887687
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15135135135135136
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15114709851551958
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1509433962264151
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.15746971736204576
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15725806451612903
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.1597315436241611
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15951742627345844
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15930388219544847
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1590909090909091
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1588785046728972
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15866666666666668
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.484   |
| Iteration     | 28       |
| MaximumReturn | -0.0498  |
| MinimumReturn | -2.37    |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49258899688720703
Validation loss = 0.5061489343643188
Validation loss = 0.4960891604423523
Validation loss = 0.5025025010108948
Validation loss = 0.5038700699806213
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4843209981918335
Validation loss = 0.48162734508514404
Validation loss = 0.4808841049671173
Validation loss = 0.48443880677223206
Validation loss = 0.49181485176086426
Validation loss = 0.499059796333313
Validation loss = 0.4947234094142914
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4800630807876587
Validation loss = 0.4807504713535309
Validation loss = 0.4881117343902588
Validation loss = 0.48832929134368896
Validation loss = 0.4875292479991913
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5000000596046448
Validation loss = 0.49494469165802
Validation loss = 0.4954923987388611
Validation loss = 0.4994578957557678
Validation loss = 0.5023539066314697
Validation loss = 0.5022367835044861
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.48566359281539917
Validation loss = 0.4879101514816284
Validation loss = 0.48228999972343445
Validation loss = 0.49641257524490356
Validation loss = 0.49888771772384644
Validation loss = 0.49904072284698486
Validation loss = 0.49829620122909546
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1584553928095872
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15824468085106383
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1593625498007968
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16047745358090185
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16026490066225166
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16005291005291006
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16116248348745046
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16094986807387862
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16073781291172595
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16052631578947368
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.16294349540078842
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.16666666666666666
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16644823066841416
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1662303664921466
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.16993464052287582
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16971279373368145
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17079530638852672
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17057291666666666
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17035110533159947
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17012987012987013
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16990920881971466
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17098445595854922
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17205692108667528
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.17571059431524547
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17548387096774193
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.82    |
| Iteration     | 29       |
| MaximumReturn | -0.0615  |
| MinimumReturn | -56.7    |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4943467974662781
Validation loss = 0.4977075159549713
Validation loss = 0.5014004111289978
Validation loss = 0.5034818053245544
Validation loss = 0.5012862682342529
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.49008139967918396
Validation loss = 0.48540133237838745
Validation loss = 0.494127094745636
Validation loss = 0.49138760566711426
Validation loss = 0.4941636025905609
Validation loss = 0.4964601397514343
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4926830232143402
Validation loss = 0.4918796420097351
Validation loss = 0.48960641026496887
Validation loss = 0.49720779061317444
Validation loss = 0.49796128273010254
Validation loss = 0.49529537558555603
Validation loss = 0.5000948905944824
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4975011348724365
Validation loss = 0.5052626729011536
Validation loss = 0.5040926337242126
Validation loss = 0.5047318935394287
Validation loss = 0.5060151815414429
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4944974184036255
Validation loss = 0.4990546703338623
Validation loss = 0.5055326223373413
Validation loss = 0.4960763156414032
Validation loss = 0.4996967613697052
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17654639175257733
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17631917631917632
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17609254498714652
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17586649550706032
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17692307692307693
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17797695262483995
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.18286445012787725
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1826309067688378
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18239795918367346
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18343949044585986
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.183206106870229
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18297331639135958
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18274111675126903
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18250950570342206
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18227848101265823
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1820480404551201
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1830808080808081
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1828499369482976
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18387909319899245
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18364779874213835
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.18592964824120603
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18569636135508155
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18546365914786966
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18523153942428036
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.1875
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.7    |
| Iteration     | 30       |
| MaximumReturn | -0.0978  |
| MinimumReturn | -72.2    |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5037710666656494
Validation loss = 0.5046745538711548
Validation loss = 0.5066202878952026
Validation loss = 0.5087965130805969
Validation loss = 0.5106011033058167
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5005300045013428
Validation loss = 0.49876734614372253
Validation loss = 0.49881711602211
Validation loss = 0.5046467781066895
Validation loss = 0.5084063410758972
Validation loss = 0.5131694674491882
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4974246323108673
Validation loss = 0.4954667389392853
Validation loss = 0.5054389238357544
Validation loss = 0.5005868673324585
Validation loss = 0.4998483955860138
Validation loss = 0.5110588073730469
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5080028772354126
Validation loss = 0.5029681324958801
Validation loss = 0.5103164911270142
Validation loss = 0.5106873512268066
Validation loss = 0.5077158808708191
Validation loss = 0.5128527283668518
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5017043352127075
Validation loss = 0.5012223124504089
Validation loss = 0.5060915350914001
Validation loss = 0.5065877437591553
Validation loss = 0.5111047029495239
Validation loss = 0.512498140335083
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.18976279650436953
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19077306733167082
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19053549190535493
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.19776119402985073
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19751552795031055
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19727047146401985
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1982651796778191
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2004950495049505
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20148331273176762
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.2074074074074074
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20715166461159062
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2105911330049261
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2115621156211562
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.214987714987715
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2147239263803681
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.21691176470588236
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21664626682986537
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21638141809290953
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.21855921855921856
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21829268292682927
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2192448233861145
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22019464720194648
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21992709599027946
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2196601941747573
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2206060606060606
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -23      |
| Iteration     | 31       |
| MaximumReturn | -0.042   |
| MinimumReturn | -82.9    |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5056690573692322
Validation loss = 0.5092629194259644
Validation loss = 0.511413037776947
Validation loss = 0.5162608027458191
Validation loss = 0.5129014253616333
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5032020807266235
Validation loss = 0.5063107013702393
Validation loss = 0.5091028809547424
Validation loss = 0.5078950524330139
Validation loss = 0.5109716057777405
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.505510687828064
Validation loss = 0.5058028697967529
Validation loss = 0.5042619705200195
Validation loss = 0.5120165348052979
Validation loss = 0.5090115666389465
Validation loss = 0.512432336807251
Validation loss = 0.5113526582717896
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5139517784118652
Validation loss = 0.5062849521636963
Validation loss = 0.5106554627418518
Validation loss = 0.5136004686355591
Validation loss = 0.5159781575202942
Validation loss = 0.5138586759567261
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5087546110153198
Validation loss = 0.5078145265579224
Validation loss = 0.5110623240470886
Validation loss = 0.5151536464691162
Validation loss = 0.5158908367156982
Validation loss = 0.5147011876106262
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22033898305084745
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22007255139056833
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21980676328502416
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2195416164053076
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21927710843373494
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2190132370637786
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.22115384615384615
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22088835534213686
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22182254196642687
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2215568862275449
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22129186602870812
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22102747909199522
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.22673031026252982
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22646007151370678
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22738095238095238
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22711058263971462
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.23040380047505937
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23013048635824437
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22985781990521326
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2319526627218935
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.23522458628841608
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2361275088547816
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2358490566037736
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23674911660777384
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.24
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.11    |
| Iteration     | 32       |
| MaximumReturn | -0.0861  |
| MinimumReturn | -29.8    |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5157405138015747
Validation loss = 0.5095887780189514
Validation loss = 0.5128015279769897
Validation loss = 0.5165625214576721
Validation loss = 0.5179190635681152
Validation loss = 0.5150009989738464
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5128000974655151
Validation loss = 0.5078536868095398
Validation loss = 0.5127412676811218
Validation loss = 0.516873300075531
Validation loss = 0.5183249115943909
Validation loss = 0.5124982595443726
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.508351743221283
Validation loss = 0.5102106332778931
Validation loss = 0.5123717188835144
Validation loss = 0.5131390690803528
Validation loss = 0.511161744594574
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5159499645233154
Validation loss = 0.5210147500038147
Validation loss = 0.5163944363594055
Validation loss = 0.5169646143913269
Validation loss = 0.5269330143928528
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5153621435165405
Validation loss = 0.5132250785827637
Validation loss = 0.5189892649650574
Validation loss = 0.5200567245483398
Validation loss = 0.5237683057785034
Validation loss = 0.5265718698501587
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.24324324324324326
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24295774647887325
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24384525205158264
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24355971896955503
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24444444444444444
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2441588785046729
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24387397899649943
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24475524475524477
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24447031431897556
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2453488372093023
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2462253193960511
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2459396751740139
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.25028968713789107
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.25317919075144507
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.25519630484988454
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2549019607843137
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.25806451612903225
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25776754890678943
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.25862068965517243
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.2629161882893226
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2626146788990826
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.26575028636884307
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.26773455377574373
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.26857142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -37.5    |
| Iteration     | 33       |
| MaximumReturn | -0.142   |
| MinimumReturn | -125     |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5113136768341064
Validation loss = 0.5090893507003784
Validation loss = 0.5189685821533203
Validation loss = 0.5203373432159424
Validation loss = 0.5252935886383057
Validation loss = 0.5304034948348999
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5127710700035095
Validation loss = 0.5103031396865845
Validation loss = 0.5150402188301086
Validation loss = 0.5230884552001953
Validation loss = 0.516709566116333
Validation loss = 0.5167897343635559
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5131977200508118
Validation loss = 0.5150718092918396
Validation loss = 0.5150414109230042
Validation loss = 0.5168957710266113
Validation loss = 0.5188609957695007
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5142844915390015
Validation loss = 0.5232473611831665
Validation loss = 0.5197808146476746
Validation loss = 0.5230934023857117
Validation loss = 0.522681474685669
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5174136161804199
Validation loss = 0.5192517042160034
Validation loss = 0.5137107968330383
Validation loss = 0.5179568529129028
Validation loss = 0.5232447385787964
Validation loss = 0.523158848285675
Validation loss = 0.528805673122406
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2705479452054795
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2702394526795895
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2722095671981777
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.27417519908987487
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2761363636363636
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.27809307604994327
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2800453514739229
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2797281993204983
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.2839366515837104
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2858757062146893
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28555304740406323
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2852311161217587
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.28716216216216217
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.29133858267716534
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2910112359550562
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.29180695847362514
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2914798206278027
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2933930571108623
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.29642058165548096
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.29832402234636873
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29799107142857145
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.29988851727982163
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.3040089086859688
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.303670745272525
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3055555555555556
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -129     |
| Iteration     | 34       |
| MaximumReturn | -68.3    |
| MinimumReturn | -161     |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5151232481002808
Validation loss = 0.5121216177940369
Validation loss = 0.514269232749939
Validation loss = 0.5214738249778748
Validation loss = 0.5199319124221802
Validation loss = 0.5244359374046326
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5068097710609436
Validation loss = 0.5153248310089111
Validation loss = 0.516928493976593
Validation loss = 0.5177697539329529
Validation loss = 0.5206835269927979
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5039141178131104
Validation loss = 0.5135260820388794
Validation loss = 0.5139362215995789
Validation loss = 0.5133722424507141
Validation loss = 0.5163357257843018
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5160316824913025
Validation loss = 0.5233057141304016
Validation loss = 0.5181862711906433
Validation loss = 0.5184513926506042
Validation loss = 0.5198085904121399
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5135157108306885
Validation loss = 0.5192240476608276
Validation loss = 0.5197914838790894
Validation loss = 0.5247738361358643
Validation loss = 0.521231472492218
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3052164261931188
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3048780487804878
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30454042081949056
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30420353982300885
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30386740331491713
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3057395143487859
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3054024255788313
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30506607929515417
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3047304730473047
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30439560439560437
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3040614709110867
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30372807017543857
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30339539978094193
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30306345733041573
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30273224043715846
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30240174672489084
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.302071973827699
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3017429193899782
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.30359085963003263
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3032608695652174
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30293159609120524
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30260303687635576
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30227518959913324
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30194805194805197
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3016216216216216
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -159     |
| Iteration     | 35       |
| MaximumReturn | -64.7    |
| MinimumReturn | -189     |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5034376382827759
Validation loss = 0.5152184367179871
Validation loss = 0.5151305794715881
Validation loss = 0.5194697976112366
Validation loss = 0.521509051322937
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5089675784111023
Validation loss = 0.5064811110496521
Validation loss = 0.5132828950881958
Validation loss = 0.5147982239723206
Validation loss = 0.5202274918556213
Validation loss = 0.5215176343917847
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5089284777641296
Validation loss = 0.5086449384689331
Validation loss = 0.5169569253921509
Validation loss = 0.5212643146514893
Validation loss = 0.5148595571517944
Validation loss = 0.5111767649650574
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5068126916885376
Validation loss = 0.5105258822441101
Validation loss = 0.5224640965461731
Validation loss = 0.5151063799858093
Validation loss = 0.5268299579620361
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5086429715156555
Validation loss = 0.512917697429657
Validation loss = 0.5154792666435242
Validation loss = 0.5203063488006592
Validation loss = 0.5286110639572144
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30129589632829373
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30097087378640774
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30064655172413796
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.302475780409042
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3021505376344086
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30182599355531686
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.30364806866952787
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3054662379421222
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.30728051391862954
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.306951871657754
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30662393162393164
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3062966915688367
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30597014925373134
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.30670926517571884
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30638297872340425
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30605738575983
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3057324840764331
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3054082714740191
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3072033898305085
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.30793650793650795
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30761099365750527
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.307286166842661
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3069620253164557
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3087460484720759
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30842105263157893
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -133     |
| Iteration     | 36       |
| MaximumReturn | -60.8    |
| MinimumReturn | -176     |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.500615656375885
Validation loss = 0.503439724445343
Validation loss = 0.5056569576263428
Validation loss = 0.5069518089294434
Validation loss = 0.5140025019645691
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4981589615345001
Validation loss = 0.5027961730957031
Validation loss = 0.5063374638557434
Validation loss = 0.5087748169898987
Validation loss = 0.511695921421051
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.495029091835022
Validation loss = 0.5014539361000061
Validation loss = 0.5040074586868286
Validation loss = 0.5173876881599426
Validation loss = 0.5111826062202454
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5043683052062988
Validation loss = 0.5044254064559937
Validation loss = 0.5066354274749756
Validation loss = 0.5112453103065491
Validation loss = 0.5152300596237183
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.502265453338623
Validation loss = 0.5047642588615417
Validation loss = 0.5102464556694031
Validation loss = 0.5103199481964111
Validation loss = 0.5132752060890198
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30809674027339645
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3077731092436975
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3074501573976915
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30712788259958074
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3089005235602094
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30857740585774057
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30825496342737724
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3079331941544885
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30761209593326383
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3072916666666667
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30697190426638915
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3076923076923077
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3073727933541018
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3070539419087137
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3067357512953368
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3064182194616977
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30610134436401243
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30578512396694213
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30546955624355004
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30515463917525776
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30484037075180226
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3045267489711934
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3062692702980473
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3080082135523614
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3076923076923077
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -145     |
| Iteration     | 37       |
| MaximumReturn | -96.8    |
| MinimumReturn | -185     |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5002606511116028
Validation loss = 0.5041426420211792
Validation loss = 0.5073251724243164
Validation loss = 0.5119438171386719
Validation loss = 0.5111268758773804
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5015134811401367
Validation loss = 0.5060597062110901
Validation loss = 0.5071475505828857
Validation loss = 0.5114762783050537
Validation loss = 0.5082134008407593
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5025098323822021
Validation loss = 0.5015943050384521
Validation loss = 0.5055524706840515
Validation loss = 0.5117990970611572
Validation loss = 0.5098043084144592
Validation loss = 0.5121778845787048
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5047944784164429
Validation loss = 0.5069082975387573
Validation loss = 0.5129008293151855
Validation loss = 0.5101916790008545
Validation loss = 0.5188730955123901
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5053438544273376
Validation loss = 0.5103628039360046
Validation loss = 0.5097814798355103
Validation loss = 0.5132430791854858
Validation loss = 0.5151332020759583
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.3114754098360656
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3111566018423746
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.310838445807771
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.313585291113381
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.31326530612244896
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3129459734964322
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.31262729124236255
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3143438453713123
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.31910569105691056
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.3238578680203046
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3235294117647059
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3232016210739615
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3228744939271255
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3225480283114257
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.32727272727272727
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32694248234106965
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32661290322580644
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32628398791540786
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32595573440643866
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3256281407035176
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3253012048192771
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.324974924774323
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.32665330661322645
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3263263263263263
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.329
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -100     |
| Iteration     | 38       |
| MaximumReturn | -7.18    |
| MinimumReturn | -184     |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5031582117080688
Validation loss = 0.5051246285438538
Validation loss = 0.5043105483055115
Validation loss = 0.5115346312522888
Validation loss = 0.5108957290649414
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.502959668636322
Validation loss = 0.5035783648490906
Validation loss = 0.5020663738250732
Validation loss = 0.5234188437461853
Validation loss = 0.5110023617744446
Validation loss = 0.5092732906341553
Validation loss = 0.5105656981468201
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5013775825500488
Validation loss = 0.5045458078384399
Validation loss = 0.5042927265167236
Validation loss = 0.5089026093482971
Validation loss = 0.5153181552886963
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5018579959869385
Validation loss = 0.5070522427558899
Validation loss = 0.5090371966362
Validation loss = 0.5084348917007446
Validation loss = 0.5113133192062378
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5023155212402344
Validation loss = 0.5044280290603638
Validation loss = 0.5081146955490112
Validation loss = 0.5087936520576477
Validation loss = 0.5045808553695679
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.32967032967032966
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3313373253493014
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.33100697906281157
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.33067729083665337
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.33034825870646767
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.33001988071570576
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32969215491559084
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32936507936507936
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3290386521308226
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3287128712871287
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3283877349159248
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32806324110671936
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32773938795656465
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.32840236686390534
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32807881773399017
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.327755905511811
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3274336283185841
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32711198428290766
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3267909715407262
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3264705882352941
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32615083251714005
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3258317025440313
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3255131964809384
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3251953125
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3248780487804878
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -134     |
| Iteration     | 39       |
| MaximumReturn | -69.7    |
| MinimumReturn | -172     |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5004134178161621
Validation loss = 0.5093920826911926
Validation loss = 0.5112302899360657
Validation loss = 0.5071074366569519
Validation loss = 0.5119547247886658
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5015535950660706
Validation loss = 0.5070649981498718
Validation loss = 0.5129131078720093
Validation loss = 0.5113556385040283
Validation loss = 0.5079777240753174
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5000455975532532
Validation loss = 0.5064786672592163
Validation loss = 0.5078023076057434
Validation loss = 0.5060690641403198
Validation loss = 0.512928307056427
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5074331164360046
Validation loss = 0.5076904892921448
Validation loss = 0.5103865265846252
Validation loss = 0.5131824016571045
Validation loss = 0.513742208480835
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5004004836082458
Validation loss = 0.5074223875999451
Validation loss = 0.5122344493865967
Validation loss = 0.5107596516609192
Validation loss = 0.5134781002998352
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.32748538011695905
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3291139240506329
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.3336575875486381
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.33430515063168126
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.33689320388349514
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.3394762366634336
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.34108527131782945
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3426911907066796
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.34332688588007737
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.3458937198067633
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3474903474903475
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3471552555448409
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3468208092485549
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.35129932627526467
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.35192307692307695
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3515850144092219
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3512476007677543
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.3537871524448706
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.35727969348659006
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3569377990430622
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3565965583173996
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.35721107927411655
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.35877862595419846
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.35843660629170637
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3580952380952381
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -64      |
| Iteration     | 40       |
| MaximumReturn | -0.295   |
| MinimumReturn | -137     |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5047307014465332
Validation loss = 0.5015827417373657
Validation loss = 0.5091177821159363
Validation loss = 0.5055726170539856
Validation loss = 0.5056753158569336
Validation loss = 0.5128298997879028
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5087416768074036
Validation loss = 0.5061042308807373
Validation loss = 0.5101243853569031
Validation loss = 0.5087546706199646
Validation loss = 0.5100566744804382
Validation loss = 0.5189831256866455
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5052932500839233
Validation loss = 0.5017357468605042
Validation loss = 0.5047768354415894
Validation loss = 0.5082307457923889
Validation loss = 0.5133911371231079
Validation loss = 0.5157873630523682
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5050570368766785
Validation loss = 0.5077533721923828
Validation loss = 0.5072748064994812
Validation loss = 0.512277364730835
Validation loss = 0.5173346400260925
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5064513683319092
Validation loss = 0.5086424946784973
Validation loss = 0.5112970471382141
Validation loss = 0.5101032257080078
Validation loss = 0.5114812254905701
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.36060894386298764
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.36216730038022815
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3627730294396961
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.36527514231499053
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.36492890995260663
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.36742424242424243
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3680227057710501
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3686200378071834
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.37016052880075545
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.37169811320754714
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37134778510838834
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3709981167608286
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.37064910630291625
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.3731203007518797
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.3755868544600939
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.3780487804878049
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3776944704779756
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.38108614232209737
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.38260056127221703
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.38317757009345793
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3828197945845005
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3824626865671642
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.38397017707362535
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.38733705772811916
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3869767441860465
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.2    |
| Iteration     | 41       |
| MaximumReturn | -0.114   |
| MinimumReturn | -76.2    |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5074557065963745
Validation loss = 0.5127476453781128
Validation loss = 0.5094444751739502
Validation loss = 0.5056520700454712
Validation loss = 0.5130581259727478
Validation loss = 0.5165093541145325
Validation loss = 0.5145971179008484
Validation loss = 0.5174930691719055
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5068943500518799
Validation loss = 0.5091759562492371
Validation loss = 0.5096376538276672
Validation loss = 0.5122494697570801
Validation loss = 0.5114775896072388
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.50445955991745
Validation loss = 0.5085358619689941
Validation loss = 0.5087102651596069
Validation loss = 0.5105039477348328
Validation loss = 0.5091872215270996
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5081799626350403
Validation loss = 0.5091202855110168
Validation loss = 0.5113541483879089
Validation loss = 0.5155669450759888
Validation loss = 0.5140666365623474
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5088340044021606
Validation loss = 0.5078220963478088
Validation loss = 0.5091034770011902
Validation loss = 0.510394275188446
Validation loss = 0.5112044811248779
Validation loss = 0.5139479637145996
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.387546468401487
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3881151346332405
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.38868274582560297
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.3911028730305839
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3907407407407407
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.391304347826087
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.39186691312384475
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.39335180055401664
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.39391143911439114
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.39447004608294933
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.39502762430939226
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.39558417663293466
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3961397058823529
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3957759412304867
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3963302752293578
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.3996333638863428
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4001831501831502
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.40073193046660566
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4012797074954296
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4018264840182648
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4023722627737226
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4029170464904284
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.40437158469945356
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4049135577797998
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.40545454545454546
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -110     |
| Iteration     | 42       |
| MaximumReturn | -42.3    |
| MinimumReturn | -155     |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5164700150489807
Validation loss = 0.5175623893737793
Validation loss = 0.5152477622032166
Validation loss = 0.518769383430481
Validation loss = 0.5214547514915466
Validation loss = 0.5228543281555176
Validation loss = 0.5301018953323364
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5121064782142639
Validation loss = 0.5108987092971802
Validation loss = 0.5144187211990356
Validation loss = 0.5200589895248413
Validation loss = 0.5159698724746704
Validation loss = 0.517516553401947
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.508643388748169
Validation loss = 0.5112537145614624
Validation loss = 0.5109161138534546
Validation loss = 0.5139279961585999
Validation loss = 0.5154967904090881
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5100739002227783
Validation loss = 0.509661078453064
Validation loss = 0.5127418041229248
Validation loss = 0.5150021910667419
Validation loss = 0.5193029642105103
Validation loss = 0.5191017985343933
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5121384263038635
Validation loss = 0.5183550119400024
Validation loss = 0.5134766101837158
Validation loss = 0.5184983611106873
Validation loss = 0.518671452999115
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.40599455040871935
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4074410163339383
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.41069809610154123
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.411231884057971
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4117647058823529
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.41500904159132007
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4146341463414634
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.41787003610108303
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.41839495040577096
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4198198198198198
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.41944194419441944
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4199640287769784
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4222821203953279
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4236983842010772
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4233183856502242
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.42383512544802865
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4252461951656222
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.426654740608229
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.42716711349419123
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4276785714285714
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4299732381801962
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4304812834224599
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.43098842386464825
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.43238434163701067
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4328888888888889
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -86.1    |
| Iteration     | 43       |
| MaximumReturn | -0.312   |
| MinimumReturn | -152     |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5164613723754883
Validation loss = 0.510951817035675
Validation loss = 0.515799880027771
Validation loss = 0.5134421586990356
Validation loss = 0.517578125
Validation loss = 0.5210562348365784
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5085291862487793
Validation loss = 0.5146982669830322
Validation loss = 0.51836758852005
Validation loss = 0.5167096853256226
Validation loss = 0.5124202370643616
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5071645975112915
Validation loss = 0.5137930512428284
Validation loss = 0.514855682849884
Validation loss = 0.5130412578582764
Validation loss = 0.5164838433265686
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5148041248321533
Validation loss = 0.509730339050293
Validation loss = 0.5164084434509277
Validation loss = 0.5108461380004883
Validation loss = 0.5168447494506836
Validation loss = 0.5135550498962402
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5095289349555969
Validation loss = 0.5096673369407654
Validation loss = 0.5134264230728149
Validation loss = 0.5141792297363281
Validation loss = 0.5169302225112915
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.4369449378330373
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.43744454303460517
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4370567375886525
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.43932683790965454
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.4424778761061947
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.4465075154730327
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.44876325088339225
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.44836716681376876
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4506172839506173
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.45462555066079297
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4551056338028169
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4547053649956025
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.45430579964850615
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4556628621597893
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.45526315789473687
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.45574057843996496
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4553415061295972
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4549431321084864
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.45454545454545453
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.45414847161572053
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.456369982547993
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.45771578029642546
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.45905923344947736
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4586597040905135
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4582608695652174
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -24.4    |
| Iteration     | 44       |
| MaximumReturn | -0.13    |
| MinimumReturn | -133     |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5185523629188538
Validation loss = 0.517791748046875
Validation loss = 0.5159788727760315
Validation loss = 0.5236737132072449
Validation loss = 0.5171157717704773
Validation loss = 0.5194681882858276
Validation loss = 0.5183824300765991
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5140289664268494
Validation loss = 0.5116750001907349
Validation loss = 0.5126112103462219
Validation loss = 0.5165024995803833
Validation loss = 0.5139428973197937
Validation loss = 0.5190085172653198
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5090007185935974
Validation loss = 0.513648509979248
Validation loss = 0.5106253623962402
Validation loss = 0.5089611411094666
Validation loss = 0.5151029229164124
Validation loss = 0.5172125697135925
Validation loss = 0.5201309323310852
Validation loss = 0.515657901763916
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5097136497497559
Validation loss = 0.514986515045166
Validation loss = 0.5123814940452576
Validation loss = 0.5171963572502136
Validation loss = 0.5231685042381287
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5121299624443054
Validation loss = 0.5093706250190735
Validation loss = 0.5109763741493225
Validation loss = 0.517920732498169
Validation loss = 0.5145314335823059
Validation loss = 0.5146178007125854
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4587315377932233
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4583333333333333
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.45793581960104074
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.46013864818024264
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.46060606060606063
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4610726643598616
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.46153846153846156
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46113989637305697
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 10
average number of affinization = 0.46937014667817084
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4698275862068966
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4702842377260982
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47074010327022375
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.471195184866724
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47079037800687284
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47124463519313303
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4716981132075472
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4730077120822622
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4743150684931507
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4739093242087254
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47435897435897434
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.47651579846285225
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47696245733788395
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4774083546462063
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4778534923339012
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.48
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -59.5    |
| Iteration     | 45       |
| MaximumReturn | -0.115   |
| MinimumReturn | -131     |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5188193917274475
Validation loss = 0.5256816148757935
Validation loss = 0.5240322947502136
Validation loss = 0.5184558033943176
Validation loss = 0.5215100646018982
Validation loss = 0.5217000246047974
Validation loss = 0.5196918845176697
Validation loss = 0.5261316299438477
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5095467567443848
Validation loss = 0.5124120116233826
Validation loss = 0.5150870084762573
Validation loss = 0.5115350484848022
Validation loss = 0.5147591829299927
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5121469497680664
Validation loss = 0.5167641639709473
Validation loss = 0.5155821442604065
Validation loss = 0.5164715051651001
Validation loss = 0.5245438814163208
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5132725238800049
Validation loss = 0.5150493383407593
Validation loss = 0.5149861574172974
Validation loss = 0.515125036239624
Validation loss = 0.519221842288971
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5131019353866577
Validation loss = 0.5179087519645691
Validation loss = 0.5157418847084045
Validation loss = 0.5123385787010193
Validation loss = 0.5192229747772217
Validation loss = 0.5219693183898926
Validation loss = 0.5172659158706665
Validation loss = 0.5215591192245483
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.48214285714285715
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4817332200509771
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.48217317487266553
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.48346055979643765
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4830508474576271
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.48348856900931414
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4856175972927242
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.48605240912933223
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.48817567567567566
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.4919831223628692
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49156829679595276
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4928390901432182
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.49326599326599324
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.4978973927670311
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49747899159663866
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4995801847187238
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5008389261744967
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5020955574182733
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5033500837520938
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.502928870292887
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.504180602006689
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.506265664160401
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.506677796327212
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5062552126772311
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5066666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -49      |
| Iteration     | 46       |
| MaximumReturn | -0.164   |
| MinimumReturn | -132     |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5230237245559692
Validation loss = 0.5200532674789429
Validation loss = 0.5198366641998291
Validation loss = 0.5230447053909302
Validation loss = 0.5208117365837097
Validation loss = 0.5291018486022949
Validation loss = 0.5281698107719421
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5150494575500488
Validation loss = 0.5079435110092163
Validation loss = 0.5110458135604858
Validation loss = 0.5159571766853333
Validation loss = 0.5150403380393982
Validation loss = 0.5210792422294617
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5162588953971863
Validation loss = 0.511612057685852
Validation loss = 0.5189113616943359
Validation loss = 0.517001211643219
Validation loss = 0.5212634801864624
Validation loss = 0.5210325121879578
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5162752270698547
Validation loss = 0.5155020952224731
Validation loss = 0.5207886695861816
Validation loss = 0.5150550007820129
Validation loss = 0.5174622535705566
Validation loss = 0.518641471862793
Validation loss = 0.5172101259231567
Validation loss = 0.522778332233429
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5202892422676086
Validation loss = 0.5216663479804993
Validation loss = 0.5161832571029663
Validation loss = 0.5180664658546448
Validation loss = 0.5163866877555847
Validation loss = 0.5158486366271973
Validation loss = 0.5241340398788452
Validation loss = 0.5297189950942993
Validation loss = 0.5224674940109253
Validation loss = 0.5213298201560974
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5087427144046628
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5083194675540765
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5095594347464671
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5091362126245847
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5095435684647303
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5099502487562189
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5111847555923777
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5107615894039735
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.511166253101737
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5107438016528926
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5111478117258464
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5132013201320133
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5136026380873866
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5131795716639209
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5135802469135803
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5139802631578947
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.514379622021364
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5164203612479474
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5159967186218212
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5172131147540984
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5176085176085176
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5180032733224222
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5183973834832379
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5212418300653595
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5216326530612245
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -101     |
| Iteration     | 47       |
| MaximumReturn | -0.26    |
| MinimumReturn | -144     |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5219391584396362
Validation loss = 0.5215344429016113
Validation loss = 0.5227869153022766
Validation loss = 0.5274665355682373
Validation loss = 0.5287531614303589
Validation loss = 0.5256191492080688
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5126503109931946
Validation loss = 0.5156342387199402
Validation loss = 0.5155982971191406
Validation loss = 0.5197359323501587
Validation loss = 0.5181038975715637
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5162984132766724
Validation loss = 0.5185483694076538
Validation loss = 0.5180172920227051
Validation loss = 0.5232992768287659
Validation loss = 0.5276576280593872
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5202294588088989
Validation loss = 0.5230382680892944
Validation loss = 0.5120911598205566
Validation loss = 0.5175484418869019
Validation loss = 0.5226329565048218
Validation loss = 0.5196213722229004
Validation loss = 0.521278440952301
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.521880567073822
Validation loss = 0.5191643834114075
Validation loss = 0.52275151014328
Validation loss = 0.5180689096450806
Validation loss = 0.526852548122406
Validation loss = 0.5248732566833496
Validation loss = 0.5236402750015259
Validation loss = 0.5237003564834595
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5236541598694943
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5248573757131214
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5260586319218241
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5288852725793328
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5300813008130081
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5304630381803412
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5316558441558441
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5344687753446877
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.5388978930307942
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5417004048582996
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5420711974110033
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5432497978981407
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5444264943457189
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.5480225988700564
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.55
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5527800161160354
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.5571658615136876
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5599356395816573
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.5643086816720257
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5654618473895582
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5666131621187801
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5693664795509222
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5721153846153846
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.5764611689351481
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5776
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -95.6    |
| Iteration     | 48       |
| MaximumReturn | -33.5    |
| MinimumReturn | -140     |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.515620231628418
Validation loss = 0.5226930379867554
Validation loss = 0.5148272514343262
Validation loss = 0.518604576587677
Validation loss = 0.517071545124054
Validation loss = 0.5301395654678345
Validation loss = 0.5207698941230774
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5053901672363281
Validation loss = 0.5040292739868164
Validation loss = 0.5137580633163452
Validation loss = 0.5050803422927856
Validation loss = 0.5109241008758545
Validation loss = 0.5120126008987427
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5071104168891907
Validation loss = 0.5111975073814392
Validation loss = 0.5193031430244446
Validation loss = 0.5127694606781006
Validation loss = 0.5163094997406006
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5143745541572571
Validation loss = 0.5149855017662048
Validation loss = 0.5158897638320923
Validation loss = 0.5193461179733276
Validation loss = 0.5152733325958252
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5190718770027161
Validation loss = 0.5132523775100708
Validation loss = 0.5159874558448792
Validation loss = 0.5158951878547668
Validation loss = 0.5214802622795105
Validation loss = 0.5168941617012024
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5779376498800959
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5798722044728435
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.581803671189146
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5829346092503987
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5832669322709163
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5851910828025477
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5871121718377088
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5890302066772655
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5885623510722796
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5888888888888889
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5892149088025377
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.589540412044374
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5898653998416469
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5901898734177216
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.591304347826087
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.593996840442338
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5943172849250198
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5962145110410094
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5965327029156816
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5984251968503937
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6011014948859166
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6022012578616353
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6040848389630793
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6043956043956044
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6062745098039216
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -84.8    |
| Iteration     | 49       |
| MaximumReturn | -0.537   |
| MinimumReturn | -150     |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5169190168380737
Validation loss = 0.516079843044281
Validation loss = 0.5243737101554871
Validation loss = 0.5264565348625183
Validation loss = 0.5208547711372375
Validation loss = 0.517065703868866
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5081543326377869
Validation loss = 0.5072058439254761
Validation loss = 0.509890079498291
Validation loss = 0.5079413652420044
Validation loss = 0.5079996585845947
Validation loss = 0.5125612616539001
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5117201805114746
Validation loss = 0.5109721422195435
Validation loss = 0.511016845703125
Validation loss = 0.5217012763023376
Validation loss = 0.5107719302177429
Validation loss = 0.5096598863601685
Validation loss = 0.5135370492935181
Validation loss = 0.5179842710494995
Validation loss = 0.517483651638031
Validation loss = 0.5196636915206909
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5136885046958923
Validation loss = 0.5097595453262329
Validation loss = 0.5162367224693298
Validation loss = 0.5149840116500854
Validation loss = 0.5170342922210693
Validation loss = 0.5176494717597961
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5161358118057251
Validation loss = 0.5151941776275635
Validation loss = 0.517532467842102
Validation loss = 0.5190427899360657
Validation loss = 0.5164738893508911
Validation loss = 0.5177628397941589
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6065830721003135
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6068911511354738
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.607981220657277
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6075058639562158
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.60703125
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6073380171740828
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6068642745709828
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6063912704598597
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6059190031152648
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.6101167315175098
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.609642301710731
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6091686091686092
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6110248447204969
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.6152055857253685
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6170542635658914
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6181254841208366
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.618421052631579
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6187161639597835
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6190108191653787
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6193050193050194
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6195987654320988
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6198920585967618
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6194144838212635
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6212471131639723
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.6253846153846154
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -106     |
| Iteration     | 50       |
| MaximumReturn | -15.8    |
| MinimumReturn | -145     |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5162323713302612
Validation loss = 0.5110523104667664
Validation loss = 0.5159857273101807
Validation loss = 0.5193427801132202
Validation loss = 0.5238768458366394
Validation loss = 0.527473509311676
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5094113945960999
Validation loss = 0.5092092752456665
Validation loss = 0.5079299211502075
Validation loss = 0.5168164968490601
Validation loss = 0.5105217099189758
Validation loss = 0.5102588534355164
Validation loss = 0.512336015701294
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5140171647071838
Validation loss = 0.5123414993286133
Validation loss = 0.5155860781669617
Validation loss = 0.5156051516532898
Validation loss = 0.5173628330230713
Validation loss = 0.5209825038909912
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5113387107849121
Validation loss = 0.5102386474609375
Validation loss = 0.5128544569015503
Validation loss = 0.5155321359634399
Validation loss = 0.5144320726394653
Validation loss = 0.5185613632202148
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5150647163391113
Validation loss = 0.5120143890380859
Validation loss = 0.5140149593353271
Validation loss = 0.5177702903747559
Validation loss = 0.5169663429260254
Validation loss = 0.5176333785057068
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6264411990776326
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6267281105990783
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6293169608595549
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6326687116564417
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6352490421455939
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6362940275650842
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6381025248661056
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6406727828746177
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.640183346065699
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6404580152671756
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6407322654462243
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6432926829268293
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6458492003046459
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6461187214611872
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6471482889733841
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6466565349544073
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6476841305998481
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.6517450682852808
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.6557998483699773
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6583333333333333
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6578349735049205
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6588502269288956
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6598639455782312
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6608761329305136
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6641509433962264
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -50      |
| Iteration     | 51       |
| MaximumReturn | -0.303   |
| MinimumReturn | -127     |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5199970602989197
Validation loss = 0.5210469365119934
Validation loss = 0.5191150307655334
Validation loss = 0.5173183083534241
Validation loss = 0.5173522233963013
Validation loss = 0.5213655829429626
Validation loss = 0.5198106169700623
Validation loss = 0.523581862449646
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5072156190872192
Validation loss = 0.5059385895729065
Validation loss = 0.5131267309188843
Validation loss = 0.5130299925804138
Validation loss = 0.5116276144981384
Validation loss = 0.514556348323822
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5146426558494568
Validation loss = 0.5179077386856079
Validation loss = 0.5166133046150208
Validation loss = 0.5170873403549194
Validation loss = 0.5180971622467041
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.511908769607544
Validation loss = 0.5131935477256775
Validation loss = 0.5167587995529175
Validation loss = 0.5139495730400085
Validation loss = 0.5124748349189758
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5121220350265503
Validation loss = 0.5128120183944702
Validation loss = 0.5179890990257263
Validation loss = 0.5137155055999756
Validation loss = 0.5148111581802368
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.665158371040724
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6661642803315749
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6671686746987951
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6681715575620768
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6691729323308271
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6694214876033058
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6719219219219219
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6721680420105026
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6754122938530734
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6764044943820224
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6758982035928144
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.675392670157068
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6763826606875935
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6758775205377147
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6791044776119403
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6808351976137211
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.680327868852459
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6798212956068503
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6808035714285714
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6802973977695167
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6797919762258544
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.682256867112101
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.685459940652819
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.687175685693106
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6881481481481482
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -91.6    |
| Iteration     | 52       |
| MaximumReturn | -0.684   |
| MinimumReturn | -137     |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5182207226753235
Validation loss = 0.5188234448432922
Validation loss = 0.5169243216514587
Validation loss = 0.5172247290611267
Validation loss = 0.5140968561172485
Validation loss = 0.5209994912147522
Validation loss = 0.5224032402038574
Validation loss = 0.5232515335083008
Validation loss = 0.5218462944030762
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5098103284835815
Validation loss = 0.5113050937652588
Validation loss = 0.5105252265930176
Validation loss = 0.5072822570800781
Validation loss = 0.5153818130493164
Validation loss = 0.514399528503418
Validation loss = 0.5093213319778442
Validation loss = 0.512236475944519
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5229755640029907
Validation loss = 0.5084871053695679
Validation loss = 0.5132062435150146
Validation loss = 0.5132694840431213
Validation loss = 0.5145995020866394
Validation loss = 0.5150355100631714
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5096499919891357
Validation loss = 0.5101597905158997
Validation loss = 0.5094736814498901
Validation loss = 0.5173586010932922
Validation loss = 0.5120989680290222
Validation loss = 0.5139597058296204
Validation loss = 0.5114383101463318
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.515940248966217
Validation loss = 0.5094625949859619
Validation loss = 0.5131269097328186
Validation loss = 0.5165249109268188
Validation loss = 0.5107294321060181
Validation loss = 0.5119696855545044
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.689859363434493
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6893491124260355
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6895787139689579
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6927621861152142
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.692250922509225
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6917404129793511
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6941783345615328
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6958762886597938
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.695364238410596
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6948529411764706
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6943423952975754
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6938325991189427
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6947909024211298
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6950146627565983
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6945054945054945
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6939970717423133
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6949524506217996
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6944444444444444
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6939371804236669
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6948905109489051
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6943836615609045
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6938775510204082
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6933721777130372
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6943231441048034
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.696
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -81.8    |
| Iteration     | 53       |
| MaximumReturn | -6.24    |
| MinimumReturn | -145     |
| TotalSamples  | 91630    |
----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5204620361328125
Validation loss = 0.5185717940330505
Validation loss = 0.5228784680366516
Validation loss = 0.5176751613616943
Validation loss = 0.5233955979347229
Validation loss = 0.5234515070915222
Validation loss = 0.5272101759910583
Validation loss = 0.5203492641448975
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5098262429237366
Validation loss = 0.5144118666648865
Validation loss = 0.5122565627098083
Validation loss = 0.5118821263313293
Validation loss = 0.5115216374397278
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5143869519233704
Validation loss = 0.5100123286247253
Validation loss = 0.5155434012413025
Validation loss = 0.5187089443206787
Validation loss = 0.5112186074256897
Validation loss = 0.5122125148773193
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5140680074691772
Validation loss = 0.5121282935142517
Validation loss = 0.513961672782898
Validation loss = 0.51579350233078
Validation loss = 0.5186507105827332
Validation loss = 0.5146015882492065
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5088360905647278
Validation loss = 0.5153042078018188
Validation loss = 0.5139074921607971
Validation loss = 0.5132424831390381
Validation loss = 0.5150840878486633
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6969476744186046
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6964415395787945
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6959361393323658
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.696881798404641
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.696376811594203
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6965966690803765
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6960926193921853
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.695589298626175
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6965317919075145
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6981949458483755
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6998556998556998
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.7036770007209805
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7038904899135446
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7062634989200864
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7086330935251799
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7081236520488857
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.7119252873563219
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7114142139267767
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7130559540889526
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7154121863799283
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7177650429799427
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.717967072297781
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.7224606580829757
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7248034310221587
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7271428571428571
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -62.7    |
| Iteration     | 54       |
| MaximumReturn | -0.213   |
| MinimumReturn | -128     |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5155478119850159
Validation loss = 0.5150851011276245
Validation loss = 0.5137827396392822
Validation loss = 0.5169655084609985
Validation loss = 0.5169926881790161
Validation loss = 0.5207922458648682
Validation loss = 0.5256785750389099
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5140526294708252
Validation loss = 0.5067188143730164
Validation loss = 0.5118089318275452
Validation loss = 0.5058931708335876
Validation loss = 0.5066624283790588
Validation loss = 0.5088335275650024
Validation loss = 0.5105134844779968
Validation loss = 0.5129194855690002
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5101270079612732
Validation loss = 0.5115099549293518
Validation loss = 0.5074275135993958
Validation loss = 0.5197912454605103
Validation loss = 0.5102642774581909
Validation loss = 0.5166213512420654
Validation loss = 0.514182448387146
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5077157020568848
Validation loss = 0.506935715675354
Validation loss = 0.5099859833717346
Validation loss = 0.5118722915649414
Validation loss = 0.5109283328056335
Validation loss = 0.5135445594787598
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5086061358451843
Validation loss = 0.5102186799049377
Validation loss = 0.5060781836509705
Validation loss = 0.5148522853851318
Validation loss = 0.5083016157150269
Validation loss = 0.5113213658332825
Validation loss = 0.5133767127990723
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7266238401142041
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.7303851640513552
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7320028510334996
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7329059829059829
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7323843416370107
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7332859174964438
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7348969438521677
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.734375
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7366926898509581
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7368794326241135
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.7406094968107725
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.740084985835694
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7395612172682237
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7425742574257426
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 9
average number of affinization = 0.7484098939929329
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7478813559322034
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7508821453775583
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7524682651622003
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.751937984496124
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7549295774647887
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.7586206896551724
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.760196905766526
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.763879128601546
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7661516853932584
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7670175438596492
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -78.9    |
| Iteration     | 55       |
| MaximumReturn | -3.24    |
| MinimumReturn | -131     |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5080441832542419
Validation loss = 0.51195228099823
Validation loss = 0.5144427418708801
Validation loss = 0.5147952437400818
Validation loss = 0.5090117454528809
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5022544264793396
Validation loss = 0.5022307634353638
Validation loss = 0.5032106041908264
Validation loss = 0.501539945602417
Validation loss = 0.5047324895858765
Validation loss = 0.5067948698997498
Validation loss = 0.5074958801269531
Validation loss = 0.5051828026771545
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5126840472221375
Validation loss = 0.5041803121566772
Validation loss = 0.507552981376648
Validation loss = 0.501829206943512
Validation loss = 0.5088716745376587
Validation loss = 0.5074952244758606
Validation loss = 0.509697437286377
Validation loss = 0.5113205313682556
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5033736824989319
Validation loss = 0.5064651370048523
Validation loss = 0.5025511384010315
Validation loss = 0.5073968172073364
Validation loss = 0.5090968608856201
Validation loss = 0.5073102712631226
Validation loss = 0.5103864669799805
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5077090859413147
Validation loss = 0.5021212697029114
Validation loss = 0.5015933513641357
Validation loss = 0.5063343644142151
Validation loss = 0.5041391253471375
Validation loss = 0.5067256093025208
Validation loss = 0.5080528855323792
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7699859747545582
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7694463910301331
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7724089635854342
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7732680195941217
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7741258741258741
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7763801537386443
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7772346368715084
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.776692254012561
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7761506276150628
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7770034843205574
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7778551532033426
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7780097425191371
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7809457579972183
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7810979847116053
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7805555555555556
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7827897293546149
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7836338418862691
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7851697851697852
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7867036011080333
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.7903114186851211
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7918395573997233
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.7961299239806496
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7976519337016574
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8012422360248447
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8020689655172414
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -41.6    |
| Iteration     | 56       |
| MaximumReturn | -0.457   |
| MinimumReturn | -99.2    |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5129805207252502
Validation loss = 0.5072572231292725
Validation loss = 0.5056046843528748
Validation loss = 0.5129379630088806
Validation loss = 0.5059619545936584
Validation loss = 0.5112544894218445
Validation loss = 0.5126869678497314
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5034255385398865
Validation loss = 0.500962495803833
Validation loss = 0.504582941532135
Validation loss = 0.5042163729667664
Validation loss = 0.504479169845581
Validation loss = 0.5070231556892395
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5084289908409119
Validation loss = 0.5074326395988464
Validation loss = 0.5040757060050964
Validation loss = 0.5076274871826172
Validation loss = 0.5111717581748962
Validation loss = 0.50696861743927
Validation loss = 0.5073010325431824
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5050289034843445
Validation loss = 0.5000255703926086
Validation loss = 0.5028627514839172
Validation loss = 0.5073641538619995
Validation loss = 0.5100567936897278
Validation loss = 0.5090175867080688
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5024312138557434
Validation loss = 0.5031307935714722
Validation loss = 0.5059378743171692
Validation loss = 0.5078454613685608
Validation loss = 0.5048560500144958
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8035837353549277
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8057851239669421
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.805230557467309
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8081155433287482
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8096219931271478
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8111263736263736
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8119423472889499
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8148148148148148
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8176833447566827
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8205479452054795
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8199863107460643
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8228454172366622
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8229665071770335
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.825136612021858
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.8300341296928327
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8321964529331515
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8323108384458078
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8317438692098093
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8332198774676651
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8353741496598639
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8388851121685927
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8403532608695652
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8424983027834352
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8419267299864315
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8433898305084746
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -53.1    |
| Iteration     | 57       |
| MaximumReturn | -0.229   |
| MinimumReturn | -127     |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5045690536499023
Validation loss = 0.5033879280090332
Validation loss = 0.5052207112312317
Validation loss = 0.5097195506095886
Validation loss = 0.505479633808136
Validation loss = 0.5057101845741272
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5023791790008545
Validation loss = 0.49848246574401855
Validation loss = 0.4982524812221527
Validation loss = 0.500735342502594
Validation loss = 0.5025662183761597
Validation loss = 0.5045048594474792
Validation loss = 0.5025973320007324
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49802809953689575
Validation loss = 0.4987722933292389
Validation loss = 0.5120894908905029
Validation loss = 0.5015831589698792
Validation loss = 0.5032734274864197
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49749845266342163
Validation loss = 0.5069279074668884
Validation loss = 0.5016508102416992
Validation loss = 0.5000371932983398
Validation loss = 0.5053654909133911
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5046693682670593
Validation loss = 0.5004536509513855
Validation loss = 0.498962938785553
Validation loss = 0.5062273144721985
Validation loss = 0.5010401606559753
Validation loss = 0.5038091540336609
Validation loss = 0.5088378190994263
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8468834688346883
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8483412322274881
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8491204330175913
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8485463150777552
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8493243243243244
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8507765023632681
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8502024291497976
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.851652056641942
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.851078167115903
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8518518518518519
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8553162853297442
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8547410894418291
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8581989247311828
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8596373404969778
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8610738255033556
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8618376928236083
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8612600536193029
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8613529805760214
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8634538152610441
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8655518394648829
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8649732620320856
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8657314629258517
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8651535380507344
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.866577718478986
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8686666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26.2    |
| Iteration     | 58       |
| MaximumReturn | -0.148   |
| MinimumReturn | -82      |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.506493091583252
Validation loss = 0.5021806955337524
Validation loss = 0.5074971914291382
Validation loss = 0.5094722509384155
Validation loss = 0.5037521123886108
Validation loss = 0.5067121386528015
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4971250295639038
Validation loss = 0.49967652559280396
Validation loss = 0.49902451038360596
Validation loss = 0.5019584894180298
Validation loss = 0.5083580017089844
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5051536560058594
Validation loss = 0.5009686946868896
Validation loss = 0.5030462145805359
Validation loss = 0.5044674873352051
Validation loss = 0.5057268142700195
Validation loss = 0.5019688010215759
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49989521503448486
Validation loss = 0.49965861439704895
Validation loss = 0.49963459372520447
Validation loss = 0.5087518095970154
Validation loss = 0.504082441329956
Validation loss = 0.5030319094657898
Validation loss = 0.5039018988609314
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5081673860549927
Validation loss = 0.5022790431976318
Validation loss = 0.4989554286003113
Validation loss = 0.5000796318054199
Validation loss = 0.5031651258468628
Validation loss = 0.5035208463668823
Validation loss = 0.5035563707351685
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8700866089273818
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8695073235685752
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8715901530272788
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8710106382978723
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8717607973421927
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8725099601593626
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8732581287325812
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.873342175066313
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8760768721007289
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8768211920529801
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8775645268034414
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.876984126984127
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8790482485128883
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8811096433289299
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8818481848184818
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8825857519788918
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8826631509558339
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8820816864295126
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8841342988808426
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8835526315789474
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8842866535174227
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8876478318002629
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8870650032829941
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.889763779527559
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8904918032786885
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -18.1    |
| Iteration     | 59       |
| MaximumReturn | -0.323   |
| MinimumReturn | -80.6    |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5088594555854797
Validation loss = 0.5059860944747925
Validation loss = 0.5112723112106323
Validation loss = 0.5034301280975342
Validation loss = 0.5064031481742859
Validation loss = 0.5060614347457886
Validation loss = 0.506910502910614
Validation loss = 0.5126546621322632
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.49749991297721863
Validation loss = 0.49842312932014465
Validation loss = 0.50185227394104
Validation loss = 0.498609721660614
Validation loss = 0.4999503791332245
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.503150999546051
Validation loss = 0.504902184009552
Validation loss = 0.5006093382835388
Validation loss = 0.5019809007644653
Validation loss = 0.5051824450492859
Validation loss = 0.5041305422782898
Validation loss = 0.5052487254142761
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5010250806808472
Validation loss = 0.49833759665489197
Validation loss = 0.4989299774169922
Validation loss = 0.499099999666214
Validation loss = 0.5065234899520874
Validation loss = 0.5012752413749695
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5052863359451294
Validation loss = 0.4988568127155304
Validation loss = 0.5039317607879639
Validation loss = 0.5018134117126465
Validation loss = 0.5049334764480591
Validation loss = 0.5046612024307251
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8931847968545217
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8952193844138834
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8965968586387435
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8979725310660562
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9026779882429784
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9053524804177546
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9073711676451403
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9080834419817471
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9074918566775244
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9069010416666666
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.907612231620039
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 10
average number of affinization = 0.9135240572171651
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9148797920727745
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9142857142857143
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9149902660609993
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.914396887159533
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.915100453661698
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.9183937823834197
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.9216828478964402
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9236739974126779
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9256625727213963
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9276485788113695
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9302775984506133
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9296774193548387
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -30.2    |
| Iteration     | 60       |
| MaximumReturn | -0.334   |
| MinimumReturn | -120     |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5069243311882019
Validation loss = 0.5059197545051575
Validation loss = 0.510377824306488
Validation loss = 0.5077359676361084
Validation loss = 0.509101390838623
Validation loss = 0.5155931115150452
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.49666211009025574
Validation loss = 0.4964223802089691
Validation loss = 0.4992261528968811
Validation loss = 0.5010784268379211
Validation loss = 0.5023176074028015
Validation loss = 0.5010570287704468
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5038431882858276
Validation loss = 0.5101369023323059
Validation loss = 0.5052092671394348
Validation loss = 0.5067622065544128
Validation loss = 0.5033611059188843
Validation loss = 0.5048472285270691
Validation loss = 0.5060296654701233
Validation loss = 0.5111290812492371
Validation loss = 0.5090423226356506
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.500215470790863
Validation loss = 0.5015328526496887
Validation loss = 0.4999730587005615
Validation loss = 0.5012487173080444
Validation loss = 0.5054683685302734
Validation loss = 0.5036184191703796
Validation loss = 0.5061888098716736
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4994215965270996
Validation loss = 0.505409300327301
Validation loss = 0.5024473667144775
Validation loss = 0.49939417839050293
Validation loss = 0.5040223002433777
Validation loss = 0.5029610991477966
Validation loss = 0.5059593915939331
Validation loss = 0.5047656893730164
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9297227595099935
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9323453608247423
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9330328396651641
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 9
average number of affinization = 0.9382239382239382
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.940192926045016
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9395886889460154
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9409120102761721
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9403080872913993
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.9441949967928159
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9442307692307692
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9461883408071748
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9462227912932138
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.946896992962252
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9482097186700768
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.949520766773163
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9501915708812261
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9527760051052967
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9528061224489796
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.9566602931803697
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9592356687898089
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9618077657542966
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9611959287531806
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9624920534011443
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9618805590851334
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9619047619047619
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26.2    |
| Iteration     | 61       |
| MaximumReturn | -0.397   |
| MinimumReturn | -86.2    |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5079804062843323
Validation loss = 0.5082660913467407
Validation loss = 0.5101667046546936
Validation loss = 0.5088294744491577
Validation loss = 0.5063410997390747
Validation loss = 0.5122754573822021
Validation loss = 0.5138781070709229
Validation loss = 0.5116792917251587
Validation loss = 0.5171545743942261
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5024023056030273
Validation loss = 0.50432950258255
Validation loss = 0.5030760169029236
Validation loss = 0.5031545162200928
Validation loss = 0.5038307905197144
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.511094331741333
Validation loss = 0.505969762802124
Validation loss = 0.5080366134643555
Validation loss = 0.5041252374649048
Validation loss = 0.514728307723999
Validation loss = 0.5093157887458801
Validation loss = 0.509721577167511
Validation loss = 0.5121419429779053
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5093539953231812
Validation loss = 0.4997158348560333
Validation loss = 0.5044804811477661
Validation loss = 0.5043843984603882
Validation loss = 0.5071991682052612
Validation loss = 0.5064858198165894
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5102239847183228
Validation loss = 0.5011920928955078
Validation loss = 0.5038321018218994
Validation loss = 0.5038740634918213
Validation loss = 0.5059521198272705
Validation loss = 0.505010187625885
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9638324873096447
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9651236525047558
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9657794676806084
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9651678277390754
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9664556962025317
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9658444022770398
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9652338811630847
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9658875552747946
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9652777777777778
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9646687697160883
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9653215636822194
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9659735349716446
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.966624685138539
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9660163624921334
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9679245283018868
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9685732243871779
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.967964824120603
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9686126804770873
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9692597239648683
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 9
average number of affinization = 0.974294670846395
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.974937343358396
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9743268628678773
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.976846057571965
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9787367104440275
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.979375
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.71    |
| Iteration     | 62       |
| MaximumReturn | -0.352   |
| MinimumReturn | -66.6    |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5153980851173401
Validation loss = 0.5122623443603516
Validation loss = 0.5113865733146667
Validation loss = 0.5139106512069702
Validation loss = 0.5131450891494751
Validation loss = 0.5103914737701416
Validation loss = 0.5109350681304932
Validation loss = 0.5173559188842773
Validation loss = 0.5160465240478516
Validation loss = 0.5166375041007996
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5074072480201721
Validation loss = 0.5035504698753357
Validation loss = 0.5045303702354431
Validation loss = 0.5027370452880859
Validation loss = 0.5009958744049072
Validation loss = 0.5082920789718628
Validation loss = 0.5098823308944702
Validation loss = 0.5092560648918152
Validation loss = 0.5084536075592041
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5091615319252014
Validation loss = 0.5066547989845276
Validation loss = 0.5083497166633606
Validation loss = 0.5093112587928772
Validation loss = 0.5127493739128113
Validation loss = 0.5108610987663269
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5158268809318542
Validation loss = 0.5052498579025269
Validation loss = 0.5065564513206482
Validation loss = 0.5096367597579956
Validation loss = 0.5083656311035156
Validation loss = 0.5106746554374695
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5093544721603394
Validation loss = 0.5017560124397278
Validation loss = 0.5069313049316406
Validation loss = 0.512383222579956
Validation loss = 0.5099601745605469
Validation loss = 0.5085477232933044
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9793878825733916
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9812734082397003
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9806612601372426
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9800498753117207
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9794392523364486
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9800747198007472
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9794648413192284
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.9825870646766169
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9825978868862648
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9832298136645963
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9832402234636871
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9826302729528535
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9832610043397396
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.983271375464684
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9826625386996904
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9826732673267327
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9833024118738405
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9826946847960445
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.983940704138357
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9858024691358025
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.9888957433682912
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9882860665844636
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9876771410967344
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9870689655172413
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.9901538461538462
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.11    |
| Iteration     | 63       |
| MaximumReturn | -0.259   |
| MinimumReturn | -109     |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5192368626594543
Validation loss = 0.5140650868415833
Validation loss = 0.5198731422424316
Validation loss = 0.5199947953224182
Validation loss = 0.5210988521575928
Validation loss = 0.5174184441566467
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5069890022277832
Validation loss = 0.5136567950248718
Validation loss = 0.5145941376686096
Validation loss = 0.5117213129997253
Validation loss = 0.5104342699050903
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5165133476257324
Validation loss = 0.5143412947654724
Validation loss = 0.5108928084373474
Validation loss = 0.51120525598526
Validation loss = 0.5169719457626343
Validation loss = 0.5133360624313354
Validation loss = 0.515326738357544
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5107225775718689
Validation loss = 0.5092026591300964
Validation loss = 0.5089960694313049
Validation loss = 0.5129383206367493
Validation loss = 0.5138617753982544
Validation loss = 0.5161765813827515
Validation loss = 0.5154024958610535
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.50984787940979
Validation loss = 0.5090557932853699
Validation loss = 0.5065288543701172
Validation loss = 0.5068031549453735
Validation loss = 0.5097239017486572
Validation loss = 0.5137056708335876
Validation loss = 0.5114524960517883
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9920049200492005
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9932390903503381
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9926289926289926
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9938612645794966
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9950920245398773
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9944819129368485
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9938725490196079
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9932639314145744
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.99265605875153
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9926605504587156
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.9963325183374083
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9975565058032987
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0006105006105006
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9993906154783668
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.001217285453439
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0036496350364963
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.004255319148936
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0042527339003646
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0048573163327261
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0066747572815533
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0060642813826561
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0066666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.04    |
| Iteration     | 64       |
| MaximumReturn | -0.428   |
| MinimumReturn | -50.8    |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5217307209968567
Validation loss = 0.5142688155174255
Validation loss = 0.5184893012046814
Validation loss = 0.5157009363174438
Validation loss = 0.5191634893417358
Validation loss = 0.5198718309402466
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5100842118263245
Validation loss = 0.5075722336769104
Validation loss = 0.507843554019928
Validation loss = 0.5098308324813843
Validation loss = 0.5078956484794617
Validation loss = 0.5113018751144409
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5195994973182678
Validation loss = 0.5168373584747314
Validation loss = 0.5198979377746582
Validation loss = 0.5147126913070679
Validation loss = 0.5186429023742676
Validation loss = 0.5166810750961304
Validation loss = 0.5169811844825745
Validation loss = 0.5205920338630676
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5167055130004883
Validation loss = 0.5103638768196106
Validation loss = 0.5140761733055115
Validation loss = 0.5128419995307922
Validation loss = 0.5150145888328552
Validation loss = 0.5145000219345093
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5088237524032593
Validation loss = 0.5112805366516113
Validation loss = 0.5100802779197693
Validation loss = 0.5136336088180542
Validation loss = 0.5128819346427917
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0060569351907935
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0066585956416465
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0084694494857833
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0078597339782345
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0078549848942597
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0072463768115942
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0066385033192518
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.006031363088058
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0054249547920433
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0066265060240964
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0072245635159542
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.00661853188929
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.006013229104029
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0054086538461537
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0048048048048048
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0066026410564226
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.005998800239952
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0053956834532374
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0047932893948472
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0047904191616766
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0041891083183723
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0053827751196172
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0047818290496116
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0041816009557945
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0035820895522387
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.35    |
| Iteration     | 65       |
| MaximumReturn | -0.228   |
| MinimumReturn | -6.98    |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5207023024559021
Validation loss = 0.5170194506645203
Validation loss = 0.5154402852058411
Validation loss = 0.5197977423667908
Validation loss = 0.5179359316825867
Validation loss = 0.5198602080345154
Validation loss = 0.5229886174201965
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5118862390518188
Validation loss = 0.5086473226547241
Validation loss = 0.5195158123970032
Validation loss = 0.5140628218650818
Validation loss = 0.515744149684906
Validation loss = 0.5129350423812866
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5186362862586975
Validation loss = 0.5216078758239746
Validation loss = 0.5201953649520874
Validation loss = 0.5203747749328613
Validation loss = 0.5184218883514404
Validation loss = 0.5218706727027893
Validation loss = 0.5245497226715088
Validation loss = 0.5259446501731873
Validation loss = 0.5263475179672241
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5129380226135254
Validation loss = 0.5123372673988342
Validation loss = 0.5113704204559326
Validation loss = 0.5138540267944336
Validation loss = 0.5101088285446167
Validation loss = 0.5167760848999023
Validation loss = 0.517935574054718
Validation loss = 0.5217446088790894
Validation loss = 0.5200130939483643
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.512403130531311
Validation loss = 0.5118274092674255
Validation loss = 0.512762188911438
Validation loss = 0.5181271433830261
Validation loss = 0.5153696537017822
Validation loss = 0.5151733756065369
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0047732696897376
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0041741204531902
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0035756853396902
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.004169148302561
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.005952380952381
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.006543723973825
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0059453032104637
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.005941770647653
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0065320665083135
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0059347181008902
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0065243179122183
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.00592768227623
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0053317535545023
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.004736530491415
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.006508875739645
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0076877587226494
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.008274231678487
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0082693443591257
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0076741440377803
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.008849557522124
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.008254716981132
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0076605774896876
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.007656065959953
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.007651559741024
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0094117647058825
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.17    |
| Iteration     | 66       |
| MaximumReturn | -0.223   |
| MinimumReturn | -27.2    |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5232463479042053
Validation loss = 0.5150248408317566
Validation loss = 0.5181537866592407
Validation loss = 0.5173414349555969
Validation loss = 0.5233925580978394
Validation loss = 0.5222812294960022
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5144628882408142
Validation loss = 0.5130999088287354
Validation loss = 0.5114370584487915
Validation loss = 0.513023853302002
Validation loss = 0.5135549306869507
Validation loss = 0.5164439082145691
Validation loss = 0.5141111016273499
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5236561894416809
Validation loss = 0.5216703414916992
Validation loss = 0.5205537676811218
Validation loss = 0.5227434039115906
Validation loss = 0.5248019099235535
Validation loss = 0.5200178027153015
Validation loss = 0.5266276597976685
Validation loss = 0.5271731615066528
Validation loss = 0.5258917808532715
Validation loss = 0.5291056036949158
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.515735387802124
Validation loss = 0.518072247505188
Validation loss = 0.522121012210846
Validation loss = 0.515661358833313
Validation loss = 0.5201940536499023
Validation loss = 0.5209941267967224
Validation loss = 0.5248803496360779
Validation loss = 0.5219154953956604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5133669376373291
Validation loss = 0.5111221075057983
Validation loss = 0.5196012854576111
Validation loss = 0.5177432894706726
Validation loss = 0.5128332376480103
Validation loss = 0.5201049447059631
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0088183421516754
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.009988249118684
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.009395184967704
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0088028169014085
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0082111436950147
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0093786635404456
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0105448154657293
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0117096018735363
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0111176126389703
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0105263157894737
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0099357101110462
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0110981308411215
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0116754232340923
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0110851808634773
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0104956268221574
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0110722610722611
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0104834012813046
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0116414435389989
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0110529377545083
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0104651162790699
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0098779779198142
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0104529616724738
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0098665118978525
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0092807424593968
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.008695652173913
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.627   |
| Iteration     | 67       |
| MaximumReturn | -0.125   |
| MinimumReturn | -2.3     |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5226388573646545
Validation loss = 0.5173366665840149
Validation loss = 0.5237209796905518
Validation loss = 0.5217097997665405
Validation loss = 0.5286152958869934
Validation loss = 0.5251601934432983
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5217652916908264
Validation loss = 0.5129052996635437
Validation loss = 0.5138314366340637
Validation loss = 0.5185542106628418
Validation loss = 0.5211220383644104
Validation loss = 0.51950603723526
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5300384759902954
Validation loss = 0.5217788219451904
Validation loss = 0.5284339785575867
Validation loss = 0.5239377617835999
Validation loss = 0.5258538126945496
Validation loss = 0.5253469347953796
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5214806199073792
Validation loss = 0.5215856432914734
Validation loss = 0.5257920622825623
Validation loss = 0.5228284001350403
Validation loss = 0.524774432182312
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5183559060096741
Validation loss = 0.5170484185218811
Validation loss = 0.5168484449386597
Validation loss = 0.5147963166236877
Validation loss = 0.5198468565940857
Validation loss = 0.5204935073852539
Validation loss = 0.5232807993888855
Validation loss = 0.5213097929954529
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0081112398609502
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0092646207295888
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.0133101851851851
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.012724117987276
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0121387283236993
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0115540150202196
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0109699769053118
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0103866128101557
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0098039215686274
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0097982708933717
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.01036866359447
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0097869890616005
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0092059838895282
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0086256469235193
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.013793103448276
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0149339460080413
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0143513203214696
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0160642570281124
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0154816513761469
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0148997134670488
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0166093928980526
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0171722953634803
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0200228832951945
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0194396798170382
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0217142857142858
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.326   |
| Iteration     | 68       |
| MaximumReturn | -0.0804  |
| MinimumReturn | -0.816   |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5209406018257141
Validation loss = 0.5207057595252991
Validation loss = 0.5192610621452332
Validation loss = 0.5211336016654968
Validation loss = 0.5210263729095459
Validation loss = 0.5227043032646179
Validation loss = 0.5235225558280945
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5229488611221313
Validation loss = 0.5133280754089355
Validation loss = 0.5143255591392517
Validation loss = 0.5206253528594971
Validation loss = 0.5153365731239319
Validation loss = 0.515921950340271
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5223073959350586
Validation loss = 0.5215235948562622
Validation loss = 0.5266954302787781
Validation loss = 0.5291849374771118
Validation loss = 0.5232280492782593
Validation loss = 0.5205132961273193
Validation loss = 0.5265174508094788
Validation loss = 0.5255993008613586
Validation loss = 0.5283989310264587
Validation loss = 0.5300746560096741
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5195159316062927
Validation loss = 0.5173129439353943
Validation loss = 0.5206717252731323
Validation loss = 0.5189018845558167
Validation loss = 0.5193105936050415
Validation loss = 0.5228304862976074
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5276150703430176
Validation loss = 0.5159918069839478
Validation loss = 0.517944872379303
Validation loss = 0.5180617570877075
Validation loss = 0.522270679473877
Validation loss = 0.5205570459365845
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.02170188463735
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0216894977168949
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0239589275527667
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.023945267958951
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0233618233618234
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.0267653758542141
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0278884462151394
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.027872582480091
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0284252416145536
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0289772727272728
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.028960817717206
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0283768444948922
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.027793533749291
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.028344671201814
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.0322946175637393
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0328425821064553
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0350877192982457
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.036764705882353
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0390050876201244
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0389830508474576
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0406549971767363
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0411963882618511
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0406091370558375
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0417136414881623
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0445070422535212
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.5    |
| Iteration     | 69       |
| MaximumReturn | -0.15    |
| MinimumReturn | -67.3    |
| TotalSamples  | 118286   |
----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5267220139503479
Validation loss = 0.5203431844711304
Validation loss = 0.5267447233200073
Validation loss = 0.5260144472122192
Validation loss = 0.5232822895050049
Validation loss = 0.5253603458404541
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5135365724563599
Validation loss = 0.5130780935287476
Validation loss = 0.5169267058372498
Validation loss = 0.5183178782463074
Validation loss = 0.5179739594459534
Validation loss = 0.5227704048156738
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5271589756011963
Validation loss = 0.5239229202270508
Validation loss = 0.5273281335830688
Validation loss = 0.5227903723716736
Validation loss = 0.5289307832717896
Validation loss = 0.5264527797698975
Validation loss = 0.5325284600257874
Validation loss = 0.5350531339645386
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5162160992622375
Validation loss = 0.5178089737892151
Validation loss = 0.5176907181739807
Validation loss = 0.5256990790367126
Validation loss = 0.5242288112640381
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.526393473148346
Validation loss = 0.5199803113937378
Validation loss = 0.5203415751457214
Validation loss = 0.5247126817703247
Validation loss = 0.5222974419593811
Validation loss = 0.5204260349273682
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0456081081081081
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0467079347214405
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.047244094488189
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.047779651489601
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.049438202247191
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0522178551375632
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0527497194163862
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.055524397083567
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0582959641255605
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.057703081232493
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0582306830907056
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0609960828203693
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.063199105145414
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0653996646171044
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0659217877094973
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.065326633165829
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0669642857142858
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0669269380925823
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.069119286510591
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.071309192200557
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0723830734966593
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0717863105175292
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.07119021134594
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0728182323513062
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.076111111111111
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.87    |
| Iteration     | 70       |
| MaximumReturn | -0.203   |
| MinimumReturn | -43.6    |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5245593786239624
Validation loss = 0.5267449617385864
Validation loss = 0.526390016078949
Validation loss = 0.5283725261688232
Validation loss = 0.5253685712814331
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5224071145057678
Validation loss = 0.5223295092582703
Validation loss = 0.5188812017440796
Validation loss = 0.5247480273246765
Validation loss = 0.5208441019058228
Validation loss = 0.5300915837287903
Validation loss = 0.5228791236877441
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5356169939041138
Validation loss = 0.5316831469535828
Validation loss = 0.5278661847114563
Validation loss = 0.5302215218544006
Validation loss = 0.5320927500724792
Validation loss = 0.5293239951133728
Validation loss = 0.5320821404457092
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5248550772666931
Validation loss = 0.5214727520942688
Validation loss = 0.5251634120941162
Validation loss = 0.5245820879936218
Validation loss = 0.5237311124801636
Validation loss = 0.523104727268219
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5206723809242249
Validation loss = 0.5215699076652527
Validation loss = 0.520572304725647
Validation loss = 0.5251774191856384
Validation loss = 0.5239240527153015
Validation loss = 0.5271776914596558
Validation loss = 0.5278914570808411
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.076068850638534
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0776914539400666
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0798668885191347
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0803769401330376
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.080332409972299
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.080841638981174
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.081350304371887
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0807522123893805
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.080154781647319
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0795580110497238
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0800662617338488
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0827814569536425
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0838389409817981
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0848952590959207
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.084297520661157
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0859030837004404
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0869565217391304
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0874587458745875
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0874106652006597
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.089010989010989
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0884129599121362
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0900109769484083
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0894130554031816
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0904605263157894
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0904109589041096
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7       |
| Iteration     | 71       |
| MaximumReturn | -0.575   |
| MinimumReturn | -67.4    |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5211547613143921
Validation loss = 0.5221827030181885
Validation loss = 0.5285153985023499
Validation loss = 0.5264776945114136
Validation loss = 0.5282219648361206
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5184024572372437
Validation loss = 0.5196906328201294
Validation loss = 0.5202325582504272
Validation loss = 0.5244966745376587
Validation loss = 0.5257071852684021
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5333619713783264
Validation loss = 0.5285166501998901
Validation loss = 0.5314474701881409
Validation loss = 0.5325177311897278
Validation loss = 0.5314458012580872
Validation loss = 0.5318394303321838
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5242717862129211
Validation loss = 0.5236837863922119
Validation loss = 0.526814877986908
Validation loss = 0.5224236249923706
Validation loss = 0.5246034264564514
Validation loss = 0.5278365015983582
Validation loss = 0.5221647024154663
Validation loss = 0.526229739189148
Validation loss = 0.5263367295265198
Validation loss = 0.5258012413978577
Validation loss = 0.5315563678741455
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.524243950843811
Validation loss = 0.5221414566040039
Validation loss = 0.5243855714797974
Validation loss = 0.5280006527900696
Validation loss = 0.5246736407279968
Validation loss = 0.5285650491714478
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0914567360350493
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0908593322386426
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0913566739606126
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0907599781301258
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0923497267759563
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0933915892954669
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0938864628820961
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0949263502454991
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0959651035986915
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.095367847411444
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.0985838779956427
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0990745781164943
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0990206746463547
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.099510603588907
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0989130434782608
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0983161325366648
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0982627578718784
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.097666847531199
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0970715835140998
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0964769647696477
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0969664138678223
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0990795885219276
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0995670995670996
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1005949161709032
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19      |
| Iteration     | 72       |
| MaximumReturn | -1.64    |
| MinimumReturn | -113     |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5273910164833069
Validation loss = 0.5211907029151917
Validation loss = 0.5275464653968811
Validation loss = 0.5241684317588806
Validation loss = 0.5247077345848083
Validation loss = 0.5263300538063049
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5179355144500732
Validation loss = 0.5192961096763611
Validation loss = 0.5211026668548584
Validation loss = 0.5196313261985779
Validation loss = 0.523449718952179
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5307245254516602
Validation loss = 0.5301171541213989
Validation loss = 0.5303264856338501
Validation loss = 0.5311569571495056
Validation loss = 0.5279132127761841
Validation loss = 0.5329000949859619
Validation loss = 0.5331305265426636
Validation loss = 0.5334680676460266
Validation loss = 0.5337308049201965
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5278255939483643
Validation loss = 0.5269684791564941
Validation loss = 0.5264188051223755
Validation loss = 0.5284480452537537
Validation loss = 0.5283524990081787
Validation loss = 0.5256727337837219
Validation loss = 0.5281768441200256
Validation loss = 0.5275360941886902
Validation loss = 0.5315877795219421
Validation loss = 0.5296120047569275
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.532560408115387
Validation loss = 0.5215604305267334
Validation loss = 0.5270779728889465
Validation loss = 0.5238445401191711
Validation loss = 0.5266347527503967
Validation loss = 0.5244385004043579
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0994057266342518
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.099892008639309
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0992984349703183
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.098705501618123
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0997304582210243
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.099676724137931
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0996230479267637
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.10010764262648
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.099515868746638
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0999462654486836
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0993555316863588
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1009125067096082
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1003218884120172
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1013404825737265
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1012861736334405
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1017675415104446
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1027837259100643
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1021936864633495
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1016042780748663
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1015499732763228
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1009615384615385
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1014415376401494
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1029882604055496
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1024
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.69    |
| Iteration     | 73       |
| MaximumReturn | -0.862   |
| MinimumReturn | -10.7    |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5312144756317139
Validation loss = 0.5305324196815491
Validation loss = 0.526565432548523
Validation loss = 0.5300045609474182
Validation loss = 0.5322520732879639
Validation loss = 0.5286772847175598
Validation loss = 0.5318961143493652
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5259800553321838
Validation loss = 0.5237785577774048
Validation loss = 0.5276510119438171
Validation loss = 0.525116503238678
Validation loss = 0.5242530107498169
Validation loss = 0.5247138142585754
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5365721583366394
Validation loss = 0.5382316708564758
Validation loss = 0.5365261435508728
Validation loss = 0.5365016460418701
Validation loss = 0.5351241827011108
Validation loss = 0.5385884642601013
Validation loss = 0.5386227965354919
Validation loss = 0.5366779565811157
Validation loss = 0.5376495718955994
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5328876376152039
Validation loss = 0.5300745964050293
Validation loss = 0.5309826731681824
Validation loss = 0.5279175043106079
Validation loss = 0.5292229056358337
Validation loss = 0.5344757437705994
Validation loss = 0.53387451171875
Validation loss = 0.5362605452537537
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5274943709373474
Validation loss = 0.526268720626831
Validation loss = 0.5316936373710632
Validation loss = 0.5304314494132996
Validation loss = 0.5294644832611084
Validation loss = 0.5296468138694763
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.10181236673774
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1022908897176344
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1017039403620874
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1027142096860032
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1026595744680852
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.103136629452419
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1025504782146653
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1019649495485926
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1029723991507432
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1045092838196287
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1039236479321315
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1033386327503976
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.103813559322034
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.10428798305982
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.1068783068783068
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.106292966684294
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.105708245243129
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1051241415742208
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.105596620908131
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1065963060686015
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1070675105485233
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1075382182393252
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1069546891464699
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1084781463928384
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.11
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.34    |
| Iteration     | 74       |
| MaximumReturn | -0.255   |
| MinimumReturn | -17.3    |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.532505989074707
Validation loss = 0.5292137861251831
Validation loss = 0.5353609323501587
Validation loss = 0.5350467562675476
Validation loss = 0.533366322517395
Validation loss = 0.5347855091094971
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5289885997772217
Validation loss = 0.527110755443573
Validation loss = 0.5252640247344971
Validation loss = 0.5286878943443298
Validation loss = 0.5353099703788757
Validation loss = 0.5275413990020752
Validation loss = 0.5327705144882202
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.536875307559967
Validation loss = 0.5391594171524048
Validation loss = 0.5358177423477173
Validation loss = 0.5362924933433533
Validation loss = 0.5464399456977844
Validation loss = 0.5416220426559448
Validation loss = 0.5400863289833069
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5353312492370605
Validation loss = 0.5340962409973145
Validation loss = 0.5313591361045837
Validation loss = 0.5386572480201721
Validation loss = 0.5398254990577698
Validation loss = 0.5421105027198792
Validation loss = 0.5360893607139587
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5324695110321045
Validation loss = 0.5333414673805237
Validation loss = 0.5385204553604126
Validation loss = 0.5340556502342224
Validation loss = 0.5330971479415894
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.1125723303524462
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.11198738170347
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1114030478192327
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1108193277310925
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.110236220472441
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1096537250786989
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1106449921342423
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.110062893081761
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.110005238344683
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1099476439790577
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1104133961276819
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1114016736401673
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1118661787767903
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1112852664576802
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1112271540469973
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1106471816283925
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1121544079290557
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1120959332638165
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1115164147993746
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1109375
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1103591879229568
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1113423517169616
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.110764430577223
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1101871101871101
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1096103896103897
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.4    |
| Iteration     | 75       |
| MaximumReturn | -0.177   |
| MinimumReturn | -50      |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5287685394287109
Validation loss = 0.5336347818374634
Validation loss = 0.5320454835891724
Validation loss = 0.5336881875991821
Validation loss = 0.5335445404052734
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5309497117996216
Validation loss = 0.5287065505981445
Validation loss = 0.5280337333679199
Validation loss = 0.5318517684936523
Validation loss = 0.5332791805267334
Validation loss = 0.5321958065032959
Validation loss = 0.5310829877853394
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5404363870620728
Validation loss = 0.539747416973114
Validation loss = 0.5371558666229248
Validation loss = 0.5390805006027222
Validation loss = 0.5397460460662842
Validation loss = 0.5401644706726074
Validation loss = 0.5417500734329224
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5374512672424316
Validation loss = 0.5360084772109985
Validation loss = 0.5345717072486877
Validation loss = 0.5358289480209351
Validation loss = 0.5394776463508606
Validation loss = 0.5383847951889038
Validation loss = 0.535714328289032
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5331171751022339
Validation loss = 0.5274018049240112
Validation loss = 0.5355837345123291
Validation loss = 0.5396082401275635
Validation loss = 0.5364140868186951
Validation loss = 0.5309627056121826
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1090342679127725
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1110534509600416
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1109958506224067
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.110938310005184
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.110362694300518
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.110305541170378
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1107660455486543
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1107087428867046
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1122026887280247
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1131782945736435
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1131198347107438
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1125451729478575
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1135190918472653
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.112944816915936
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1123711340206186
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1117980422462648
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1122554067971164
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1132269686052496
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1152263374485596
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1146529562982006
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1151079136690647
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.115562403697997
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1149897330595482
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1144176500769625
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.113846153846154
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.08    |
| Iteration     | 76       |
| MaximumReturn | -0.107   |
| MinimumReturn | -3.84    |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5359987020492554
Validation loss = 0.5359525680541992
Validation loss = 0.5339674949645996
Validation loss = 0.5363532900810242
Validation loss = 0.5368070602416992
Validation loss = 0.5378060936927795
Validation loss = 0.5428027510643005
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5341854691505432
Validation loss = 0.5323948264122009
Validation loss = 0.5341729521751404
Validation loss = 0.5412093997001648
Validation loss = 0.533541738986969
Validation loss = 0.5340070724487305
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5394079685211182
Validation loss = 0.5379008650779724
Validation loss = 0.5417346358299255
Validation loss = 0.546205461025238
Validation loss = 0.5412048697471619
Validation loss = 0.5425234436988831
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.539931058883667
Validation loss = 0.5415078401565552
Validation loss = 0.537838339805603
Validation loss = 0.5420703887939453
Validation loss = 0.5429964065551758
Validation loss = 0.5408708453178406
Validation loss = 0.5401389598846436
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5387946963310242
Validation loss = 0.5319002866744995
Validation loss = 0.5338433384895325
Validation loss = 0.5414826273918152
Validation loss = 0.534755289554596
Validation loss = 0.5370532274246216
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1137878011276268
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.115266393442623
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.114695340501792
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1141248720573182
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1135549872122763
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1134969325153374
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1129279509453245
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1123595505617978
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.113323124042879
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1153061224489795
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1172870984191738
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1167176350662589
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1171676006113092
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1176171079429735
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1170483460559797
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1169888097660223
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1174377224199288
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.116869918699187
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1183341797866937
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.119289340101523
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1187214611872147
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.118661257606491
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1180942726811962
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1185410334346504
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1179746835443038
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.42    |
| Iteration     | 77       |
| MaximumReturn | -0.148   |
| MinimumReturn | -32.1    |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5398203730583191
Validation loss = 0.5368028879165649
Validation loss = 0.5357211828231812
Validation loss = 0.5400615334510803
Validation loss = 0.5388977527618408
Validation loss = 0.5383857488632202
Validation loss = 0.5376567840576172
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5349249839782715
Validation loss = 0.5343434810638428
Validation loss = 0.5360572338104248
Validation loss = 0.5378748774528503
Validation loss = 0.5347655415534973
Validation loss = 0.5358777046203613
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5400644540786743
Validation loss = 0.544264018535614
Validation loss = 0.5419825315475464
Validation loss = 0.542397141456604
Validation loss = 0.5432490706443787
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5414319038391113
Validation loss = 0.5397921204566956
Validation loss = 0.5406149625778198
Validation loss = 0.5380292534828186
Validation loss = 0.5403167009353638
Validation loss = 0.5432455539703369
Validation loss = 0.5406432151794434
Validation loss = 0.5429179072380066
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5361112952232361
Validation loss = 0.5399066209793091
Validation loss = 0.5342394709587097
Validation loss = 0.5373166799545288
Validation loss = 0.5391629934310913
Validation loss = 0.5414263010025024
Validation loss = 0.5424246191978455
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.117914979757085
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1198786039453719
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1203235591506573
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1197574532592218
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1191919191919193
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1201413427561837
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1195761856710393
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.119011598587998
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1184475806451613
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1178841309823677
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1183282980866063
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1192752893809763
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.118712273641851
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1191553544494721
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1195979899497488
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1190356604721245
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1194779116465863
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1214249874560964
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.123370110330993
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.124310776942356
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.12374749498998
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1251877816725087
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.127127127127127
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1265632816408204
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.126
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.852   |
| Iteration     | 78       |
| MaximumReturn | -0.242   |
| MinimumReturn | -3.11    |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5420020222663879
Validation loss = 0.5382385849952698
Validation loss = 0.5387490391731262
Validation loss = 0.5432486534118652
Validation loss = 0.5403937697410583
Validation loss = 0.5426603555679321
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5375200510025024
Validation loss = 0.5388156771659851
Validation loss = 0.5365922451019287
Validation loss = 0.5395599603652954
Validation loss = 0.5322184562683105
Validation loss = 0.5378614068031311
Validation loss = 0.5383347272872925
Validation loss = 0.5387554168701172
Validation loss = 0.5437176823616028
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5457023978233337
Validation loss = 0.5411522388458252
Validation loss = 0.5474462509155273
Validation loss = 0.5423712134361267
Validation loss = 0.5431056022644043
Validation loss = 0.5443540811538696
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5441597700119019
Validation loss = 0.5419435501098633
Validation loss = 0.5401877760887146
Validation loss = 0.5404770970344543
Validation loss = 0.5415496826171875
Validation loss = 0.5433107614517212
Validation loss = 0.5428359508514404
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5427260398864746
Validation loss = 0.5373532772064209
Validation loss = 0.5384070873260498
Validation loss = 0.539993166923523
Validation loss = 0.5423024892807007
Validation loss = 0.5400808453559875
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1274362818590704
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1273726273726274
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.127808287568647
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.129241516966068
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1301745635910225
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1311066799601197
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1325361235675138
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1324701195219125
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.132404181184669
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1338308457711443
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1357533565390352
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1351888667992047
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.137605563835072
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1385302879841113
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.139454094292804
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1403769841269842
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.143282102131879
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.143211100099108
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1441307578008915
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.147029702970297
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1464621474517565
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1473788328387735
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1473059812160158
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.148715415019763
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1506172839506172
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -23.5    |
| Iteration     | 79       |
| MaximumReturn | -0.276   |
| MinimumReturn | -65.1    |
| TotalSamples  | 134946   |
----------------------------
