Logging to experiments/invertedPendulum/nov1/w350e3_seed2531
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7086811661720276
Validation loss = 0.3953007161617279
Validation loss = 0.3593566417694092
Validation loss = 0.32590940594673157
Validation loss = 0.30718645453453064
Validation loss = 0.2948054075241089
Validation loss = 0.2639285922050476
Validation loss = 0.25493597984313965
Validation loss = 0.24797028303146362
Validation loss = 0.2299869805574417
Validation loss = 0.20840038359165192
Validation loss = 0.2123941034078598
Validation loss = 0.21356645226478577
Validation loss = 0.19268159568309784
Validation loss = 0.1986609250307083
Validation loss = 0.17490097880363464
Validation loss = 0.1719202995300293
Validation loss = 0.15596753358840942
Validation loss = 0.15357226133346558
Validation loss = 0.14441785216331482
Validation loss = 0.13615848124027252
Validation loss = 0.13142257928848267
Validation loss = 0.1316971778869629
Validation loss = 0.11802606284618378
Validation loss = 0.12395477294921875
Validation loss = 0.12599600851535797
Validation loss = 0.11700049042701721
Validation loss = 0.11760707199573517
Validation loss = 0.13961990177631378
Validation loss = 0.11269591003656387
Validation loss = 0.11485797166824341
Validation loss = 0.11310607939958572
Validation loss = 0.12814658880233765
Validation loss = 0.11088444292545319
Validation loss = 0.11176110804080963
Validation loss = 0.10342542827129364
Validation loss = 0.10411464422941208
Validation loss = 0.10538829118013382
Validation loss = 0.0991482362151146
Validation loss = 0.12266133725643158
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.703129768371582
Validation loss = 0.39154356718063354
Validation loss = 0.3514879643917084
Validation loss = 0.3265402317047119
Validation loss = 0.305421382188797
Validation loss = 0.28520944714546204
Validation loss = 0.2588570713996887
Validation loss = 0.24741387367248535
Validation loss = 0.23778364062309265
Validation loss = 0.21466843783855438
Validation loss = 0.2100917100906372
Validation loss = 0.18609997630119324
Validation loss = 0.1824295073747635
Validation loss = 0.17739573121070862
Validation loss = 0.17547596991062164
Validation loss = 0.16989287734031677
Validation loss = 0.16283568739891052
Validation loss = 0.14724555611610413
Validation loss = 0.16019116342067719
Validation loss = 0.13851509988307953
Validation loss = 0.15057732164859772
Validation loss = 0.15117187798023224
Validation loss = 0.1403411328792572
Validation loss = 0.1323196291923523
Validation loss = 0.11876525729894638
Validation loss = 0.13348576426506042
Validation loss = 0.11921949684619904
Validation loss = 0.11337718367576599
Validation loss = 0.10427349805831909
Validation loss = 0.12793543934822083
Validation loss = 0.11633355915546417
Validation loss = 0.11407662183046341
Validation loss = 0.10468564927577972
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.719753623008728
Validation loss = 0.37368762493133545
Validation loss = 0.3442095220088959
Validation loss = 0.32241761684417725
Validation loss = 0.3050122857093811
Validation loss = 0.27340590953826904
Validation loss = 0.254501610994339
Validation loss = 0.23643143475055695
Validation loss = 0.23910580575466156
Validation loss = 0.22855626046657562
Validation loss = 0.22948774695396423
Validation loss = 0.20440909266471863
Validation loss = 0.19266967475414276
Validation loss = 0.18811841309070587
Validation loss = 0.19364036619663239
Validation loss = 0.17636483907699585
Validation loss = 0.16727890074253082
Validation loss = 0.1594407558441162
Validation loss = 0.15593984723091125
Validation loss = 0.15229210257530212
Validation loss = 0.17395281791687012
Validation loss = 0.14365266263484955
Validation loss = 0.14565129578113556
Validation loss = 0.14252476394176483
Validation loss = 0.15005889534950256
Validation loss = 0.13530834019184113
Validation loss = 0.14304935932159424
Validation loss = 0.13820193707942963
Validation loss = 0.12031131982803345
Validation loss = 0.12201999127864838
Validation loss = 0.1147845908999443
Validation loss = 0.12954263389110565
Validation loss = 0.12336458265781403
Validation loss = 0.11576440930366516
Validation loss = 0.11431149393320084
Validation loss = 0.11914820969104767
Validation loss = 0.11889608949422836
Validation loss = 0.10692007094621658
Validation loss = 0.11892848461866379
Validation loss = 0.10764271020889282
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7009303569793701
Validation loss = 0.38443028926849365
Validation loss = 0.34653225541114807
Validation loss = 0.31822770833969116
Validation loss = 0.3006443381309509
Validation loss = 0.270549476146698
Validation loss = 0.25437238812446594
Validation loss = 0.2324308604001999
Validation loss = 0.22361157834529877
Validation loss = 0.21172913908958435
Validation loss = 0.22382177412509918
Validation loss = 0.22221054136753082
Validation loss = 0.19493895769119263
Validation loss = 0.1841977834701538
Validation loss = 0.1838223785161972
Validation loss = 0.18040089309215546
Validation loss = 0.16340692341327667
Validation loss = 0.15443843603134155
Validation loss = 0.15547317266464233
Validation loss = 0.1477057933807373
Validation loss = 0.13862080872058868
Validation loss = 0.1353301852941513
Validation loss = 0.13516074419021606
Validation loss = 0.1296503096818924
Validation loss = 0.14506413042545319
Validation loss = 0.13247528672218323
Validation loss = 0.12900584936141968
Validation loss = 0.11385490745306015
Validation loss = 0.12485560774803162
Validation loss = 0.11936315894126892
Validation loss = 0.11809235066175461
Validation loss = 0.12839743494987488
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7230064868927002
Validation loss = 0.36952388286590576
Validation loss = 0.35737985372543335
Validation loss = 0.3359408974647522
Validation loss = 0.3142234981060028
Validation loss = 0.2793353796005249
Validation loss = 0.2659153640270233
Validation loss = 0.23322078585624695
Validation loss = 0.22086316347122192
Validation loss = 0.20916271209716797
Validation loss = 0.2103155106306076
Validation loss = 0.20019245147705078
Validation loss = 0.18097130954265594
Validation loss = 0.19060660898685455
Validation loss = 0.18241220712661743
Validation loss = 0.17260782420635223
Validation loss = 0.16809579730033875
Validation loss = 0.15050671994686127
Validation loss = 0.16143257915973663
Validation loss = 0.16051948070526123
Validation loss = 0.15602025389671326
Validation loss = 0.14463359117507935
Validation loss = 0.1521688997745514
Validation loss = 0.15922415256500244
Validation loss = 0.14508798718452454
Validation loss = 0.14653325080871582
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.22    |
| Iteration     | 0        |
| MaximumReturn | -0.0257  |
| MinimumReturn | -30.2    |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.41279110312461853
Validation loss = 0.2924133241176605
Validation loss = 0.23475973308086395
Validation loss = 0.21687652170658112
Validation loss = 0.1871422380208969
Validation loss = 0.17345687747001648
Validation loss = 0.16784490644931793
Validation loss = 0.15225212275981903
Validation loss = 0.15800073742866516
Validation loss = 0.14945082366466522
Validation loss = 0.15844151377677917
Validation loss = 0.16343997418880463
Validation loss = 0.15410013496875763
Validation loss = 0.1484479457139969
Validation loss = 0.14484478533267975
Validation loss = 0.14565370976924896
Validation loss = 0.14527557790279388
Validation loss = 0.14793026447296143
Validation loss = 0.1455625742673874
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3967353403568268
Validation loss = 0.2575661540031433
Validation loss = 0.2188856452703476
Validation loss = 0.1894608587026596
Validation loss = 0.17952017486095428
Validation loss = 0.1797097623348236
Validation loss = 0.18241865932941437
Validation loss = 0.17070211470127106
Validation loss = 0.15667422115802765
Validation loss = 0.1641998589038849
Validation loss = 0.17322099208831787
Validation loss = 0.15798942744731903
Validation loss = 0.14562277495861053
Validation loss = 0.1340680718421936
Validation loss = 0.13616614043712616
Validation loss = 0.1469641774892807
Validation loss = 0.1373066008090973
Validation loss = 0.13971012830734253
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.40063053369522095
Validation loss = 0.2818162739276886
Validation loss = 0.231515035033226
Validation loss = 0.2058718204498291
Validation loss = 0.18814469873905182
Validation loss = 0.16596341133117676
Validation loss = 0.16737991571426392
Validation loss = 0.2271207571029663
Validation loss = 0.17024047672748566
Validation loss = 0.16378426551818848
Validation loss = 0.15437611937522888
Validation loss = 0.15540069341659546
Validation loss = 0.14620313048362732
Validation loss = 0.15251702070236206
Validation loss = 0.1447358876466751
Validation loss = 0.14077620208263397
Validation loss = 0.14504101872444153
Validation loss = 0.1381027102470398
Validation loss = 0.1446097195148468
Validation loss = 0.14703474938869476
Validation loss = 0.1365748643875122
Validation loss = 0.14336177706718445
Validation loss = 0.1460755616426468
Validation loss = 0.14754122495651245
Validation loss = 0.14316897094249725
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4762708246707916
Validation loss = 0.3067326545715332
Validation loss = 0.25729304552078247
Validation loss = 0.23583896458148956
Validation loss = 0.21089711785316467
Validation loss = 0.20988260209560394
Validation loss = 0.197354257106781
Validation loss = 0.17099082469940186
Validation loss = 0.17387208342552185
Validation loss = 0.16439396142959595
Validation loss = 0.16449032723903656
Validation loss = 0.15198835730552673
Validation loss = 0.15789510309696198
Validation loss = 0.1605294793844223
Validation loss = 0.15556707978248596
Validation loss = 0.16906271874904633
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.38692858815193176
Validation loss = 0.220992311835289
Validation loss = 0.1939294934272766
Validation loss = 0.17471808195114136
Validation loss = 0.1722884476184845
Validation loss = 0.16183529794216156
Validation loss = 0.1644463986158371
Validation loss = 0.15614211559295654
Validation loss = 0.15283557772636414
Validation loss = 0.16647303104400635
Validation loss = 0.18174853920936584
Validation loss = 0.16691353917121887
Validation loss = 0.13674674928188324
Validation loss = 0.1442389041185379
Validation loss = 0.1479656994342804
Validation loss = 0.13894478976726532
Validation loss = 0.1456076204776764
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.015384615384615385
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.015151515151515152
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014925373134328358
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014705882352941176
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014492753623188406
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014285714285714285
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.014084507042253521
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013888888888888888
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0136986301369863
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013513513513513514
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -44.3    |
| Iteration     | 1        |
| MaximumReturn | -0.709   |
| MinimumReturn | -86.6    |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.22357963025569916
Validation loss = 0.16531167924404144
Validation loss = 0.1449565887451172
Validation loss = 0.12091352790594101
Validation loss = 0.11758530884981155
Validation loss = 0.1090671569108963
Validation loss = 0.09653282165527344
Validation loss = 0.09796950966119766
Validation loss = 0.0980733186006546
Validation loss = 0.10182119905948639
Validation loss = 0.097493015229702
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.19647391140460968
Validation loss = 0.1518312692642212
Validation loss = 0.12794114649295807
Validation loss = 0.10638248920440674
Validation loss = 0.10107024759054184
Validation loss = 0.10718153417110443
Validation loss = 0.10155095905065536
Validation loss = 0.09174217283725739
Validation loss = 0.084377720952034
Validation loss = 0.08927036076784134
Validation loss = 0.08536630868911743
Validation loss = 0.08230255544185638
Validation loss = 0.09688588976860046
Validation loss = 0.08069464564323425
Validation loss = 0.07845747470855713
Validation loss = 0.08308488130569458
Validation loss = 0.0854349285364151
Validation loss = 0.07790244370698929
Validation loss = 0.0782589390873909
Validation loss = 0.07844804972410202
Validation loss = 0.0757756307721138
Validation loss = 0.07884486019611359
Validation loss = 0.07859556376934052
Validation loss = 0.08567342907190323
Validation loss = 0.08230117708444595
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2463420182466507
Validation loss = 0.15095394849777222
Validation loss = 0.12425272166728973
Validation loss = 0.1224885806441307
Validation loss = 0.10343075543642044
Validation loss = 0.09727764129638672
Validation loss = 0.10595875978469849
Validation loss = 0.11727838218212128
Validation loss = 0.09577882289886475
Validation loss = 0.08668462932109833
Validation loss = 0.0908929854631424
Validation loss = 0.1031007468700409
Validation loss = 0.09225821495056152
Validation loss = 0.08495348691940308
Validation loss = 0.1110178753733635
Validation loss = 0.10482532531023026
Validation loss = 0.0957966297864914
Validation loss = 0.08565254509449005
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.21226045489311218
Validation loss = 0.13990487158298492
Validation loss = 0.12489256262779236
Validation loss = 0.11731930077075958
Validation loss = 0.1042369157075882
Validation loss = 0.10209506750106812
Validation loss = 0.11107596755027771
Validation loss = 0.09674052894115448
Validation loss = 0.10492877662181854
Validation loss = 0.09229961782693863
Validation loss = 0.09315335750579834
Validation loss = 0.09259980916976929
Validation loss = 0.09722794592380524
Validation loss = 0.09009307622909546
Validation loss = 0.0828186571598053
Validation loss = 0.09367729723453522
Validation loss = 0.08434223383665085
Validation loss = 0.09162575006484985
Validation loss = 0.08525323122739792
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2165551483631134
Validation loss = 0.15466290712356567
Validation loss = 0.13661052286624908
Validation loss = 0.12025144696235657
Validation loss = 0.1117800772190094
Validation loss = 0.09796090424060822
Validation loss = 0.09237886965274811
Validation loss = 0.09577734023332596
Validation loss = 0.10593148320913315
Validation loss = 0.08979532122612
Validation loss = 0.09378266334533691
Validation loss = 0.09022603183984756
Validation loss = 0.09201065450906754
Validation loss = 0.08513576537370682
Validation loss = 0.0882842093706131
Validation loss = 0.0883115828037262
Validation loss = 0.08292663842439651
Validation loss = 0.08288466185331345
Validation loss = 0.08516230434179306
Validation loss = 0.08634507656097412
Validation loss = 0.08719094842672348
Validation loss = 0.08722476661205292
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.013157894736842105
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.012987012987012988
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.01282051282051282
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.02531645569620253
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.025
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.024691358024691357
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.036585365853658534
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03614457831325301
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.047619047619047616
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.058823529411764705
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.06976744186046512
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06896551724137931
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06818181818181818
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06741573033707865
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06666666666666667
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06593406593406594
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06521739130434782
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07526881720430108
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.0851063829787234
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08421052631578947
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09375
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09278350515463918
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09183673469387756
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09090909090909091
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.59    |
| Iteration     | 2        |
| MaximumReturn | -0.053   |
| MinimumReturn | -26.2    |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.14461828768253326
Validation loss = 0.11419913917779922
Validation loss = 0.08646741509437561
Validation loss = 0.0814557820558548
Validation loss = 0.08320093154907227
Validation loss = 0.07534629106521606
Validation loss = 0.08084817975759506
Validation loss = 0.08330755680799484
Validation loss = 0.08196297287940979
Validation loss = 0.07093339413404465
Validation loss = 0.0745735689997673
Validation loss = 0.07058900594711304
Validation loss = 0.0717337504029274
Validation loss = 0.06877648085355759
Validation loss = 0.06838224828243256
Validation loss = 0.06955599039793015
Validation loss = 0.06725212186574936
Validation loss = 0.07559030503034592
Validation loss = 0.07029616087675095
Validation loss = 0.0748249739408493
Validation loss = 0.06476756185293198
Validation loss = 0.06530656665563583
Validation loss = 0.07069838792085648
Validation loss = 0.07823982834815979
Validation loss = 0.06417777389287949
Validation loss = 0.07092610746622086
Validation loss = 0.07753414660692215
Validation loss = 0.0751231238245964
Validation loss = 0.06918751448392868
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.12994591891765594
Validation loss = 0.10219947248697281
Validation loss = 0.09282192587852478
Validation loss = 0.08318424224853516
Validation loss = 0.08893346786499023
Validation loss = 0.07848931849002838
Validation loss = 0.09186672419309616
Validation loss = 0.07202346622943878
Validation loss = 0.0682528018951416
Validation loss = 0.07096726447343826
Validation loss = 0.08841625601053238
Validation loss = 0.07526160776615143
Validation loss = 0.06700076907873154
Validation loss = 0.06582403182983398
Validation loss = 0.06218373775482178
Validation loss = 0.07583562284708023
Validation loss = 0.07039887458086014
Validation loss = 0.06432891637086868
Validation loss = 0.06540215760469437
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1617363840341568
Validation loss = 0.10595271736383438
Validation loss = 0.09542527794837952
Validation loss = 0.09333052486181259
Validation loss = 0.08448304980993271
Validation loss = 0.083375483751297
Validation loss = 0.07295536249876022
Validation loss = 0.07616917043924332
Validation loss = 0.07903400808572769
Validation loss = 0.08054659515619278
Validation loss = 0.07796858251094818
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.16685514152050018
Validation loss = 0.10581029206514359
Validation loss = 0.10916920751333237
Validation loss = 0.09125500917434692
Validation loss = 0.08651302009820938
Validation loss = 0.09179586172103882
Validation loss = 0.08180419355630875
Validation loss = 0.08819765597581863
Validation loss = 0.08481226116418839
Validation loss = 0.07493136078119278
Validation loss = 0.0747639611363411
Validation loss = 0.0796479880809784
Validation loss = 0.08683011680841446
Validation loss = 0.07977242767810822
Validation loss = 0.06781240552663803
Validation loss = 0.0725235715508461
Validation loss = 0.07024835795164108
Validation loss = 0.07318998128175735
Validation loss = 0.07023125141859055
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14852659404277802
Validation loss = 0.09732145071029663
Validation loss = 0.08724372833967209
Validation loss = 0.07435683906078339
Validation loss = 0.08042488247156143
Validation loss = 0.07839875668287277
Validation loss = 0.06745761632919312
Validation loss = 0.07555925101041794
Validation loss = 0.0660666674375534
Validation loss = 0.06627343595027924
Validation loss = 0.06335039436817169
Validation loss = 0.06076762080192566
Validation loss = 0.07256823033094406
Validation loss = 0.060587842017412186
Validation loss = 0.06108476594090462
Validation loss = 0.06842835992574692
Validation loss = 0.06672004610300064
Validation loss = 0.06395094096660614
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0891089108910891
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08823529411764706
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08737864077669903
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08653846153846154
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08571428571428572
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08490566037735849
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08411214953271028
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08333333333333333
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08256880733944955
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08181818181818182
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08108108108108109
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08035714285714286
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07964601769911504
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07894736842105263
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0782608695652174
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07758620689655173
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07692307692307693
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07627118644067797
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07563025210084033
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.075
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0743801652892562
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07377049180327869
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07317073170731707
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07258064516129033
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.072
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0333  |
| Iteration     | 3        |
| MaximumReturn | -0.0206  |
| MinimumReturn | -0.0544  |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09217295050621033
Validation loss = 0.06320759654045105
Validation loss = 0.05976425111293793
Validation loss = 0.059950828552246094
Validation loss = 0.05462013557553291
Validation loss = 0.0635065883398056
Validation loss = 0.06068181246519089
Validation loss = 0.05520499125123024
Validation loss = 0.05855225399136543
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07886779308319092
Validation loss = 0.057913295924663544
Validation loss = 0.059395819902420044
Validation loss = 0.059508729726076126
Validation loss = 0.05832681804895401
Validation loss = 0.057864170521497726
Validation loss = 0.048990074545145035
Validation loss = 0.055966272950172424
Validation loss = 0.05914756655693054
Validation loss = 0.06337253749370575
Validation loss = 0.05392627790570259
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09442931413650513
Validation loss = 0.0653725191950798
Validation loss = 0.07064692676067352
Validation loss = 0.05781085044145584
Validation loss = 0.06748966872692108
Validation loss = 0.06476996093988419
Validation loss = 0.0642818957567215
Validation loss = 0.06059319153428078
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07084225118160248
Validation loss = 0.05711141228675842
Validation loss = 0.05568729713559151
Validation loss = 0.05653694272041321
Validation loss = 0.055638786405324936
Validation loss = 0.053917571902275085
Validation loss = 0.0568791925907135
Validation loss = 0.05402679368853569
Validation loss = 0.05895461142063141
Validation loss = 0.07151544094085693
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11416849493980408
Validation loss = 0.0864037424325943
Validation loss = 0.06734011322259903
Validation loss = 0.06395367532968521
Validation loss = 0.05739319697022438
Validation loss = 0.05890047550201416
Validation loss = 0.05465104058384895
Validation loss = 0.05262485146522522
Validation loss = 0.054206348955631256
Validation loss = 0.05396531894803047
Validation loss = 0.052159249782562256
Validation loss = 0.06665628403425217
Validation loss = 0.06002655625343323
Validation loss = 0.05372747778892517
Validation loss = 0.053132589906454086
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07142857142857142
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07086614173228346
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0703125
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06976744186046512
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06923076923076923
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06870229007633588
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06818181818181818
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06766917293233082
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06716417910447761
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06666666666666667
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0661764705882353
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06569343065693431
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06521739130434782
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06474820143884892
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06428571428571428
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06382978723404255
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06338028169014084
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06293706293706294
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0625
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06206896551724138
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06164383561643835
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.061224489795918366
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.060810810810810814
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06040268456375839
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0449  |
| Iteration     | 4        |
| MaximumReturn | -0.0263  |
| MinimumReturn | -0.0603  |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06291139870882034
Validation loss = 0.06089004874229431
Validation loss = 0.05026688426733017
Validation loss = 0.04639499634504318
Validation loss = 0.04834610968828201
Validation loss = 0.043796151876449585
Validation loss = 0.0444682240486145
Validation loss = 0.044981129467487335
Validation loss = 0.05115855857729912
Validation loss = 0.04568871110677719
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07608361542224884
Validation loss = 0.05454910919070244
Validation loss = 0.04689941927790642
Validation loss = 0.046606384217739105
Validation loss = 0.04783833399415016
Validation loss = 0.06108441203832626
Validation loss = 0.05239727348089218
Validation loss = 0.04983651638031006
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.055563800036907196
Validation loss = 0.0477769635617733
Validation loss = 0.04612889140844345
Validation loss = 0.04940972477197647
Validation loss = 0.04387154430150986
Validation loss = 0.04996565356850624
Validation loss = 0.047323234379291534
Validation loss = 0.04645031690597534
Validation loss = 0.04556005448102951
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06196994334459305
Validation loss = 0.056811727583408356
Validation loss = 0.05002506822347641
Validation loss = 0.04992600902915001
Validation loss = 0.04631974548101425
Validation loss = 0.043873369693756104
Validation loss = 0.04967600852251053
Validation loss = 0.05106019228696823
Validation loss = 0.05133356899023056
Validation loss = 0.047082751989364624
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05520734190940857
Validation loss = 0.043578870594501495
Validation loss = 0.04376674443483353
Validation loss = 0.04421629756689072
Validation loss = 0.046661145985126495
Validation loss = 0.04918243736028671
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.059602649006622516
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05921052631578947
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.058823529411764705
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05844155844155844
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05806451612903226
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.057692307692307696
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05732484076433121
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.056962025316455694
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05660377358490566
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05625
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.055900621118012424
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05555555555555555
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05521472392638037
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.054878048780487805
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05454545454545454
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05421686746987952
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05389221556886228
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05357142857142857
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05325443786982249
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.052941176470588235
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05263157894736842
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05232558139534884
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05202312138728324
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05172413793103448
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05142857142857143
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0191  |
| Iteration     | 5        |
| MaximumReturn | -0.0071  |
| MinimumReturn | -0.074   |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0797654315829277
Validation loss = 0.0616416409611702
Validation loss = 0.05715633183717728
Validation loss = 0.0647214949131012
Validation loss = 0.06024885177612305
Validation loss = 0.056225407868623734
Validation loss = 0.05077426880598068
Validation loss = 0.05366228148341179
Validation loss = 0.05502687767148018
Validation loss = 0.050924886018037796
Validation loss = 0.06272415071725845
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.054786574095487595
Validation loss = 0.053177494555711746
Validation loss = 0.05810360237956047
Validation loss = 0.04871543124318123
Validation loss = 0.05037805438041687
Validation loss = 0.05130179971456528
Validation loss = 0.0508166141808033
Validation loss = 0.04709625989198685
Validation loss = 0.05851076915860176
Validation loss = 0.05317674204707146
Validation loss = 0.04824123531579971
Validation loss = 0.050411321222782135
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05735985189676285
Validation loss = 0.05598939210176468
Validation loss = 0.05970563367009163
Validation loss = 0.05222542956471443
Validation loss = 0.055205218493938446
Validation loss = 0.05717343091964722
Validation loss = 0.04732505977153778
Validation loss = 0.05575801804661751
Validation loss = 0.0647633746266365
Validation loss = 0.0505928099155426
Validation loss = 0.05900060385465622
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06765647232532501
Validation loss = 0.05466366931796074
Validation loss = 0.054932259023189545
Validation loss = 0.04680875688791275
Validation loss = 0.05427779629826546
Validation loss = 0.07261498272418976
Validation loss = 0.055399827659130096
Validation loss = 0.05123983696103096
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.051374536007642746
Validation loss = 0.05700446292757988
Validation loss = 0.04697870835661888
Validation loss = 0.04852353408932686
Validation loss = 0.06289757043123245
Validation loss = 0.052210528403520584
Validation loss = 0.05485574156045914
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05113636363636364
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05084745762711865
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05056179775280899
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05027932960893855
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.05
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.049723756906077346
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04945054945054945
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04918032786885246
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04891304347826087
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04864864864864865
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04838709677419355
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0481283422459893
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047872340425531915
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.047619047619047616
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04736842105263158
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04712041884816754
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046875
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046632124352331605
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04639175257731959
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.046153846153846156
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04591836734693878
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04568527918781726
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045454545454545456
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04522613065326633
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.045
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0375  |
| Iteration     | 6        |
| MaximumReturn | -0.024   |
| MinimumReturn | -0.0489  |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06311113387346268
Validation loss = 0.04650069400668144
Validation loss = 0.04258105158805847
Validation loss = 0.04560708627104759
Validation loss = 0.05267879739403725
Validation loss = 0.041280172765254974
Validation loss = 0.06207134202122688
Validation loss = 0.04386395215988159
Validation loss = 0.049614548683166504
Validation loss = 0.043160777539014816
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06951437890529633
Validation loss = 0.04177027940750122
Validation loss = 0.043596301227808
Validation loss = 0.0452975332736969
Validation loss = 0.044362619519233704
Validation loss = 0.05780073627829552
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04967412352561951
Validation loss = 0.0661175474524498
Validation loss = 0.049542754888534546
Validation loss = 0.04512748122215271
Validation loss = 0.043589670211076736
Validation loss = 0.046994373202323914
Validation loss = 0.047361522912979126
Validation loss = 0.043201860040426254
Validation loss = 0.04322242736816406
Validation loss = 0.044461484998464584
Validation loss = 0.04780492186546326
Validation loss = 0.045212600380182266
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06112291291356087
Validation loss = 0.04935723915696144
Validation loss = 0.04625846818089485
Validation loss = 0.043985262513160706
Validation loss = 0.044134363532066345
Validation loss = 0.040758661925792694
Validation loss = 0.049697306007146835
Validation loss = 0.042715564370155334
Validation loss = 0.04850541055202484
Validation loss = 0.04228469729423523
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.044969070702791214
Validation loss = 0.049062445759773254
Validation loss = 0.04596200957894325
Validation loss = 0.043572794646024704
Validation loss = 0.04165708273649216
Validation loss = 0.04246950149536133
Validation loss = 0.048820752650499344
Validation loss = 0.052113715559244156
Validation loss = 0.04728810861706734
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04477611940298507
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04455445544554455
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04433497536945813
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04411764705882353
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04390243902439024
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043689320388349516
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.043478260869565216
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04326923076923077
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0430622009569378
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04285714285714286
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04265402843601896
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04245283018867924
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04225352112676056
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04205607476635514
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04186046511627907
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041666666666666664
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.041474654377880185
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04128440366972477
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0410958904109589
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04090909090909091
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04072398190045249
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04054054054054054
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04035874439461883
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04017857142857143
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.04
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00881 |
| Iteration     | 7        |
| MaximumReturn | -0.00549 |
| MinimumReturn | -0.0117  |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.051942430436611176
Validation loss = 0.04688924551010132
Validation loss = 0.04423397779464722
Validation loss = 0.040894679725170135
Validation loss = 0.03875984624028206
Validation loss = 0.04081001877784729
Validation loss = 0.03843705728650093
Validation loss = 0.04008277878165245
Validation loss = 0.036971885710954666
Validation loss = 0.04407041147351265
Validation loss = 0.042164094746112823
Validation loss = 0.049521345645189285
Validation loss = 0.0403907336294651
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03952673822641373
Validation loss = 0.041533827781677246
Validation loss = 0.03921185061335564
Validation loss = 0.04021494835615158
Validation loss = 0.037192489951848984
Validation loss = 0.038575612008571625
Validation loss = 0.041637808084487915
Validation loss = 0.03764398768544197
Validation loss = 0.03720274940133095
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04403442144393921
Validation loss = 0.03988827392458916
Validation loss = 0.036227982491254807
Validation loss = 0.040603868663311005
Validation loss = 0.03872434049844742
Validation loss = 0.04017765074968338
Validation loss = 0.045482393354177475
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04946520924568176
Validation loss = 0.0403243713080883
Validation loss = 0.045205581933259964
Validation loss = 0.038358528167009354
Validation loss = 0.041870586574077606
Validation loss = 0.04157565161585808
Validation loss = 0.03648211434483528
Validation loss = 0.03869018703699112
Validation loss = 0.03695232421159744
Validation loss = 0.03669518604874611
Validation loss = 0.048068080097436905
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04874512553215027
Validation loss = 0.039655011147260666
Validation loss = 0.04461533576250076
Validation loss = 0.037965212017297745
Validation loss = 0.053284622728824615
Validation loss = 0.04458454251289368
Validation loss = 0.038520921021699905
Validation loss = 0.035560451447963715
Validation loss = 0.039930541068315506
Validation loss = 0.04064250364899635
Validation loss = 0.03811010345816612
Validation loss = 0.040971286594867706
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.05309734513274336
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.05726872246696035
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.06578947368421052
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.07423580786026202
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.08260869565217391
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.09090909090909091
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09482758620689655
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.09871244635193133
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.10256410256410256
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.11063829787234042
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11440677966101695
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11392405063291139
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11764705882352941
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.13389121338912133
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.14166666666666666
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14107883817427386
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1446280991735537
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.14814814814814814
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.1557377049180328
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.16326530612244897
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.17073170731707318
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17408906882591094
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.1814516129032258
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18473895582329317
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.188
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -57      |
| Iteration     | 8        |
| MaximumReturn | -29.8    |
| MinimumReturn | -87.5    |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05461585521697998
Validation loss = 0.033978499472141266
Validation loss = 0.036472056061029434
Validation loss = 0.03647409379482269
Validation loss = 0.039533138275146484
Validation loss = 0.033523429185152054
Validation loss = 0.030694110319018364
Validation loss = 0.030952032655477524
Validation loss = 0.03257109224796295
Validation loss = 0.0338696613907814
Validation loss = 0.03455999493598938
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05300129950046539
Validation loss = 0.043215811252593994
Validation loss = 0.043060053139925
Validation loss = 0.029093358665704727
Validation loss = 0.0356488898396492
Validation loss = 0.030329227447509766
Validation loss = 0.03230171278119087
Validation loss = 0.030095193535089493
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.054533056914806366
Validation loss = 0.044901762157678604
Validation loss = 0.03427592292428017
Validation loss = 0.04756729677319527
Validation loss = 0.03479620814323425
Validation loss = 0.03377273306250572
Validation loss = 0.038429103791713715
Validation loss = 0.03530636802315712
Validation loss = 0.03339004144072533
Validation loss = 0.031759846955537796
Validation loss = 0.03097207471728325
Validation loss = 0.03043041191995144
Validation loss = 0.028633931651711464
Validation loss = 0.03011278435587883
Validation loss = 0.029303278774023056
Validation loss = 0.02930171974003315
Validation loss = 0.03131826967000961
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05083947628736496
Validation loss = 0.038846585899591446
Validation loss = 0.0335809700191021
Validation loss = 0.03425611928105354
Validation loss = 0.03261240944266319
Validation loss = 0.03445747494697571
Validation loss = 0.031603455543518066
Validation loss = 0.030459409579634666
Validation loss = 0.030487053096294403
Validation loss = 0.0338381752371788
Validation loss = 0.030819857493042946
Validation loss = 0.03838077560067177
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.051058121025562286
Validation loss = 0.03606148064136505
Validation loss = 0.036477893590927124
Validation loss = 0.04662749916315079
Validation loss = 0.0372709259390831
Validation loss = 0.038008034229278564
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18725099601593626
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1865079365079365
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1857707509881423
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18503937007874016
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1843137254901961
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1875
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1867704280155642
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18604651162790697
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18532818532818532
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18461538461538463
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1839080459770115
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.183206106870229
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18250950570342206
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18181818181818182
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1811320754716981
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18045112781954886
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1797752808988764
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1791044776119403
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17843866171003717
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17777777777777778
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17712177121771217
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17647058823529413
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17582417582417584
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17883211678832117
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1781818181818182
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0447  |
| Iteration     | 9        |
| MaximumReturn | -0.0124  |
| MinimumReturn | -0.347   |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04290146380662918
Validation loss = 0.032401494681835175
Validation loss = 0.031158626079559326
Validation loss = 0.033284127712249756
Validation loss = 0.03076518513262272
Validation loss = 0.03521500527858734
Validation loss = 0.03737631440162659
Validation loss = 0.03454655781388283
Validation loss = 0.03196585178375244
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.037921447306871414
Validation loss = 0.034658387303352356
Validation loss = 0.03272391855716705
Validation loss = 0.03328397125005722
Validation loss = 0.03605605661869049
Validation loss = 0.03706705570220947
Validation loss = 0.03617600351572037
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.036247119307518005
Validation loss = 0.03634702414274216
Validation loss = 0.03290511667728424
Validation loss = 0.03761439025402069
Validation loss = 0.035366758704185486
Validation loss = 0.035470347851514816
Validation loss = 0.03278180584311485
Validation loss = 0.03566766902804375
Validation loss = 0.03432061895728111
Validation loss = 0.032900236546993256
Validation loss = 0.03301812708377838
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04136252403259277
Validation loss = 0.031259067356586456
Validation loss = 0.038567546755075455
Validation loss = 0.036491796374320984
Validation loss = 0.03426942229270935
Validation loss = 0.03391801193356514
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.039530444890260696
Validation loss = 0.03478275239467621
Validation loss = 0.0497383214533329
Validation loss = 0.038536276668310165
Validation loss = 0.03168398141860962
Validation loss = 0.03395775333046913
Validation loss = 0.03313127160072327
Validation loss = 0.035351745784282684
Validation loss = 0.033894240856170654
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17753623188405798
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17689530685920576
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17625899280575538
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17562724014336917
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.175
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17437722419928825
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17375886524822695
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17314487632508835
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17253521126760563
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17192982456140352
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17132867132867133
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17073170731707318
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1701388888888889
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1695501730103806
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16896551724137931
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16838487972508592
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1678082191780822
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16723549488054607
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16666666666666666
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16610169491525423
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16554054054054054
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16498316498316498
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1644295302013423
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16387959866220736
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16333333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.108   |
| Iteration     | 10       |
| MaximumReturn | -0.0436  |
| MinimumReturn | -0.544   |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03260355070233345
Validation loss = 0.03207142651081085
Validation loss = 0.031250834465026855
Validation loss = 0.028271108865737915
Validation loss = 0.029538696631789207
Validation loss = 0.03069101832807064
Validation loss = 0.034837447106838226
Validation loss = 0.030329596251249313
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03564688563346863
Validation loss = 0.03332509845495224
Validation loss = 0.028704490512609482
Validation loss = 0.03053487464785576
Validation loss = 0.03074534609913826
Validation loss = 0.04060593992471695
Validation loss = 0.031065087765455246
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.035165734589099884
Validation loss = 0.031872741878032684
Validation loss = 0.02904246188700199
Validation loss = 0.029561543837189674
Validation loss = 0.03169825300574303
Validation loss = 0.029932186007499695
Validation loss = 0.029837068170309067
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030495043843984604
Validation loss = 0.027948791161179543
Validation loss = 0.02866286039352417
Validation loss = 0.029630254954099655
Validation loss = 0.029416143894195557
Validation loss = 0.031049245968461037
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.029954027384519577
Validation loss = 0.031538985669612885
Validation loss = 0.031610410660505295
Validation loss = 0.035389937460422516
Validation loss = 0.030684780329465866
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16279069767441862
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16225165562913907
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1617161716171617
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1611842105263158
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16065573770491803
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16013071895424835
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15960912052117263
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1590909090909091
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15857605177993528
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15806451612903225
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15755627009646303
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15705128205128205
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15654952076677317
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15605095541401273
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15555555555555556
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1550632911392405
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15457413249211358
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1540880503144654
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1536050156739812
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.153125
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1526479750778816
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15217391304347827
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15170278637770898
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15123456790123457
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15076923076923077
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00313 |
| Iteration     | 11       |
| MaximumReturn | -0.00227 |
| MinimumReturn | -0.00405 |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03549390286207199
Validation loss = 0.03306259959936142
Validation loss = 0.029362361878156662
Validation loss = 0.03532196953892708
Validation loss = 0.030060121789574623
Validation loss = 0.03192305564880371
Validation loss = 0.03146720677614212
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04314647614955902
Validation loss = 0.033927470445632935
Validation loss = 0.03185824304819107
Validation loss = 0.035869013518095016
Validation loss = 0.03134641796350479
Validation loss = 0.03662871569395065
Validation loss = 0.035545460879802704
Validation loss = 0.03524964302778244
Validation loss = 0.03078574314713478
Validation loss = 0.029529079794883728
Validation loss = 0.03975573927164078
Validation loss = 0.0389971062541008
Validation loss = 0.03274570032954216
Validation loss = 0.03175076097249985
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04021006077528
Validation loss = 0.031573910266160965
Validation loss = 0.03506168723106384
Validation loss = 0.03386972099542618
Validation loss = 0.0377572663128376
Validation loss = 0.03453490883111954
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03641846403479576
Validation loss = 0.03254701942205429
Validation loss = 0.03537428006529808
Validation loss = 0.034589607268571854
Validation loss = 0.03696564584970474
Validation loss = 0.02938210964202881
Validation loss = 0.03298161178827286
Validation loss = 0.03195161372423172
Validation loss = 0.03184548392891884
Validation loss = 0.03483450040221214
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03487444669008255
Validation loss = 0.032215047627687454
Validation loss = 0.03205941244959831
Validation loss = 0.032869432121515274
Validation loss = 0.03285456821322441
Validation loss = 0.030650783330202103
Validation loss = 0.03445295989513397
Validation loss = 0.029773231595754623
Validation loss = 0.02927204966545105
Validation loss = 0.03163602203130722
Validation loss = 0.03383751958608627
Validation loss = 0.03352590277791023
Validation loss = 0.03491095453500748
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15337423312883436
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1559633027522936
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.16463414634146342
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.1702127659574468
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17272727272727273
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.18126888217522658
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18373493975903615
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.1921921921921922
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.19461077844311378
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.20597014925373133
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20833333333333334
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.21068249258160238
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.21301775147928995
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2182890855457227
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2235294117647059
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22287390029325513
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22514619883040934
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22740524781341107
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.23255813953488372
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23478260869565218
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23699421965317918
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2420749279538905
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2471264367816092
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2492836676217765
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.25142857142857145
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -71.2    |
| Iteration     | 12       |
| MaximumReturn | -43.9    |
| MinimumReturn | -104     |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03230353072285652
Validation loss = 0.03646098077297211
Validation loss = 0.03820085898041725
Validation loss = 0.027189433574676514
Validation loss = 0.028660165145993233
Validation loss = 0.026854878291487694
Validation loss = 0.028227632865309715
Validation loss = 0.029157359153032303
Validation loss = 0.034154992550611496
Validation loss = 0.029015593230724335
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.037336982786655426
Validation loss = 0.0331868939101696
Validation loss = 0.03741489350795746
Validation loss = 0.029356010258197784
Validation loss = 0.031041709706187248
Validation loss = 0.034347329288721085
Validation loss = 0.029321225360035896
Validation loss = 0.02897869423031807
Validation loss = 0.027574460953474045
Validation loss = 0.02905031479895115
Validation loss = 0.03174546733498573
Validation loss = 0.026730619370937347
Validation loss = 0.027138907462358475
Validation loss = 0.03121018409729004
Validation loss = 0.029196297749876976
Validation loss = 0.02718067727982998
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03425505384802818
Validation loss = 0.035001907497644424
Validation loss = 0.029369018971920013
Validation loss = 0.027920549735426903
Validation loss = 0.029415298253297806
Validation loss = 0.02791743353009224
Validation loss = 0.028653321787714958
Validation loss = 0.027439367026090622
Validation loss = 0.026748819276690483
Validation loss = 0.029681676998734474
Validation loss = 0.026346715167164803
Validation loss = 0.02909853495657444
Validation loss = 0.02747449465095997
Validation loss = 0.03313415125012398
Validation loss = 0.02759934775531292
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030808502808213234
Validation loss = 0.030532121658325195
Validation loss = 0.02992328442633152
Validation loss = 0.030343418940901756
Validation loss = 0.03050069324672222
Validation loss = 0.030427761375904083
Validation loss = 0.029018593952059746
Validation loss = 0.026954321190714836
Validation loss = 0.03548479825258255
Validation loss = 0.028944717720150948
Validation loss = 0.030672607943415642
Validation loss = 0.026859372854232788
Validation loss = 0.026112770661711693
Validation loss = 0.029665686190128326
Validation loss = 0.027196144685149193
Validation loss = 0.026572905480861664
Validation loss = 0.0291158314794302
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0339767187833786
Validation loss = 0.028998441994190216
Validation loss = 0.027181165292859077
Validation loss = 0.02821996994316578
Validation loss = 0.027704892680048943
Validation loss = 0.026357948780059814
Validation loss = 0.027957534417510033
Validation loss = 0.026306796818971634
Validation loss = 0.02770649828016758
Validation loss = 0.030145443975925446
Validation loss = 0.027945084497332573
Validation loss = 0.03380656987428665
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25071225071225073
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24929178470254956
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24858757062146894
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24788732394366197
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24719101123595505
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24649859943977592
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24581005586592178
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24512534818941503
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24444444444444444
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24376731301939059
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2430939226519337
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24242424242424243
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24175824175824176
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2410958904109589
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24043715846994534
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23978201634877383
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2391304347826087
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23848238482384823
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23783783783783785
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2371967654986523
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23655913978494625
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2359249329758713
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23529411764705882
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23466666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.103   |
| Iteration     | 13       |
| MaximumReturn | -0.0524  |
| MinimumReturn | -0.159   |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0334414504468441
Validation loss = 0.0325295627117157
Validation loss = 0.03323177248239517
Validation loss = 0.03236787021160126
Validation loss = 0.03156411647796631
Validation loss = 0.03710931912064552
Validation loss = 0.03215283155441284
Validation loss = 0.031943779438734055
Validation loss = 0.032891374081373215
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03522295877337456
Validation loss = 0.03174617514014244
Validation loss = 0.03127056360244751
Validation loss = 0.036652207374572754
Validation loss = 0.038392312824726105
Validation loss = 0.036409977823495865
Validation loss = 0.03135102987289429
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03314364701509476
Validation loss = 0.031679026782512665
Validation loss = 0.03233914077281952
Validation loss = 0.03379509225487709
Validation loss = 0.032788004726171494
Validation loss = 0.031434882432222366
Validation loss = 0.03258630633354187
Validation loss = 0.03275350108742714
Validation loss = 0.03350967541337013
Validation loss = 0.03372681885957718
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030986713245511055
Validation loss = 0.031470660120248795
Validation loss = 0.033270228654146194
Validation loss = 0.032622337341308594
Validation loss = 0.03284047544002533
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03242861479520798
Validation loss = 0.03117598593235016
Validation loss = 0.03300711512565613
Validation loss = 0.03174765780568123
Validation loss = 0.03376321867108345
Validation loss = 0.03410497307777405
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23404255319148937
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23342175066312998
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2328042328042328
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23218997361477572
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23157894736842105
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23097112860892388
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23036649214659685
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2297650130548303
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22916666666666666
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22857142857142856
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22797927461139897
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22739018087855298
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2268041237113402
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2262210796915167
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22564102564102564
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22506393861892582
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22448979591836735
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22391857506361323
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2233502538071066
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22278481012658227
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2222222222222222
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2216624685138539
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22110552763819097
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22055137844611528
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00383 |
| Iteration     | 14       |
| MaximumReturn | -0.00328 |
| MinimumReturn | -0.00536 |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03263742849230766
Validation loss = 0.034316595643758774
Validation loss = 0.035904236137866974
Validation loss = 0.031019015237689018
Validation loss = 0.03811226785182953
Validation loss = 0.034948546439409256
Validation loss = 0.031363047659397125
Validation loss = 0.03204481676220894
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.029766254127025604
Validation loss = 0.03388512507081032
Validation loss = 0.03141815960407257
Validation loss = 0.03163262456655502
Validation loss = 0.03131524845957756
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0367501936852932
Validation loss = 0.03242664039134979
Validation loss = 0.031618788838386536
Validation loss = 0.03149396926164627
Validation loss = 0.030544083565473557
Validation loss = 0.03416327014565468
Validation loss = 0.03271237760782242
Validation loss = 0.035320062190294266
Validation loss = 0.03105798177421093
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.037276528775691986
Validation loss = 0.030850548297166824
Validation loss = 0.03205466270446777
Validation loss = 0.0318729467689991
Validation loss = 0.031217943876981735
Validation loss = 0.02867443673312664
Validation loss = 0.03213031589984894
Validation loss = 0.030979575589299202
Validation loss = 0.029781309887766838
Validation loss = 0.028279881924390793
Validation loss = 0.028896482661366463
Validation loss = 0.03073650784790516
Validation loss = 0.03282533586025238
Validation loss = 0.03585541620850563
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03245178237557411
Validation loss = 0.0386151522397995
Validation loss = 0.031988658010959625
Validation loss = 0.03349626809358597
Validation loss = 0.031066089868545532
Validation loss = 0.03131164610385895
Validation loss = 0.03204020857810974
Validation loss = 0.033793941140174866
Validation loss = 0.029967041686177254
Validation loss = 0.03183817118406296
Validation loss = 0.03111289069056511
Validation loss = 0.030173668637871742
Validation loss = 0.03081239014863968
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2194513715710723
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21890547263681592
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21836228287841192
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21782178217821782
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21728395061728395
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21674876847290642
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21621621621621623
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21568627450980393
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21515892420537897
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2146341463414634
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2141119221411192
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21359223300970873
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21307506053268765
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21256038647342995
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21204819277108433
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21153846153846154
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21103117505995203
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21052631578947367
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2100238663484487
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20952380952380953
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20902612826603326
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20853080568720378
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20803782505910165
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20754716981132076
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20705882352941177
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00386 |
| Iteration     | 15       |
| MaximumReturn | -0.00243 |
| MinimumReturn | -0.00558 |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03521766513586044
Validation loss = 0.02857268415391445
Validation loss = 0.03090091049671173
Validation loss = 0.0301668643951416
Validation loss = 0.028902696445584297
Validation loss = 0.027671754360198975
Validation loss = 0.029777714982628822
Validation loss = 0.029384290799498558
Validation loss = 0.02803041599690914
Validation loss = 0.028979023918509483
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03113645687699318
Validation loss = 0.028796229511499405
Validation loss = 0.029435385018587112
Validation loss = 0.028333278372883797
Validation loss = 0.02935776114463806
Validation loss = 0.029736081138253212
Validation loss = 0.03090440295636654
Validation loss = 0.02916833385825157
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.030210407450795174
Validation loss = 0.03548634797334671
Validation loss = 0.029987765476107597
Validation loss = 0.028762778267264366
Validation loss = 0.027681350708007812
Validation loss = 0.03333400934934616
Validation loss = 0.02867681346833706
Validation loss = 0.03166890889406204
Validation loss = 0.032787054777145386
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.033693257719278336
Validation loss = 0.03229860961437225
Validation loss = 0.028883768245577812
Validation loss = 0.03224824368953705
Validation loss = 0.028441261500120163
Validation loss = 0.031135624274611473
Validation loss = 0.03057831898331642
Validation loss = 0.0286570992320776
Validation loss = 0.030549701303243637
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.030129363760352135
Validation loss = 0.031498704105615616
Validation loss = 0.031087668612599373
Validation loss = 0.029470622539520264
Validation loss = 0.030436033383011818
Validation loss = 0.03442069888114929
Validation loss = 0.02838127315044403
Validation loss = 0.030284101143479347
Validation loss = 0.03071098029613495
Validation loss = 0.027483509853482246
Validation loss = 0.030886158347129822
Validation loss = 0.02828734926879406
Validation loss = 0.030283158645033836
Validation loss = 0.028806278482079506
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20657276995305165
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20608899297423888
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.205607476635514
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20512820512820512
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20465116279069767
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20417633410672853
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2037037037037037
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20323325635103925
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20276497695852536
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20229885057471264
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2018348623853211
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20137299771167047
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2009132420091324
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20045558086560364
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19954648526077098
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19909502262443438
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1986455981941309
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1981981981981982
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19775280898876405
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19730941704035873
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19686800894854586
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19642857142857142
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19599109131403117
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19555555555555557
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000988 |
| Iteration     | 16        |
| MaximumReturn | -0.00059  |
| MinimumReturn | -0.00138  |
| TotalSamples  | 29988     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.030678339302539825
Validation loss = 0.02926134131848812
Validation loss = 0.02963569387793541
Validation loss = 0.02981530874967575
Validation loss = 0.02940094843506813
Validation loss = 0.030441533774137497
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03051343560218811
Validation loss = 0.028391003608703613
Validation loss = 0.029252445325255394
Validation loss = 0.03137911111116409
Validation loss = 0.031006453558802605
Validation loss = 0.02756238542497158
Validation loss = 0.029556261375546455
Validation loss = 0.029975896701216698
Validation loss = 0.028173495084047318
Validation loss = 0.029842892661690712
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.033607110381126404
Validation loss = 0.028454869985580444
Validation loss = 0.0304697398096323
Validation loss = 0.03068333864212036
Validation loss = 0.029074814170598984
Validation loss = 0.027448706328868866
Validation loss = 0.029975775629281998
Validation loss = 0.02816038206219673
Validation loss = 0.03139500692486763
Validation loss = 0.0359877273440361
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.029270954430103302
Validation loss = 0.030062071979045868
Validation loss = 0.030557090416550636
Validation loss = 0.028297550976276398
Validation loss = 0.028728894889354706
Validation loss = 0.028969403356313705
Validation loss = 0.030539391562342644
Validation loss = 0.032077889889478683
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.031673990190029144
Validation loss = 0.029816759750247
Validation loss = 0.030331244692206383
Validation loss = 0.028185544535517693
Validation loss = 0.027924640104174614
Validation loss = 0.028518104925751686
Validation loss = 0.03025689721107483
Validation loss = 0.03014266863465309
Validation loss = 0.031684163957834244
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1951219512195122
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19469026548672566
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19426048565121412
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19383259911894274
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1934065934065934
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19298245614035087
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1925601750547046
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19213973799126638
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19172113289760348
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19130434782608696
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19088937093275488
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19047619047619047
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1900647948164147
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1896551724137931
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18924731182795698
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1888412017167382
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18843683083511778
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18803418803418803
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18763326226012794
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18723404255319148
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18683651804670912
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1864406779661017
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18604651162790697
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18565400843881857
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18526315789473685
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.028   |
| Iteration     | 17       |
| MaximumReturn | -0.0182  |
| MinimumReturn | -0.0396  |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029718663543462753
Validation loss = 0.029121898114681244
Validation loss = 0.02714882604777813
Validation loss = 0.02547975815832615
Validation loss = 0.031028591096401215
Validation loss = 0.027289116755127907
Validation loss = 0.02816658653318882
Validation loss = 0.027274522930383682
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.030398409813642502
Validation loss = 0.02854085899889469
Validation loss = 0.02873094193637371
Validation loss = 0.02777188830077648
Validation loss = 0.02991747111082077
Validation loss = 0.029781032353639603
Validation loss = 0.0296019334346056
Validation loss = 0.030463039875030518
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03234611824154854
Validation loss = 0.02854073978960514
Validation loss = 0.02862650714814663
Validation loss = 0.029452791437506676
Validation loss = 0.02660696767270565
Validation loss = 0.028750810772180557
Validation loss = 0.02847679704427719
Validation loss = 0.0318181999027729
Validation loss = 0.027179516851902008
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028411708772182465
Validation loss = 0.025214286521077156
Validation loss = 0.0262631643563509
Validation loss = 0.02748037315905094
Validation loss = 0.027698319405317307
Validation loss = 0.03185613825917244
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.029971007257699966
Validation loss = 0.027058687061071396
Validation loss = 0.027602404356002808
Validation loss = 0.028946321457624435
Validation loss = 0.027284739539027214
Validation loss = 0.027051875367760658
Validation loss = 0.02902495302259922
Validation loss = 0.030418120324611664
Validation loss = 0.02998800203204155
Validation loss = 0.03183042258024216
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18487394957983194
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18448637316561844
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18410041841004185
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1837160751565762
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18333333333333332
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18295218295218296
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1825726141078838
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18219461697722567
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18181818181818182
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18144329896907216
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18106995884773663
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1806981519507187
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18032786885245902
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17995910020449898
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17959183673469387
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17922606924643583
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17886178861788618
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17849898580121704
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17813765182186234
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17777777777777778
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1774193548387097
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17706237424547283
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17670682730923695
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17635270541082165
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.176
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.52    |
| Iteration     | 18       |
| MaximumReturn | -0.155   |
| MinimumReturn | -13.7    |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02917630597949028
Validation loss = 0.027052786201238632
Validation loss = 0.027584418654441833
Validation loss = 0.025591367855668068
Validation loss = 0.029450690373778343
Validation loss = 0.027228504419326782
Validation loss = 0.025778384879231453
Validation loss = 0.02544432505965233
Validation loss = 0.02379939891397953
Validation loss = 0.026281002908945084
Validation loss = 0.025851748883724213
Validation loss = 0.03240329772233963
Validation loss = 0.028942642733454704
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.029689468443393707
Validation loss = 0.03000681847333908
Validation loss = 0.026871390640735626
Validation loss = 0.027816955000162125
Validation loss = 0.02712591364979744
Validation loss = 0.0271930955350399
Validation loss = 0.026851916685700417
Validation loss = 0.024975277483463287
Validation loss = 0.02640942484140396
Validation loss = 0.02669190801680088
Validation loss = 0.026206124573946
Validation loss = 0.027393948286771774
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.031362857669591904
Validation loss = 0.026993311941623688
Validation loss = 0.026438606902956963
Validation loss = 0.028736241161823273
Validation loss = 0.03280571103096008
Validation loss = 0.02621755376458168
Validation loss = 0.026805344969034195
Validation loss = 0.02875034511089325
Validation loss = 0.028405413031578064
Validation loss = 0.0268251895904541
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030975354835391045
Validation loss = 0.02473018504679203
Validation loss = 0.0279486496001482
Validation loss = 0.02484804205596447
Validation loss = 0.025093458592891693
Validation loss = 0.023882506415247917
Validation loss = 0.03046727553009987
Validation loss = 0.02392757497727871
Validation loss = 0.02564260922372341
Validation loss = 0.025970211252570152
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02974272333085537
Validation loss = 0.025114459916949272
Validation loss = 0.027916796505451202
Validation loss = 0.026654362678527832
Validation loss = 0.027274467051029205
Validation loss = 0.02587636560201645
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17564870259481039
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1752988047808765
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1749502982107356
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1746031746031746
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17425742574257425
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17391304347826086
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17357001972386588
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1732283464566929
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17288801571709234
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17450980392156862
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17416829745596868
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.173828125
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17348927875243664
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17315175097276264
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17281553398058253
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17248062015503876
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.172147001934236
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1718146718146718
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17148362235067438
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17115384615384616
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1708253358925144
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17049808429118773
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1701720841300191
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16984732824427481
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16952380952380952
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0322  |
| Iteration     | 19       |
| MaximumReturn | -0.0193  |
| MinimumReturn | -0.0687  |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02683267556130886
Validation loss = 0.029595907777547836
Validation loss = 0.023784397169947624
Validation loss = 0.02308201976120472
Validation loss = 0.037709664553403854
Validation loss = 0.02381625771522522
Validation loss = 0.025663254782557487
Validation loss = 0.025934496894478798
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02732357755303383
Validation loss = 0.024619095027446747
Validation loss = 0.023744048550724983
Validation loss = 0.0261920765042305
Validation loss = 0.0246756412088871
Validation loss = 0.02485397644340992
Validation loss = 0.024436285719275475
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.026326263323426247
Validation loss = 0.02443111315369606
Validation loss = 0.02375159040093422
Validation loss = 0.02735159918665886
Validation loss = 0.02461114339530468
Validation loss = 0.02679610811173916
Validation loss = 0.024984655901789665
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02478882484138012
Validation loss = 0.023579008877277374
Validation loss = 0.025200096890330315
Validation loss = 0.02441985346376896
Validation loss = 0.02427704632282257
Validation loss = 0.024820759892463684
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025297975167632103
Validation loss = 0.023339184001088142
Validation loss = 0.030588146299123764
Validation loss = 0.025736473500728607
Validation loss = 0.024195263162255287
Validation loss = 0.024927569553256035
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16920152091254753
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16888045540796964
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16856060606060605
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16824196597353497
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16792452830188678
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16760828625235405
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16729323308270677
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1669793621013133
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16666666666666666
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16635514018691588
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.166044776119403
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16573556797020483
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1654275092936803
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16512059369202226
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1648148148148148
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1645101663585952
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16420664206642066
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16390423572744015
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1636029411764706
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.163302752293578
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.163003663003663
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16270566727605118
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1624087591240876
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1621129326047359
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1618181818181818
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00422 |
| Iteration     | 20       |
| MaximumReturn | -0.00277 |
| MinimumReturn | -0.00564 |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02373785525560379
Validation loss = 0.024520037695765495
Validation loss = 0.02284489944577217
Validation loss = 0.023549217730760574
Validation loss = 0.026123367249965668
Validation loss = 0.025006916373968124
Validation loss = 0.02523118071258068
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026346243917942047
Validation loss = 0.02517949417233467
Validation loss = 0.02331119030714035
Validation loss = 0.024807123467326164
Validation loss = 0.023905089125037193
Validation loss = 0.02941618114709854
Validation loss = 0.024774717167019844
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.028654012829065323
Validation loss = 0.026580289006233215
Validation loss = 0.02714347653090954
Validation loss = 0.027401821687817574
Validation loss = 0.022430971264839172
Validation loss = 0.028577987104654312
Validation loss = 0.02281859517097473
Validation loss = 0.022883418947458267
Validation loss = 0.025109224021434784
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022140175104141235
Validation loss = 0.023190034553408623
Validation loss = 0.025058014318346977
Validation loss = 0.02377190813422203
Validation loss = 0.02396085299551487
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024026818573474884
Validation loss = 0.023402957245707512
Validation loss = 0.023968663066625595
Validation loss = 0.02442072331905365
Validation loss = 0.023906748741865158
Validation loss = 0.02512570284307003
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16152450090744103
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.161231884057971
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1609403254972875
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16064981949458484
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16036036036036036
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16007194244604317
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15978456014362658
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15949820788530467
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1592128801431127
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15892857142857142
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1586452762923351
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1583629893238434
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15808170515097691
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15780141843971632
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15752212389380532
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15724381625441697
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15696649029982362
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15669014084507044
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15641476274165203
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.156140350877193
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15586690017513136
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1555944055944056
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15532286212914484
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15505226480836237
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15478260869565216
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00107  |
| Iteration     | 21        |
| MaximumReturn | -0.000735 |
| MinimumReturn | -0.00153  |
| TotalSamples  | 38318     |
-----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024181431159377098
Validation loss = 0.023544779047369957
Validation loss = 0.022930756211280823
Validation loss = 0.023782605305314064
Validation loss = 0.02637178637087345
Validation loss = 0.02427358366549015
Validation loss = 0.024533964693546295
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02269192785024643
Validation loss = 0.025937743484973907
Validation loss = 0.023757103830575943
Validation loss = 0.02463124506175518
Validation loss = 0.025051189586520195
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023664603009819984
Validation loss = 0.02232365310192108
Validation loss = 0.0223329309374094
Validation loss = 0.030473574995994568
Validation loss = 0.023691609501838684
Validation loss = 0.0239503625780344
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022925561293959618
Validation loss = 0.022488897666335106
Validation loss = 0.02098090387880802
Validation loss = 0.022020595148205757
Validation loss = 0.023803554475307465
Validation loss = 0.023588573560118675
Validation loss = 0.02417411282658577
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022367840632796288
Validation loss = 0.02288210019469261
Validation loss = 0.02717268094420433
Validation loss = 0.026924660429358482
Validation loss = 0.023652343079447746
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1545138888888889
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1559792027729636
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15570934256055363
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15544041450777202
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15689655172413794
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15834767641996558
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15807560137457044
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1595197255574614
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16095890410958905
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1623931623931624
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16382252559726962
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1635434412265758
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1649659863945578
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.166383701188455
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16610169491525423
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16751269035532995
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16891891891891891
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16863406408094436
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16835016835016836
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16806722689075632
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16946308724832215
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1708542713567839
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17224080267558528
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17195325542570952
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17166666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -20.6    |
| Iteration     | 22       |
| MaximumReturn | -0.0467  |
| MinimumReturn | -57      |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026527047157287598
Validation loss = 0.024231132119894028
Validation loss = 0.022265281528234482
Validation loss = 0.024375375360250473
Validation loss = 0.02095714583992958
Validation loss = 0.02071995474398136
Validation loss = 0.02369542419910431
Validation loss = 0.020590225234627724
Validation loss = 0.020816784352064133
Validation loss = 0.02050968073308468
Validation loss = 0.021058309823274612
Validation loss = 0.021354397758841515
Validation loss = 0.022078758105635643
Validation loss = 0.02056152932345867
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02257850207388401
Validation loss = 0.026285752654075623
Validation loss = 0.023021109402179718
Validation loss = 0.022733893245458603
Validation loss = 0.022074509412050247
Validation loss = 0.02297711931169033
Validation loss = 0.02278507500886917
Validation loss = 0.02195553295314312
Validation loss = 0.03538120537996292
Validation loss = 0.02327302284538746
Validation loss = 0.021517718210816383
Validation loss = 0.024161705747246742
Validation loss = 0.022742969915270805
Validation loss = 0.022412506863474846
Validation loss = 0.02325347252190113
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024879399687051773
Validation loss = 0.02274443581700325
Validation loss = 0.0226804967969656
Validation loss = 0.0206881295889616
Validation loss = 0.021778937429189682
Validation loss = 0.02190398797392845
Validation loss = 0.022756729274988174
Validation loss = 0.023650038987398148
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022674880921840668
Validation loss = 0.020097743719816208
Validation loss = 0.02118837460875511
Validation loss = 0.020961206406354904
Validation loss = 0.020367208868265152
Validation loss = 0.022007538005709648
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022544806823134422
Validation loss = 0.021379541605710983
Validation loss = 0.021327584981918335
Validation loss = 0.0223555751144886
Validation loss = 0.02149316854774952
Validation loss = 0.021674100309610367
Validation loss = 0.022928351536393166
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.17470881863560733
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1744186046511628
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.17744610281923714
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.18211920529801323
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1834710743801653
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.19141914191419143
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.19934102141680396
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19901315789473684
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.20361247947454844
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2081967213114754
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.21112929623567922
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2107843137254902
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.21533442088091354
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21498371335504887
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.216260162601626
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.21753246753246752
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.22042139384116693
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22168284789644013
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22294022617124395
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.22903225806451613
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.23510466988727857
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23633440514469453
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2375601926163724
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.24358974358974358
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.2464
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -49.4    |
| Iteration     | 23       |
| MaximumReturn | -2.03    |
| MinimumReturn | -87.3    |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024479510262608528
Validation loss = 0.020121481269598007
Validation loss = 0.01840997487306595
Validation loss = 0.017725437879562378
Validation loss = 0.018825970590114594
Validation loss = 0.017404284328222275
Validation loss = 0.01872118189930916
Validation loss = 0.02074812725186348
Validation loss = 0.018702831119298935
Validation loss = 0.018839463591575623
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02507525309920311
Validation loss = 0.023898299783468246
Validation loss = 0.019143639132380486
Validation loss = 0.018231123685836792
Validation loss = 0.01797197014093399
Validation loss = 0.019619116559624672
Validation loss = 0.018383074551820755
Validation loss = 0.018791157752275467
Validation loss = 0.017612338066101074
Validation loss = 0.01823880709707737
Validation loss = 0.019127361476421356
Validation loss = 0.01897520199418068
Validation loss = 0.02106703445315361
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021561050787568092
Validation loss = 0.02037772163748741
Validation loss = 0.018307900056242943
Validation loss = 0.01949026994407177
Validation loss = 0.018939482048153877
Validation loss = 0.020402941852808
Validation loss = 0.017987646162509918
Validation loss = 0.01880299672484398
Validation loss = 0.018760349601507187
Validation loss = 0.020096438005566597
Validation loss = 0.019007809460163116
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023113448172807693
Validation loss = 0.01916593313217163
Validation loss = 0.01886756531894207
Validation loss = 0.01769046112895012
Validation loss = 0.0181194506585598
Validation loss = 0.018804138526320457
Validation loss = 0.018803196027874947
Validation loss = 0.017265303060412407
Validation loss = 0.0184977725148201
Validation loss = 0.016560574993491173
Validation loss = 0.017622096464037895
Validation loss = 0.018626200035214424
Validation loss = 0.018257685005664825
Validation loss = 0.01817096769809723
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02095433510839939
Validation loss = 0.02562769316136837
Validation loss = 0.019880887120962143
Validation loss = 0.018740065395832062
Validation loss = 0.019279448315501213
Validation loss = 0.019245145842432976
Validation loss = 0.019443131983280182
Validation loss = 0.019265759736299515
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24600638977635783
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24561403508771928
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24522292993630573
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24483306836248012
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24444444444444444
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24405705229793978
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24367088607594936
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24328593996840442
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24290220820189273
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24251968503937008
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24213836477987422
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24175824175824176
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2413793103448276
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24100156494522693
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.240625
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24024960998439937
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2398753894080997
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23950233281493002
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2391304347826087
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23875968992248062
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23839009287925697
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23802163833075735
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23765432098765432
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23728813559322035
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23692307692307693
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.113   |
| Iteration     | 24       |
| MaximumReturn | -0.0584  |
| MinimumReturn | -0.303   |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026159249246120453
Validation loss = 0.026005761697888374
Validation loss = 0.02520701102912426
Validation loss = 0.025517625734210014
Validation loss = 0.02607816830277443
Validation loss = 0.025309786200523376
Validation loss = 0.026057854294776917
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02522790990769863
Validation loss = 0.024837937206029892
Validation loss = 0.024617400020360947
Validation loss = 0.025837481021881104
Validation loss = 0.027196742594242096
Validation loss = 0.025139352306723595
Validation loss = 0.028719667345285416
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02431725338101387
Validation loss = 0.02410212531685829
Validation loss = 0.023779675364494324
Validation loss = 0.0274337325245142
Validation loss = 0.03129265829920769
Validation loss = 0.025495989248156548
Validation loss = 0.024536382406949997
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02435222640633583
Validation loss = 0.026584818959236145
Validation loss = 0.024708151817321777
Validation loss = 0.025615133345127106
Validation loss = 0.027071844786405563
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026958556845784187
Validation loss = 0.026305560022592545
Validation loss = 0.024776633828878403
Validation loss = 0.025760671123862267
Validation loss = 0.02621811255812645
Validation loss = 0.026927391067147255
Validation loss = 0.02629738487303257
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23655913978494625
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2361963190184049
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23583460949464014
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23547400611620795
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23511450381679388
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2347560975609756
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2343987823439878
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23404255319148937
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2336874051593323
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23333333333333334
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2329803328290469
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2326283987915408
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23227752639517346
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2319277108433735
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23157894736842105
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23123123123123124
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23088455772113944
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23053892215568864
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23019431988041852
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2298507462686567
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22950819672131148
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22916666666666666
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2288261515601783
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.228486646884273
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22814814814814816
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0159  |
| Iteration     | 25       |
| MaximumReturn | -0.0103  |
| MinimumReturn | -0.0232  |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024009430781006813
Validation loss = 0.0243440642952919
Validation loss = 0.02650483325123787
Validation loss = 0.03045218624174595
Validation loss = 0.02678513340651989
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02405110001564026
Validation loss = 0.026519687846302986
Validation loss = 0.0262831449508667
Validation loss = 0.02656383067369461
Validation loss = 0.023796062916517258
Validation loss = 0.024877185001969337
Validation loss = 0.023773353546857834
Validation loss = 0.026248108595609665
Validation loss = 0.024680161848664284
Validation loss = 0.024937940761446953
Validation loss = 0.025100143626332283
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025160400196909904
Validation loss = 0.024720806628465652
Validation loss = 0.025758827105164528
Validation loss = 0.0264495387673378
Validation loss = 0.024736672639846802
Validation loss = 0.028915924951434135
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027866505086421967
Validation loss = 0.02496291883289814
Validation loss = 0.02294277399778366
Validation loss = 0.023629819974303246
Validation loss = 0.0241542998701334
Validation loss = 0.02767009660601616
Validation loss = 0.029407016932964325
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02541300095617771
Validation loss = 0.026670033112168312
Validation loss = 0.026000982150435448
Validation loss = 0.026333294808864594
Validation loss = 0.026787878945469856
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22781065088757396
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2274741506646972
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22713864306784662
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2268041237113402
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22647058823529412
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2261380323054332
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22580645161290322
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22547584187408493
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22514619883040934
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22481751824817517
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22448979591836735
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22416302765647744
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2238372093023256
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22351233671988388
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22318840579710145
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22286541244573083
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22254335260115607
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2222222222222222
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2219020172910663
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22158273381294963
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22126436781609196
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22094691535150646
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22063037249283668
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22031473533619456
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.037   |
| Iteration     | 26       |
| MaximumReturn | -0.0193  |
| MinimumReturn | -0.0556  |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025987466797232628
Validation loss = 0.02551680989563465
Validation loss = 0.024526799097657204
Validation loss = 0.02565406635403633
Validation loss = 0.027032731100916862
Validation loss = 0.024875838309526443
Validation loss = 0.023827586323022842
Validation loss = 0.024685218930244446
Validation loss = 0.025479799136519432
Validation loss = 0.02661265805363655
Validation loss = 0.02470465376973152
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025094231590628624
Validation loss = 0.024267997592687607
Validation loss = 0.02207651548087597
Validation loss = 0.024526257067918777
Validation loss = 0.02455037087202072
Validation loss = 0.028905700892210007
Validation loss = 0.027986103668808937
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024318454787135124
Validation loss = 0.02386205643415451
Validation loss = 0.02321755699813366
Validation loss = 0.026474768295884132
Validation loss = 0.025631455704569817
Validation loss = 0.022872820496559143
Validation loss = 0.023212414234876633
Validation loss = 0.02545761875808239
Validation loss = 0.02592596970498562
Validation loss = 0.023318735882639885
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024449260905385017
Validation loss = 0.02314818650484085
Validation loss = 0.02421201393008232
Validation loss = 0.023664038628339767
Validation loss = 0.022893523797392845
Validation loss = 0.02513713948428631
Validation loss = 0.02332598716020584
Validation loss = 0.023422272875905037
Validation loss = 0.02462179958820343
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02531314641237259
Validation loss = 0.02310238406062126
Validation loss = 0.027477143332362175
Validation loss = 0.02492404729127884
Validation loss = 0.023930920287966728
Validation loss = 0.028993461281061172
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21968616262482168
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21937321937321938
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21906116642958748
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21875
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21843971631205675
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21813031161473087
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21782178217821782
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2175141242937853
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21720733427362482
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21690140845070421
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21659634317862167
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21629213483146068
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21598877980364656
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21568627450980393
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2153846153846154
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21508379888268156
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21478382147838215
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21448467966573817
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2141863699582754
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21388888888888888
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21359223300970873
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21329639889196675
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21300138312586445
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.212707182320442
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21241379310344827
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00579 |
| Iteration     | 27       |
| MaximumReturn | -0.00423 |
| MinimumReturn | -0.00727 |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025948941707611084
Validation loss = 0.02409224957227707
Validation loss = 0.026042968034744263
Validation loss = 0.02600966952741146
Validation loss = 0.023727616295218468
Validation loss = 0.02465224266052246
Validation loss = 0.028551721945405006
Validation loss = 0.02479042299091816
Validation loss = 0.025624627247452736
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025322578847408295
Validation loss = 0.02329775132238865
Validation loss = 0.023383162915706635
Validation loss = 0.023559555411338806
Validation loss = 0.023107096552848816
Validation loss = 0.024440789595246315
Validation loss = 0.023863187059760094
Validation loss = 0.026098385453224182
Validation loss = 0.024748509749770164
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024041621014475822
Validation loss = 0.025093957781791687
Validation loss = 0.022880837321281433
Validation loss = 0.023124590516090393
Validation loss = 0.022893009707331657
Validation loss = 0.024389514699578285
Validation loss = 0.023534363135695457
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02458822727203369
Validation loss = 0.022611333057284355
Validation loss = 0.025598132982850075
Validation loss = 0.024548880755901337
Validation loss = 0.024266928434371948
Validation loss = 0.022895554080605507
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.028350025415420532
Validation loss = 0.02232062630355358
Validation loss = 0.02454335242509842
Validation loss = 0.023487964645028114
Validation loss = 0.022962642833590508
Validation loss = 0.024174487218260765
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21212121212121213
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21182943603851445
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21153846153846154
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2112482853223594
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21095890410958903
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2106703146374829
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2103825136612022
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2100954979536153
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2098092643051771
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20952380952380953
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20923913043478262
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.208955223880597
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.21002710027100271
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20974289580514208
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20945945945945946
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20917678812415655
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20889487870619947
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20861372812920592
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20833333333333334
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2080536912751678
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20777479892761394
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20883534136546184
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20855614973262032
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2082777036048064
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.208
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.1    |
| Iteration     | 28       |
| MaximumReturn | -0.0424  |
| MinimumReturn | -44      |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023262618109583855
Validation loss = 0.023526040837168694
Validation loss = 0.021935950964689255
Validation loss = 0.02347603254020214
Validation loss = 0.022930920124053955
Validation loss = 0.024703169241547585
Validation loss = 0.022569341585040092
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025320792570710182
Validation loss = 0.022589826956391335
Validation loss = 0.022781746461987495
Validation loss = 0.023589029908180237
Validation loss = 0.02349274232983589
Validation loss = 0.02416309341788292
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021670157089829445
Validation loss = 0.022085953503847122
Validation loss = 0.023114832118153572
Validation loss = 0.02288166433572769
Validation loss = 0.02196626365184784
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024120299145579338
Validation loss = 0.022454561665654182
Validation loss = 0.021119194105267525
Validation loss = 0.022525668144226074
Validation loss = 0.022247305139899254
Validation loss = 0.02140911854803562
Validation loss = 0.024887938052415848
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023822112008929253
Validation loss = 0.02483174577355385
Validation loss = 0.023627731949090958
Validation loss = 0.024547919631004333
Validation loss = 0.02255464345216751
Validation loss = 0.022636065259575844
Validation loss = 0.022816648706793785
Validation loss = 0.022805137559771538
Validation loss = 0.023305052891373634
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20772303595206393
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2074468085106383
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20717131474103587
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20689655172413793
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20662251655629138
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20634920634920634
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20607661822985468
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20580474934036938
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20685111989459815
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20657894736842106
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2076215505913272
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2073490813648294
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20707732634338138
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20680628272251309
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2065359477124183
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.206266318537859
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20599739243807041
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20572916666666666
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.20806241872561768
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2077922077922078
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20752269779507135
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20725388601036268
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2069857697283312
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20671834625322996
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2064516129032258
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.51    |
| Iteration     | 29       |
| MaximumReturn | -0.0315  |
| MinimumReturn | -36.4    |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023403655737638474
Validation loss = 0.024298829957842827
Validation loss = 0.022545931860804558
Validation loss = 0.02360544167459011
Validation loss = 0.024522587656974792
Validation loss = 0.02618485689163208
Validation loss = 0.025867817923426628
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023304998874664307
Validation loss = 0.02481655590236187
Validation loss = 0.02361360378563404
Validation loss = 0.02434692159295082
Validation loss = 0.025208372622728348
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0236273854970932
Validation loss = 0.023081541061401367
Validation loss = 0.023170657455921173
Validation loss = 0.02303582988679409
Validation loss = 0.02322208695113659
Validation loss = 0.0226727444678545
Validation loss = 0.023418886587023735
Validation loss = 0.02982763759791851
Validation loss = 0.022012444213032722
Validation loss = 0.021200798451900482
Validation loss = 0.022657427936792374
Validation loss = 0.023791024461388588
Validation loss = 0.024599960073828697
Validation loss = 0.022401733323931694
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024304993450641632
Validation loss = 0.022590091452002525
Validation loss = 0.022533128038048744
Validation loss = 0.022775180637836456
Validation loss = 0.023201756179332733
Validation loss = 0.024248264729976654
Validation loss = 0.024125218391418457
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02582458406686783
Validation loss = 0.022306013852357864
Validation loss = 0.025543468073010445
Validation loss = 0.023717913776636124
Validation loss = 0.024498967453837395
Validation loss = 0.023913182318210602
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20618556701030927
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2059202059202059
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20565552699228792
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20539152759948653
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20512820512820512
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20486555697823303
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20460358056265984
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20434227330779056
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20408163265306123
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20382165605095542
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2035623409669211
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20330368487928843
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20304568527918782
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20278833967046894
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20253164556962025
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.202275600505689
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20202020202020202
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.201765447667087
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20151133501259447
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20125786163522014
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20100502512562815
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20075282308657466
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20050125313283207
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2002503128911139
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00585 |
| Iteration     | 30       |
| MaximumReturn | -0.00408 |
| MinimumReturn | -0.00804 |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02699906751513481
Validation loss = 0.02393178455531597
Validation loss = 0.023542284965515137
Validation loss = 0.023954499512910843
Validation loss = 0.024857819080352783
Validation loss = 0.023923439905047417
Validation loss = 0.024161173030734062
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024992406368255615
Validation loss = 0.0238331388682127
Validation loss = 0.02439888007938862
Validation loss = 0.022512244060635567
Validation loss = 0.02409781701862812
Validation loss = 0.027057724073529243
Validation loss = 0.026854105293750763
Validation loss = 0.024198798462748528
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023827197030186653
Validation loss = 0.02253401279449463
Validation loss = 0.02272425964474678
Validation loss = 0.023717045783996582
Validation loss = 0.023498037829995155
Validation loss = 0.02329830266535282
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026209387928247452
Validation loss = 0.023182768374681473
Validation loss = 0.023421088233590126
Validation loss = 0.02113509364426136
Validation loss = 0.022766320034861565
Validation loss = 0.024948857724666595
Validation loss = 0.022121578454971313
Validation loss = 0.022408325225114822
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0257429052144289
Validation loss = 0.022078843787312508
Validation loss = 0.02400379627943039
Validation loss = 0.02579132653772831
Validation loss = 0.023510469123721123
Validation loss = 0.024522436782717705
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19975031210986266
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19950124688279303
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.199252801992528
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19900497512437812
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19875776397515527
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19851116625310175
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1982651796778191
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19801980198019803
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19777503090234858
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19753086419753085
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19728729963008632
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19704433497536947
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1968019680196802
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19656019656019655
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19631901840490798
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19607843137254902
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19583843329253367
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19559902200489
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19536019536019536
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1951219512195122
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19488428745432398
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19464720194647203
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19441069258809235
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1941747572815534
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19393939393939394
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00842 |
| Iteration     | 31       |
| MaximumReturn | -0.00588 |
| MinimumReturn | -0.0118  |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023973373696208
Validation loss = 0.022664424031972885
Validation loss = 0.0231830682605505
Validation loss = 0.023624492809176445
Validation loss = 0.02241501957178116
Validation loss = 0.02335420437157154
Validation loss = 0.02299445867538452
Validation loss = 0.023122623562812805
Validation loss = 0.022918224334716797
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024932391941547394
Validation loss = 0.02148829586803913
Validation loss = 0.024459950625896454
Validation loss = 0.02202446386218071
Validation loss = 0.023378150537610054
Validation loss = 0.02310813218355179
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022087719291448593
Validation loss = 0.022496012970805168
Validation loss = 0.021810706704854965
Validation loss = 0.02198120392858982
Validation loss = 0.021751072257757187
Validation loss = 0.02174275740981102
Validation loss = 0.021789895370602608
Validation loss = 0.023744596168398857
Validation loss = 0.022181281819939613
Validation loss = 0.023243827745318413
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021416733041405678
Validation loss = 0.02199341543018818
Validation loss = 0.021714892238378525
Validation loss = 0.029565099626779556
Validation loss = 0.02246788889169693
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022257864475250244
Validation loss = 0.022540250793099403
Validation loss = 0.02092910185456276
Validation loss = 0.02406766451895237
Validation loss = 0.02271108329296112
Validation loss = 0.023062221705913544
Validation loss = 0.023250920698046684
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1937046004842615
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19347037484885127
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1932367149758454
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19300361881785283
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1927710843373494
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19253910950661854
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19230769230769232
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19207683073229292
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19184652278177458
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19161676646706588
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19138755980861244
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1911589008363202
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1909307875894988
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1907032181168057
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19047619047619047
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1902497027348395
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19002375296912113
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18979833926453143
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1895734597156398
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1893491124260355
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18912529550827423
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18890200708382526
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18867924528301888
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18845700824499412
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18823529411764706
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0445  |
| Iteration     | 32       |
| MaximumReturn | -0.0232  |
| MinimumReturn | -0.0842  |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022986235097050667
Validation loss = 0.02212098054587841
Validation loss = 0.02395445480942726
Validation loss = 0.02178354375064373
Validation loss = 0.021458229050040245
Validation loss = 0.03189260885119438
Validation loss = 0.022289978340268135
Validation loss = 0.02212207019329071
Validation loss = 0.023215455934405327
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02206955850124359
Validation loss = 0.023038338869810104
Validation loss = 0.02204895205795765
Validation loss = 0.022016124799847603
Validation loss = 0.02344602346420288
Validation loss = 0.02697608433663845
Validation loss = 0.022143499925732613
Validation loss = 0.022634238004684448
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022240789607167244
Validation loss = 0.020089009776711464
Validation loss = 0.020435944199562073
Validation loss = 0.02195831760764122
Validation loss = 0.021259905770421028
Validation loss = 0.027718478813767433
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020450636744499207
Validation loss = 0.020169587805867195
Validation loss = 0.02089918591082096
Validation loss = 0.02085546776652336
Validation loss = 0.022214533761143684
Validation loss = 0.020303910598158836
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021160989999771118
Validation loss = 0.021412165835499763
Validation loss = 0.021071001887321472
Validation loss = 0.022262465208768845
Validation loss = 0.02447655238211155
Validation loss = 0.02281302586197853
Validation loss = 0.021060368046164513
Validation loss = 0.02472788095474243
Validation loss = 0.02126651629805565
Validation loss = 0.020245954394340515
Validation loss = 0.02236458659172058
Validation loss = 0.021806543692946434
Validation loss = 0.02848370373249054
Validation loss = 0.02197161130607128
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18801410105757932
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18779342723004694
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18757327080890973
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1873536299765808
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1871345029239766
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18691588785046728
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1866977829638273
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1864801864801865
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18626309662398138
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18604651162790697
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18583042973286876
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18561484918793503
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1853997682502897
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18518518518518517
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18497109826589594
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18475750577367206
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1845444059976932
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18433179723502305
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18411967779056387
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1839080459770115
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18369690011481057
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1834862385321101
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18327605956471935
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18306636155606407
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18285714285714286
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0135  |
| Iteration     | 33       |
| MaximumReturn | -0.00857 |
| MinimumReturn | -0.0179  |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022008342668414116
Validation loss = 0.023433463647961617
Validation loss = 0.02047513797879219
Validation loss = 0.020899638533592224
Validation loss = 0.02106950245797634
Validation loss = 0.023374637588858604
Validation loss = 0.022616013884544373
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021544435992836952
Validation loss = 0.022491253912448883
Validation loss = 0.021190529689192772
Validation loss = 0.021091556176543236
Validation loss = 0.020929992198944092
Validation loss = 0.02215723507106304
Validation loss = 0.021432744339108467
Validation loss = 0.022731276229023933
Validation loss = 0.02215229906141758
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022972751408815384
Validation loss = 0.022067325189709663
Validation loss = 0.020188815891742706
Validation loss = 0.020034264773130417
Validation loss = 0.02077142335474491
Validation loss = 0.02122228778898716
Validation loss = 0.022226452827453613
Validation loss = 0.02367398701608181
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02059704065322876
Validation loss = 0.023374861106276512
Validation loss = 0.020763231441378593
Validation loss = 0.02233767695724964
Validation loss = 0.02124067395925522
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02045195735991001
Validation loss = 0.02205006591975689
Validation loss = 0.021216996014118195
Validation loss = 0.02499428018927574
Validation loss = 0.020733749493956566
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.182648401826484
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18244013683010263
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18223234624145787
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1820250284414107
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18181818181818182
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18161180476730987
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18140589569160998
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1812004530011325
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18099547511312217
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1807909604519774
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18058690744920994
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18038331454340473
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18018018018018017
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17997750281214847
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1797752808988764
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17957351290684623
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17937219730941703
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1791713325867861
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1789709172259508
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1787709497206704
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17857142857142858
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17837235228539577
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17817371937639198
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17797552836484984
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17777777777777778
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00078  |
| Iteration     | 34        |
| MaximumReturn | -0.000534 |
| MinimumReturn | -0.00115  |
| TotalSamples  | 59976     |
-----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022225739434361458
Validation loss = 0.021772373467683792
Validation loss = 0.021169833838939667
Validation loss = 0.021712129935622215
Validation loss = 0.020224900916218758
Validation loss = 0.022780267521739006
Validation loss = 0.021623902022838593
Validation loss = 0.021793151274323463
Validation loss = 0.020249105989933014
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02022835798561573
Validation loss = 0.020725250244140625
Validation loss = 0.0214613638818264
Validation loss = 0.02323751524090767
Validation loss = 0.021516751497983932
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018987763673067093
Validation loss = 0.02156788483262062
Validation loss = 0.018854288384318352
Validation loss = 0.021417910233139992
Validation loss = 0.020764539018273354
Validation loss = 0.021991362795233727
Validation loss = 0.025274790823459625
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020321916788816452
Validation loss = 0.020175641402602196
Validation loss = 0.020089615136384964
Validation loss = 0.0201573483645916
Validation loss = 0.02080031856894493
Validation loss = 0.01975957304239273
Validation loss = 0.021697821095585823
Validation loss = 0.020708831027150154
Validation loss = 0.02365756966173649
Validation loss = 0.021067800000309944
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020615994930267334
Validation loss = 0.020813187584280968
Validation loss = 0.02186821587383747
Validation loss = 0.02071707881987095
Validation loss = 0.02312014438211918
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17758046614872364
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17738359201773837
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17718715393134
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17699115044247787
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17679558011049723
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17660044150110377
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17640573318632854
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1762114537444934
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17601760176017603
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17582417582417584
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1756311745334797
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17543859649122806
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17524644030668127
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.175054704595186
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17486338797814208
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17467248908296942
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17448200654307525
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17429193899782136
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17410228509249184
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17391304347826086
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1737242128121607
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1735357917570499
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1733477789815818
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17316017316017315
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17297297297297298
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0304  |
| Iteration     | 35       |
| MaximumReturn | -0.0167  |
| MinimumReturn | -0.0472  |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021019132807850838
Validation loss = 0.021694021299481392
Validation loss = 0.02095838636159897
Validation loss = 0.02190982922911644
Validation loss = 0.02149786800146103
Validation loss = 0.021785413846373558
Validation loss = 0.02353711798787117
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02141479030251503
Validation loss = 0.0242758858948946
Validation loss = 0.021926378831267357
Validation loss = 0.020901424810290337
Validation loss = 0.02121019922196865
Validation loss = 0.021810535341501236
Validation loss = 0.021786658093333244
Validation loss = 0.022942161187529564
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020622070878744125
Validation loss = 0.020707422867417336
Validation loss = 0.021858658641576767
Validation loss = 0.019443433731794357
Validation loss = 0.019830385223031044
Validation loss = 0.019993575289845467
Validation loss = 0.02018154226243496
Validation loss = 0.019873233512043953
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019527766853570938
Validation loss = 0.01961749792098999
Validation loss = 0.022066988050937653
Validation loss = 0.021326977759599686
Validation loss = 0.021442046388983727
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023249860852956772
Validation loss = 0.02514481730759144
Validation loss = 0.02122417651116848
Validation loss = 0.020989157259464264
Validation loss = 0.02205185778439045
Validation loss = 0.02140488475561142
Validation loss = 0.020961256697773933
Validation loss = 0.01995428092777729
Validation loss = 0.022179817780852318
Validation loss = 0.020846813917160034
Validation loss = 0.020310722291469574
Validation loss = 0.01999763213098049
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17278617710583152
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1725997842502697
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1724137931034483
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1722282023681378
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17204301075268819
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17185821697099893
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17167381974248927
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1714898177920686
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17130620985010706
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1711229946524064
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17094017094017094
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17075773745997866
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17057569296375266
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1703940362087327
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1702127659574468
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17003188097768332
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16985138004246284
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16967126193001061
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1694915254237288
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1693121693121693
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16913319238900634
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1689545934530095
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16877637130801687
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16859852476290832
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16842105263157894
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00104  |
| Iteration     | 36        |
| MaximumReturn | -0.000722 |
| MinimumReturn | -0.00161  |
| TotalSamples  | 63308     |
-----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02114349976181984
Validation loss = 0.02077518403530121
Validation loss = 0.02188693918287754
Validation loss = 0.024412037804722786
Validation loss = 0.021638238802552223
Validation loss = 0.021746067330241203
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022415855899453163
Validation loss = 0.022236693650484085
Validation loss = 0.022795705124735832
Validation loss = 0.023244787007570267
Validation loss = 0.022996144369244576
Validation loss = 0.020825179293751717
Validation loss = 0.02867932617664337
Validation loss = 0.020756226032972336
Validation loss = 0.020533256232738495
Validation loss = 0.021460333839058876
Validation loss = 0.02501155622303486
Validation loss = 0.022909322753548622
Validation loss = 0.0211680568754673
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023047205060720444
Validation loss = 0.020351139828562737
Validation loss = 0.019479740411043167
Validation loss = 0.02500639483332634
Validation loss = 0.02058192528784275
Validation loss = 0.019444430246949196
Validation loss = 0.019240759313106537
Validation loss = 0.021324563771486282
Validation loss = 0.019572844728827477
Validation loss = 0.02044413425028324
Validation loss = 0.019500333815813065
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019849533215165138
Validation loss = 0.02230650931596756
Validation loss = 0.020774031057953835
Validation loss = 0.019371695816516876
Validation loss = 0.020684614777565002
Validation loss = 0.020964691415429115
Validation loss = 0.020394112914800644
Validation loss = 0.020050136372447014
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021480899304151535
Validation loss = 0.020512614399194717
Validation loss = 0.021543366834521294
Validation loss = 0.02025650441646576
Validation loss = 0.023103153333067894
Validation loss = 0.020151136443018913
Validation loss = 0.023064276203513145
Validation loss = 0.020652173087000847
Validation loss = 0.02264581434428692
Validation loss = 0.022670097649097443
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16824395373291273
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16806722689075632
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16789087093389296
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16771488469601678
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16753926701570682
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16736401673640167
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1671891327063741
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16701461377870563
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16684045881126172
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16666666666666666
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16649323621227888
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16632016632016633
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16614745586708204
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16597510373443983
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16580310880829016
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16563146997929606
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1654601861427094
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1652892561983471
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1651186790505676
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16494845360824742
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.164778578784758
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1646090534979424
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1644398766700925
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16427104722792607
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1641025641025641
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00094  |
| Iteration     | 37        |
| MaximumReturn | -0.000659 |
| MinimumReturn | -0.00125  |
| TotalSamples  | 64974     |
-----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021620551124215126
Validation loss = 0.021660219877958298
Validation loss = 0.02135087549686432
Validation loss = 0.021076513454318047
Validation loss = 0.023762337863445282
Validation loss = 0.02195475809276104
Validation loss = 0.023149807006120682
Validation loss = 0.020403750240802765
Validation loss = 0.022190112620592117
Validation loss = 0.021196847781538963
Validation loss = 0.021171879023313522
Validation loss = 0.021011780947446823
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020115140825510025
Validation loss = 0.02169320546090603
Validation loss = 0.022173408418893814
Validation loss = 0.021351877599954605
Validation loss = 0.021717876195907593
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01975133828818798
Validation loss = 0.020434148609638214
Validation loss = 0.019027244299650192
Validation loss = 0.02097804844379425
Validation loss = 0.01856909692287445
Validation loss = 0.019414234906435013
Validation loss = 0.01921270228922367
Validation loss = 0.019300315529108047
Validation loss = 0.019768811762332916
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021805165335536003
Validation loss = 0.019750144332647324
Validation loss = 0.02049858309328556
Validation loss = 0.020896537229418755
Validation loss = 0.020927131175994873
Validation loss = 0.019406089559197426
Validation loss = 0.020628420636057854
Validation loss = 0.021725263446569443
Validation loss = 0.02035028114914894
Validation loss = 0.021404774859547615
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019607234746217728
Validation loss = 0.020536864176392555
Validation loss = 0.020848780870437622
Validation loss = 0.022031359374523163
Validation loss = 0.020261071622371674
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16393442622950818
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16376663254861823
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16359918200409
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1634320735444331
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16326530612244897
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16309887869520898
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1629327902240326
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16276703967446593
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16260162601626016
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16243654822335024
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16227180527383367
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16210739614994935
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16194331983805668
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16177957532861476
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16161616161616163
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16145307769929365
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16129032258064516
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16112789526686808
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16096579476861167
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16080402010050251
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1606425702811245
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.160481444332999
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16032064128256512
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16016016016016016
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00654 |
| Iteration     | 38       |
| MaximumReturn | -0.00485 |
| MinimumReturn | -0.0086  |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02025729790329933
Validation loss = 0.020849591121077538
Validation loss = 0.02161601185798645
Validation loss = 0.023410804569721222
Validation loss = 0.021124059334397316
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020020965486764908
Validation loss = 0.020593056455254555
Validation loss = 0.020071325823664665
Validation loss = 0.019956959411501884
Validation loss = 0.02108941040933132
Validation loss = 0.020126594230532646
Validation loss = 0.019741082563996315
Validation loss = 0.02188335545361042
Validation loss = 0.022201422601938248
Validation loss = 0.02145102433860302
Validation loss = 0.020304705947637558
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019297011196613312
Validation loss = 0.017824258655309677
Validation loss = 0.019060295075178146
Validation loss = 0.01817013882100582
Validation loss = 0.018672209233045578
Validation loss = 0.0181429460644722
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020034585148096085
Validation loss = 0.019652800634503365
Validation loss = 0.02059091441333294
Validation loss = 0.022999335080385208
Validation loss = 0.021324876695871353
Validation loss = 0.020589197054505348
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02107115648686886
Validation loss = 0.020488642156124115
Validation loss = 0.02371367998421192
Validation loss = 0.019483836367726326
Validation loss = 0.02013954520225525
Validation loss = 0.01900419592857361
Validation loss = 0.022323939949274063
Validation loss = 0.022536100819706917
Validation loss = 0.020059648901224136
Validation loss = 0.0201147198677063
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15984015984015984
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1596806387225549
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15952143569292124
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1593625498007968
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15920398009950248
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15904572564612326
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15888778550148958
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15873015873015872
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15857284440039643
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15841584158415842
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1582591493570722
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15810276679841898
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1579466929911155
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15779092702169625
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15763546798029557
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15748031496062992
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15732546705998032
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15717092337917485
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15701668302257116
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1568627450980392
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15670910871694418
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15655577299412915
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15640273704789834
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15625
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15609756097560976
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0014   |
| Iteration     | 39        |
| MaximumReturn | -0.000824 |
| MinimumReturn | -0.00165  |
| TotalSamples  | 68306     |
-----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021532054990530014
Validation loss = 0.019736913964152336
Validation loss = 0.020029308274388313
Validation loss = 0.020123055204749107
Validation loss = 0.020846577361226082
Validation loss = 0.020721448585391045
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020912282168865204
Validation loss = 0.02630302682518959
Validation loss = 0.020908618345856667
Validation loss = 0.01961144059896469
Validation loss = 0.019653474912047386
Validation loss = 0.02012704499065876
Validation loss = 0.01937676966190338
Validation loss = 0.0193114522844553
Validation loss = 0.021616090089082718
Validation loss = 0.019921060651540756
Validation loss = 0.020705707371234894
Validation loss = 0.020838988944888115
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01990904100239277
Validation loss = 0.01852281391620636
Validation loss = 0.02178068831562996
Validation loss = 0.0203898623585701
Validation loss = 0.018435688689351082
Validation loss = 0.018456844612956047
Validation loss = 0.01890208199620247
Validation loss = 0.018177025020122528
Validation loss = 0.018600041046738625
Validation loss = 0.01927829347550869
Validation loss = 0.018591709434986115
Validation loss = 0.021074999123811722
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02002362348139286
Validation loss = 0.01990981213748455
Validation loss = 0.018745535984635353
Validation loss = 0.02069373056292534
Validation loss = 0.020155249163508415
Validation loss = 0.019813209772109985
Validation loss = 0.01958874613046646
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021239230409264565
Validation loss = 0.019489450380206108
Validation loss = 0.020638909190893173
Validation loss = 0.02239346131682396
Validation loss = 0.0186814833432436
Validation loss = 0.01870601624250412
Validation loss = 0.020808609202504158
Validation loss = 0.019203903153538704
Validation loss = 0.01953190006315708
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15594541910331383
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15579357351509251
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1556420233463035
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1554907677356657
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1553398058252427
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15518913676042678
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15503875968992248
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15488867376573087
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15473887814313347
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15458937198067632
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15444015444015444
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15429122468659595
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15414258188824662
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15399422521655437
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15384615384615385
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15369836695485112
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15355086372360843
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15340364333652926
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1532567049808429
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15311004784688995
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15296367112810708
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15281757402101243
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15267175572519084
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15252621544327932
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1523809523809524
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00223 |
| Iteration     | 40       |
| MaximumReturn | -0.0013  |
| MinimumReturn | -0.00284 |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02113441191613674
Validation loss = 0.020565779879689217
Validation loss = 0.019859101623296738
Validation loss = 0.020579390227794647
Validation loss = 0.0196418147534132
Validation loss = 0.020855482667684555
Validation loss = 0.02349109575152397
Validation loss = 0.01838354766368866
Validation loss = 0.022415481507778168
Validation loss = 0.02194015495479107
Validation loss = 0.019949670881032944
Validation loss = 0.021069858223199844
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022649310529232025
Validation loss = 0.01993723213672638
Validation loss = 0.021266058087348938
Validation loss = 0.020186033099889755
Validation loss = 0.019881144165992737
Validation loss = 0.02067977748811245
Validation loss = 0.020063739269971848
Validation loss = 0.027150312438607216
Validation loss = 0.021693522110581398
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019024984911084175
Validation loss = 0.018409768119454384
Validation loss = 0.019915234297513962
Validation loss = 0.017799271270632744
Validation loss = 0.020423343405127525
Validation loss = 0.019095443189144135
Validation loss = 0.017899906262755394
Validation loss = 0.017570771276950836
Validation loss = 0.020366016775369644
Validation loss = 0.01765565760433674
Validation loss = 0.018258923664689064
Validation loss = 0.018470048904418945
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0192648284137249
Validation loss = 0.020360251888632774
Validation loss = 0.01972527615725994
Validation loss = 0.021227270364761353
Validation loss = 0.021355988457798958
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019582780078053474
Validation loss = 0.021331310272216797
Validation loss = 0.018893998116254807
Validation loss = 0.02055501379072666
Validation loss = 0.019926657900214195
Validation loss = 0.019478708505630493
Validation loss = 0.022862978279590607
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1522359657469077
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1520912547528517
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15194681861348527
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15180265654648956
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15165876777251186
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15151515151515152
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15137180700094607
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15122873345935728
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1510859301227573
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1509433962264151
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15080113100848255
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15065913370998116
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15051740357478832
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15037593984962405
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15023474178403756
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.150093808630394
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14995313964386128
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.149812734082397
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14967259120673526
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14953271028037382
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14939309056956115
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14925373134328357
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14911463187325255
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.148975791433892
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14883720930232558
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00927 |
| Iteration     | 41       |
| MaximumReturn | -0.00537 |
| MinimumReturn | -0.0153  |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02191467024385929
Validation loss = 0.020197123289108276
Validation loss = 0.020086290314793587
Validation loss = 0.019895246252417564
Validation loss = 0.020222611725330353
Validation loss = 0.019292457029223442
Validation loss = 0.01873295195400715
Validation loss = 0.019933171570301056
Validation loss = 0.01999659650027752
Validation loss = 0.021067460998892784
Validation loss = 0.021770579740405083
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019266402348876
Validation loss = 0.019672026857733727
Validation loss = 0.01939396746456623
Validation loss = 0.019508205354213715
Validation loss = 0.01925455406308174
Validation loss = 0.019734952598810196
Validation loss = 0.019452977925539017
Validation loss = 0.020067770034074783
Validation loss = 0.02380755916237831
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018294664099812508
Validation loss = 0.023622969165444374
Validation loss = 0.01844378001987934
Validation loss = 0.01886143907904625
Validation loss = 0.017213094979524612
Validation loss = 0.016934512183070183
Validation loss = 0.018095454201102257
Validation loss = 0.0186323132365942
Validation loss = 0.01734798029065132
Validation loss = 0.01818166673183441
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020434174686670303
Validation loss = 0.01983480714261532
Validation loss = 0.02075166627764702
Validation loss = 0.020769242197275162
Validation loss = 0.018839871510863304
Validation loss = 0.0225045308470726
Validation loss = 0.01808415725827217
Validation loss = 0.018125956878066063
Validation loss = 0.0194869264960289
Validation loss = 0.02093108370900154
Validation loss = 0.02021659165620804
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018542177975177765
Validation loss = 0.019554024562239647
Validation loss = 0.019965626299381256
Validation loss = 0.021719327196478844
Validation loss = 0.01886487938463688
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14869888475836432
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14856081708449395
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14842300556586271
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14828544949026876
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14814814814814814
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14801110083256244
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1478743068391867
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14773776546629733
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14760147601476015
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14746543778801843
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14732965009208104
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14719411223551057
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14705882352941177
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14692378328741965
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14678899082568808
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1466544454628781
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14652014652014653
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1463860933211345
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14625228519195613
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1461187214611872
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.145985401459854
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14585232452142205
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14571948998178508
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14558689717925385
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14545454545454545
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000942 |
| Iteration     | 42        |
| MaximumReturn | -0.000628 |
| MinimumReturn | -0.00186  |
| TotalSamples  | 73304     |
-----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020579010248184204
Validation loss = 0.01963169127702713
Validation loss = 0.020113393664360046
Validation loss = 0.020721303299069405
Validation loss = 0.020878374576568604
Validation loss = 0.020169604569673538
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021965235471725464
Validation loss = 0.01898656040430069
Validation loss = 0.01903245598077774
Validation loss = 0.019517019391059875
Validation loss = 0.018242686986923218
Validation loss = 0.018700648099184036
Validation loss = 0.019666245207190514
Validation loss = 0.01934218406677246
Validation loss = 0.01974734291434288
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019478049129247665
Validation loss = 0.019924301654100418
Validation loss = 0.01862775720655918
Validation loss = 0.017737407237291336
Validation loss = 0.017868025228381157
Validation loss = 0.020953727886080742
Validation loss = 0.01790904998779297
Validation loss = 0.019245712086558342
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019199855625629425
Validation loss = 0.020297907292842865
Validation loss = 0.020093010738492012
Validation loss = 0.019598864018917084
Validation loss = 0.019833087921142578
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01908179372549057
Validation loss = 0.01976826973259449
Validation loss = 0.020624913275241852
Validation loss = 0.01998770609498024
Validation loss = 0.020112723112106323
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14532243415077203
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14519056261343014
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14505893019038985
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14492753623188406
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14479638009049775
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14466546112115733
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14453477868112014
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1444043321299639
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1442741208295762
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14414414414414414
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14401440144014402
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14388489208633093
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14375561545372867
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1436265709156194
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14349775784753363
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14336917562724014
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1432408236347359
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14311270125223613
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14298480786416443
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14285714285714285
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14272970561998216
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14260249554367202
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14247551202137132
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1423487544483986
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14222222222222222
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00167 |
| Iteration     | 43       |
| MaximumReturn | -0.00113 |
| MinimumReturn | -0.00222 |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02022687718272209
Validation loss = 0.0186446700245142
Validation loss = 0.01985684409737587
Validation loss = 0.020950371399521828
Validation loss = 0.019939085468649864
Validation loss = 0.020012134686112404
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01875651441514492
Validation loss = 0.019115060567855835
Validation loss = 0.0205509252846241
Validation loss = 0.01798284612596035
Validation loss = 0.019239913672208786
Validation loss = 0.018501896411180496
Validation loss = 0.01832837052643299
Validation loss = 0.02008877322077751
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018001725897192955
Validation loss = 0.018146485090255737
Validation loss = 0.01760403998196125
Validation loss = 0.020873187109827995
Validation loss = 0.018606234341859818
Validation loss = 0.01776060275733471
Validation loss = 0.017607728019356728
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019968777894973755
Validation loss = 0.01956726238131523
Validation loss = 0.01915140450000763
Validation loss = 0.020144041627645493
Validation loss = 0.01847882941365242
Validation loss = 0.021105017513036728
Validation loss = 0.0185724887996912
Validation loss = 0.01970796100795269
Validation loss = 0.018624503165483475
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020369958132505417
Validation loss = 0.017927099019289017
Validation loss = 0.020451968535780907
Validation loss = 0.019151322543621063
Validation loss = 0.020770976319909096
Validation loss = 0.017632048577070236
Validation loss = 0.02092975191771984
Validation loss = 0.019646072760224342
Validation loss = 0.018900906667113304
Validation loss = 0.022225288674235344
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14209591474245115
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1419698314108252
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14184397163120568
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.141718334809566
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1415929203539823
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14146772767462423
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1413427561837456
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1412180052956752
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14109347442680775
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14096916299559473
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14084507042253522
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14072119613016712
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.140597539543058
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14047410008779632
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14035087719298245
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14022787028921999
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14010507880910683
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1399825021872266
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13986013986013987
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13973799126637554
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13961605584642234
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13949433304272013
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13937282229965156
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1392515230635335
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1391304347826087
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000866 |
| Iteration     | 44        |
| MaximumReturn | -0.000614 |
| MinimumReturn | -0.00114  |
| TotalSamples  | 76636     |
-----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018222661688923836
Validation loss = 0.019089164212346077
Validation loss = 0.01898646168410778
Validation loss = 0.01885099709033966
Validation loss = 0.019198304042220116
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01932103931903839
Validation loss = 0.019358769059181213
Validation loss = 0.019524188712239265
Validation loss = 0.018604937940835953
Validation loss = 0.01763121597468853
Validation loss = 0.01820765621960163
Validation loss = 0.01891271583735943
Validation loss = 0.019800228998064995
Validation loss = 0.020218728110194206
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01813557557761669
Validation loss = 0.017508942633867264
Validation loss = 0.0173318013548851
Validation loss = 0.01737520657479763
Validation loss = 0.018709974363446236
Validation loss = 0.017955297604203224
Validation loss = 0.016710294410586357
Validation loss = 0.018905678763985634
Validation loss = 0.018121078610420227
Validation loss = 0.01764781028032303
Validation loss = 0.01734393835067749
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01973043568432331
Validation loss = 0.02118562161922455
Validation loss = 0.018885593861341476
Validation loss = 0.019584959372878075
Validation loss = 0.019365064799785614
Validation loss = 0.01894461363554001
Validation loss = 0.020088352262973785
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019192244857549667
Validation loss = 0.01831674948334694
Validation loss = 0.019767064601182938
Validation loss = 0.018969813361763954
Validation loss = 0.019793041050434113
Validation loss = 0.021498097106814384
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13900955690703737
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1388888888888889
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13876843018213356
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1386481802426343
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13852813852813853
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1384083044982699
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13828867761452032
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1381692573402418
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13805004314063848
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13793103448275862
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13781223083548666
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13769363166953527
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13757523645743766
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13745704467353953
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13733905579399142
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.137221269296741
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13710368466152528
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.136986301369863
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13686911890504705
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13675213675213677
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13663535439795046
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13651877133105803
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13640238704177324
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1362862010221465
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13617021276595745
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000901 |
| Iteration     | 45        |
| MaximumReturn | -0.000573 |
| MinimumReturn | -0.00131  |
| TotalSamples  | 78302     |
-----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024250468239188194
Validation loss = 0.018502214923501015
Validation loss = 0.01998526230454445
Validation loss = 0.019660092890262604
Validation loss = 0.021162282675504684
Validation loss = 0.019738854840397835
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020327333360910416
Validation loss = 0.01969696208834648
Validation loss = 0.01946435123682022
Validation loss = 0.01868552900850773
Validation loss = 0.018308639526367188
Validation loss = 0.018614407628774643
Validation loss = 0.018628939986228943
Validation loss = 0.018398964777588844
Validation loss = 0.02010483853518963
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017268823459744453
Validation loss = 0.017792239785194397
Validation loss = 0.017066938802599907
Validation loss = 0.017727339640259743
Validation loss = 0.01878258027136326
Validation loss = 0.017873283475637436
Validation loss = 0.016319870948791504
Validation loss = 0.016687432304024696
Validation loss = 0.01950797252357006
Validation loss = 0.01694517582654953
Validation loss = 0.016792062669992447
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019245166331529617
Validation loss = 0.019598258659243584
Validation loss = 0.02081015147268772
Validation loss = 0.020092613995075226
Validation loss = 0.019710246473550797
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019882114604115486
Validation loss = 0.018630320206284523
Validation loss = 0.019563764333724976
Validation loss = 0.019523639231920242
Validation loss = 0.018869537860155106
Validation loss = 0.018539344891905785
Validation loss = 0.02210623025894165
Validation loss = 0.019855709746479988
Validation loss = 0.018760334700345993
Validation loss = 0.01827254146337509
Validation loss = 0.020565232262015343
Validation loss = 0.021320730447769165
Validation loss = 0.019326606765389442
Validation loss = 0.018681708723306656
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1360544217687075
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13593882752761258
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13582342954159593
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13570822731128074
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13559322033898305
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1354784081287045
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1353637901861252
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1352493660185968
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13513513513513514
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1358649789029536
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1357504215851602
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13563605728727884
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13552188552188552
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13540790580319595
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13529411764705881
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1351805205709488
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13506711409395974
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.134953897736798
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13484087102177555
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13472803347280335
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1346153846153846
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13450292397660818
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1343906510851419
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13427856547122602
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13416666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.6     |
| Iteration     | 46       |
| MaximumReturn | -0.0204  |
| MinimumReturn | -39      |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018847128376364708
Validation loss = 0.022234734147787094
Validation loss = 0.018521413207054138
Validation loss = 0.01776948757469654
Validation loss = 0.018485382199287415
Validation loss = 0.018877172842621803
Validation loss = 0.01901184767484665
Validation loss = 0.01895306631922722
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01721210405230522
Validation loss = 0.0189193245023489
Validation loss = 0.017513692378997803
Validation loss = 0.018092816695570946
Validation loss = 0.018162336200475693
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01802230067551136
Validation loss = 0.017208505421876907
Validation loss = 0.01817338541150093
Validation loss = 0.01686861924827099
Validation loss = 0.017873261123895645
Validation loss = 0.016951212659478188
Validation loss = 0.01622495986521244
Validation loss = 0.017453212291002274
Validation loss = 0.017692890018224716
Validation loss = 0.01617250218987465
Validation loss = 0.016985010355710983
Validation loss = 0.01721167005598545
Validation loss = 0.01691133715212345
Validation loss = 0.015769457444548607
Validation loss = 0.016487834975123405
Validation loss = 0.016612308099865913
Validation loss = 0.017146864905953407
Validation loss = 0.018379146233201027
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018158579245209694
Validation loss = 0.017954140901565552
Validation loss = 0.01933327317237854
Validation loss = 0.018931742757558823
Validation loss = 0.018248114734888077
Validation loss = 0.01775383949279785
Validation loss = 0.017057493329048157
Validation loss = 0.01843971759080887
Validation loss = 0.01891046203672886
Validation loss = 0.01786429062485695
Validation loss = 0.018557824194431305
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01834758184850216
Validation loss = 0.01944076083600521
Validation loss = 0.017400452867150307
Validation loss = 0.018095126375555992
Validation loss = 0.018240367993712425
Validation loss = 0.018520930781960487
Validation loss = 0.017293497920036316
Validation loss = 0.01836584322154522
Validation loss = 0.018142085522413254
Validation loss = 0.02106337621808052
Validation loss = 0.020290173590183258
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1340549542048293
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13394342762063227
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13383208645054032
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13372093023255813
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13360995850622406
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13349917081260365
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13338856669428334
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13327814569536423
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13316790736145576
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13305785123966943
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1329479768786127
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13283828382838284
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1327287716405606
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1326194398682043
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1325102880658436
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13240131578947367
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1322925225965489
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13218390804597702
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1320754716981132
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1319672131147541
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13185913185913187
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13175122749590834
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13164349959116925
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1315359477124183
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13142857142857142
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00277 |
| Iteration     | 47       |
| MaximumReturn | -0.00212 |
| MinimumReturn | -0.00395 |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018710020929574966
Validation loss = 0.01931876689195633
Validation loss = 0.01937010884284973
Validation loss = 0.018770810216665268
Validation loss = 0.0188857764005661
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018649214878678322
Validation loss = 0.0191938653588295
Validation loss = 0.01819603703916073
Validation loss = 0.018408428877592087
Validation loss = 0.01943342760205269
Validation loss = 0.019486557692289352
Validation loss = 0.01667223498225212
Validation loss = 0.018091771751642227
Validation loss = 0.017593864351511
Validation loss = 0.01896984502673149
Validation loss = 0.018099257722496986
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016724828630685806
Validation loss = 0.01755896769464016
Validation loss = 0.017023328691720963
Validation loss = 0.017258351668715477
Validation loss = 0.01616673171520233
Validation loss = 0.0176556296646595
Validation loss = 0.01890529692173004
Validation loss = 0.01785065233707428
Validation loss = 0.018494579941034317
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019345074892044067
Validation loss = 0.019056085497140884
Validation loss = 0.018038522452116013
Validation loss = 0.01882784441113472
Validation loss = 0.01746016927063465
Validation loss = 0.017465826123952866
Validation loss = 0.017949461936950684
Validation loss = 0.018535727635025978
Validation loss = 0.018369894474744797
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018287230283021927
Validation loss = 0.01879192888736725
Validation loss = 0.018337611109018326
Validation loss = 0.019081231206655502
Validation loss = 0.01962178945541382
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13132137030995106
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13121434392828035
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13110749185667753
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13100081366965013
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13089430894308943
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13078797725426483
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13068181818181818
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1305758313057583
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13047001620745544
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13036437246963561
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1302588996763754
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1301535974130962
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13004846526655897
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12994350282485875
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12983870967741937
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1297340854149879
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12962962962962962
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12952534191472245
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12942122186495178
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12931726907630522
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12921348314606743
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12910986367281477
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12900641025641027
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1289031224979984
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1288
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0102  |
| Iteration     | 48       |
| MaximumReturn | -0.00612 |
| MinimumReturn | -0.015   |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01822003908455372
Validation loss = 0.01937824860215187
Validation loss = 0.018783332780003548
Validation loss = 0.02013525366783142
Validation loss = 0.01801677979528904
Validation loss = 0.018048955127596855
Validation loss = 0.018005304038524628
Validation loss = 0.019376780837774277
Validation loss = 0.01967298611998558
Validation loss = 0.01793474145233631
Validation loss = 0.01903713122010231
Validation loss = 0.018346698954701424
Validation loss = 0.017188969999551773
Validation loss = 0.017106371000409126
Validation loss = 0.019551001489162445
Validation loss = 0.018322788178920746
Validation loss = 0.01921543851494789
Validation loss = 0.022479001432657242
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019290342926979065
Validation loss = 0.017247753217816353
Validation loss = 0.019192002713680267
Validation loss = 0.018353627994656563
Validation loss = 0.01802535355091095
Validation loss = 0.018416956067085266
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01692129112780094
Validation loss = 0.01755947433412075
Validation loss = 0.0175896268337965
Validation loss = 0.016087772324681282
Validation loss = 0.01625259406864643
Validation loss = 0.01694221794605255
Validation loss = 0.01745389588177204
Validation loss = 0.016413535922765732
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.019396863877773285
Validation loss = 0.01917491853237152
Validation loss = 0.017459096387028694
Validation loss = 0.019765617325901985
Validation loss = 0.018206944689154625
Validation loss = 0.01776011474430561
Validation loss = 0.017375793308019638
Validation loss = 0.017972605302929878
Validation loss = 0.01943882554769516
Validation loss = 0.018220707774162292
Validation loss = 0.018575774505734444
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018733583390712738
Validation loss = 0.019231196492910385
Validation loss = 0.01796194538474083
Validation loss = 0.020000062882900238
Validation loss = 0.0187383946031332
Validation loss = 0.018873605877161026
Validation loss = 0.019331524148583412
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1286970423661071
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12859424920127796
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12849162011173185
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1283891547049442
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12828685258964143
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12818471337579618
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1280827366746221
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12798092209856915
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1278792692613185
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12777777777777777
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12767644726407612
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12757527733755944
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12747426761678543
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.127373417721519
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12727272727272726
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12717219589257503
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1270718232044199
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12697160883280756
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12687155240346729
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12677165354330708
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12667191188040913
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12657232704402516
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12647289866457187
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12637362637362637
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12627450980392158
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000875 |
| Iteration     | 49        |
| MaximumReturn | -0.000676 |
| MinimumReturn | -0.00114  |
| TotalSamples  | 84966     |
-----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02005642093718052
Validation loss = 0.017164748162031174
Validation loss = 0.017885079607367516
Validation loss = 0.018380794674158096
Validation loss = 0.017572686076164246
Validation loss = 0.017525168135762215
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017675718292593956
Validation loss = 0.018177980557084084
Validation loss = 0.017614301294088364
Validation loss = 0.02010340243577957
Validation loss = 0.01846739836037159
Validation loss = 0.019666079431772232
Validation loss = 0.01873655617237091
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016431348398327827
Validation loss = 0.016373321413993835
Validation loss = 0.01765032671391964
Validation loss = 0.01726628839969635
Validation loss = 0.016877684742212296
Validation loss = 0.02114569954574108
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01747818849980831
Validation loss = 0.01740989461541176
Validation loss = 0.01822897419333458
Validation loss = 0.01793421432375908
Validation loss = 0.019650481641292572
Validation loss = 0.018226590007543564
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018288573250174522
Validation loss = 0.01916709914803505
Validation loss = 0.017322149127721786
Validation loss = 0.02085764706134796
Validation loss = 0.01903047226369381
Validation loss = 0.017623454332351685
Validation loss = 0.01766250468790531
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12617554858934169
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12607674236491778
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12597809076682315
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12587959343236904
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12578125
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12568306010928962
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12558502340093602
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1254871395167576
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12538940809968846
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12529182879377432
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12519440124416797
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1250971250971251
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12490302560124127
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1248062015503876
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12470952749806352
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12461300309597523
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12451662799690642
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12442040185471406
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12432432432432433
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12422839506172839
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12413261372397841
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12403697996918336
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.123941493456505
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12384615384615384
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00188 |
| Iteration     | 50       |
| MaximumReturn | -0.00134 |
| MinimumReturn | -0.00277 |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018049681559205055
Validation loss = 0.018050706014037132
Validation loss = 0.017288953065872192
Validation loss = 0.01872563175857067
Validation loss = 0.01795150153338909
Validation loss = 0.016742270439863205
Validation loss = 0.018516886979341507
Validation loss = 0.018490472808480263
Validation loss = 0.017394641414284706
Validation loss = 0.017623871564865112
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01728609949350357
Validation loss = 0.018447168171405792
Validation loss = 0.021177006885409355
Validation loss = 0.017143219709396362
Validation loss = 0.01654546894133091
Validation loss = 0.01778911054134369
Validation loss = 0.017806947231292725
Validation loss = 0.018487179651856422
Validation loss = 0.018651045858860016
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02130749076604843
Validation loss = 0.015833787620067596
Validation loss = 0.01687813363969326
Validation loss = 0.015625478699803352
Validation loss = 0.017343223094940186
Validation loss = 0.016990501433610916
Validation loss = 0.018534116446971893
Validation loss = 0.016997259110212326
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01700141467154026
Validation loss = 0.01727914996445179
Validation loss = 0.01717478036880493
Validation loss = 0.017716925591230392
Validation loss = 0.020764051005244255
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017606589943170547
Validation loss = 0.017456816509366035
Validation loss = 0.016660695895552635
Validation loss = 0.017856035381555557
Validation loss = 0.01802610605955124
Validation loss = 0.01702740788459778
Validation loss = 0.01807296648621559
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12375096079938509
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12365591397849462
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12356101304681504
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12346625766871165
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12337164750957855
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12327718223583461
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12318286151491967
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12308868501529052
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12299465240641712
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12290076335877863
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12280701754385964
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12271341463414634
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12261995430312261
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12252663622526636
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12243346007604562
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12234042553191489
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12224753227031131
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12215477996965099
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12206216830932524
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12196969696969696
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12187736563209689
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12178517397881997
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12169312169312169
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1216012084592145
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12150943396226416
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00154 |
| Iteration     | 51       |
| MaximumReturn | -0.00114 |
| MinimumReturn | -0.00219 |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016708489507436752
Validation loss = 0.019641555845737457
Validation loss = 0.017659835517406464
Validation loss = 0.01637445203959942
Validation loss = 0.016797950491309166
Validation loss = 0.019034085795283318
Validation loss = 0.01699734665453434
Validation loss = 0.018021931871771812
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017081337049603462
Validation loss = 0.018986938521265984
Validation loss = 0.020319486036896706
Validation loss = 0.016996851190924644
Validation loss = 0.01737288199365139
Validation loss = 0.021188165992498398
Validation loss = 0.017091771587729454
Validation loss = 0.016385678201913834
Validation loss = 0.01783655770123005
Validation loss = 0.017849832773208618
Validation loss = 0.017053164541721344
Validation loss = 0.01687757484614849
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016119563952088356
Validation loss = 0.016366906464099884
Validation loss = 0.016325844451785088
Validation loss = 0.015987683087587357
Validation loss = 0.01776755042374134
Validation loss = 0.015604841522872448
Validation loss = 0.01625840552151203
Validation loss = 0.017442699521780014
Validation loss = 0.01573200710117817
Validation loss = 0.015363180078566074
Validation loss = 0.016795814037322998
Validation loss = 0.01563039980828762
Validation loss = 0.01733078993856907
Validation loss = 0.01644297130405903
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017757318913936615
Validation loss = 0.01742537133395672
Validation loss = 0.018127361312508583
Validation loss = 0.01895662397146225
Validation loss = 0.017159709706902504
Validation loss = 0.017028816044330597
Validation loss = 0.024318115785717964
Validation loss = 0.016069801524281502
Validation loss = 0.01856132410466671
Validation loss = 0.01749151013791561
Validation loss = 0.01692873425781727
Validation loss = 0.01850847899913788
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018290050327777863
Validation loss = 0.01872466318309307
Validation loss = 0.017893558368086815
Validation loss = 0.017268044874072075
Validation loss = 0.019113393500447273
Validation loss = 0.017485536634922028
Validation loss = 0.01644664816558361
Validation loss = 0.017804114148020744
Validation loss = 0.0182664655148983
Validation loss = 0.017961621284484863
Validation loss = 0.018430208787322044
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12141779788838612
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12132629992464206
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12123493975903614
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12114371708051166
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12105263157894737
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12096168294515403
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12087087087087087
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1207801950487622
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1206896551724138
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12059925093632959
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12050898203592815
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12041884816753927
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1203288490284006
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12023898431665422
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12014925373134329
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12005965697240865
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11997019374068554
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11988086373790022
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11979166666666667
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11970260223048328
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11961367013372957
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11952487008166296
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11943620178041543
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11934766493699037
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11925925925925926
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000898 |
| Iteration     | 52        |
| MaximumReturn | -0.000577 |
| MinimumReturn | -0.00125  |
| TotalSamples  | 89964     |
-----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017949730157852173
Validation loss = 0.017037246376276016
Validation loss = 0.016580354422330856
Validation loss = 0.017771724611520767
Validation loss = 0.01629433035850525
Validation loss = 0.017908670008182526
Validation loss = 0.019115282222628593
Validation loss = 0.017411187291145325
Validation loss = 0.01719505526125431
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01748810149729252
Validation loss = 0.01833152025938034
Validation loss = 0.017926475033164024
Validation loss = 0.01752309501171112
Validation loss = 0.017433637753129005
Validation loss = 0.017815247178077698
Validation loss = 0.016810355708003044
Validation loss = 0.018539119511842728
Validation loss = 0.016633691266179085
Validation loss = 0.01776222325861454
Validation loss = 0.01865900307893753
Validation loss = 0.017322169616818428
Validation loss = 0.0170990452170372
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016448184847831726
Validation loss = 0.017486954107880592
Validation loss = 0.017122481018304825
Validation loss = 0.01514530461281538
Validation loss = 0.016042640432715416
Validation loss = 0.01586666703224182
Validation loss = 0.018429525196552277
Validation loss = 0.015788674354553223
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017923491075634956
Validation loss = 0.01718067191541195
Validation loss = 0.017244582995772362
Validation loss = 0.017078822478652
Validation loss = 0.018931973725557327
Validation loss = 0.018153192475438118
Validation loss = 0.017171062529087067
Validation loss = 0.016452277079224586
Validation loss = 0.017452402040362358
Validation loss = 0.01804397627711296
Validation loss = 0.019645588472485542
Validation loss = 0.017212802544236183
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017807159572839737
Validation loss = 0.0191770289093256
Validation loss = 0.01763102225959301
Validation loss = 0.018889857456088066
Validation loss = 0.016315629705786705
Validation loss = 0.017590632662177086
Validation loss = 0.01789928786456585
Validation loss = 0.018099356442689896
Validation loss = 0.01831875555217266
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11917098445595854
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11908284023668639
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11899482631189948
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1189069423929099
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11881918819188192
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1187315634218289
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11864406779661017
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11855670103092783
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11846946284032377
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11838235294117647
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11829537105069801
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11820851688693099
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11812179016874541
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11803519061583578
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11794871794871795
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11786237188872621
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11777615215801024
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11769005847953216
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11760409057706354
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11751824817518249
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1174325309992706
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11734693877551021
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11726147123088128
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11717612809315867
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11709090909090909
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0147  |
| Iteration     | 53       |
| MaximumReturn | -0.00765 |
| MinimumReturn | -0.024   |
| TotalSamples  | 91630    |
----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017466045916080475
Validation loss = 0.01836429163813591
Validation loss = 0.017423540353775024
Validation loss = 0.01692773960530758
Validation loss = 0.016799267381429672
Validation loss = 0.016486890614032745
Validation loss = 0.023815156891942024
Validation loss = 0.015757357701659203
Validation loss = 0.016757076606154442
Validation loss = 0.017184698954224586
Validation loss = 0.017389951273798943
Validation loss = 0.017577407881617546
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017352810129523277
Validation loss = 0.017873333767056465
Validation loss = 0.016970958560705185
Validation loss = 0.02016364224255085
Validation loss = 0.017188582569360733
Validation loss = 0.017026783898472786
Validation loss = 0.017582179978489876
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01575258933007717
Validation loss = 0.01601424813270569
Validation loss = 0.016640815883874893
Validation loss = 0.014925000257790089
Validation loss = 0.016691740602254868
Validation loss = 0.015517651103436947
Validation loss = 0.017260905355215073
Validation loss = 0.0189743023365736
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017138084396719933
Validation loss = 0.01722623035311699
Validation loss = 0.01834454946219921
Validation loss = 0.017886120826005936
Validation loss = 0.016906721517443657
Validation loss = 0.017922192811965942
Validation loss = 0.017675122246146202
Validation loss = 0.017948023974895477
Validation loss = 0.016320273280143738
Validation loss = 0.01718004234135151
Validation loss = 0.016618788242340088
Validation loss = 0.01678094081580639
Validation loss = 0.01682513765990734
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017601797357201576
Validation loss = 0.018082359805703163
Validation loss = 0.0187944658100605
Validation loss = 0.018048347905278206
Validation loss = 0.016559498384594917
Validation loss = 0.017587551847100258
Validation loss = 0.018055535852909088
Validation loss = 0.01713392697274685
Validation loss = 0.017598619684576988
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11700581395348837
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11692084241103849
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11683599419448476
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.116751269035533
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11666666666666667
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1165821868211441
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11649782923299566
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11641359363702097
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11632947976878613
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11624548736462094
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11616161616161616
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11607786589762076
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11599423631123919
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11591072714182865
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1158273381294964
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11574406901509705
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11566091954022989
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11557788944723618
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11549497847919656
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11541218637992831
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11532951289398281
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1152469577666428
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11516452074391989
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11508220157255182
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.115
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000964 |
| Iteration     | 54        |
| MaximumReturn | -0.000706 |
| MinimumReturn | -0.00135  |
| TotalSamples  | 93296     |
-----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01756775937974453
Validation loss = 0.016922220587730408
Validation loss = 0.016853787004947662
Validation loss = 0.01819777488708496
Validation loss = 0.01667006127536297
Validation loss = 0.017784805968403816
Validation loss = 0.016198014840483665
Validation loss = 0.018664583563804626
Validation loss = 0.01732688955962658
Validation loss = 0.016617601737380028
Validation loss = 0.01912669837474823
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01716185174882412
Validation loss = 0.017460348084568977
Validation loss = 0.017402639612555504
Validation loss = 0.016097309067845345
Validation loss = 0.01657485030591488
Validation loss = 0.01572239026427269
Validation loss = 0.017418483272194862
Validation loss = 0.017424482852220535
Validation loss = 0.017274467274546623
Validation loss = 0.01907031238079071
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017168857157230377
Validation loss = 0.016206447035074234
Validation loss = 0.016957584768533707
Validation loss = 0.016308611258864403
Validation loss = 0.01607540063560009
Validation loss = 0.0149513715878129
Validation loss = 0.01638610288500786
Validation loss = 0.015881335362792015
Validation loss = 0.016896318644285202
Validation loss = 0.017431532964110374
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016526667401194572
Validation loss = 0.017477387562394142
Validation loss = 0.017317162826657295
Validation loss = 0.01761528104543686
Validation loss = 0.016028814017772675
Validation loss = 0.01677771657705307
Validation loss = 0.01689678616821766
Validation loss = 0.01647927053272724
Validation loss = 0.018031436949968338
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019038820639252663
Validation loss = 0.01786382496356964
Validation loss = 0.017918724566698074
Validation loss = 0.01717308536171913
Validation loss = 0.01793498918414116
Validation loss = 0.020548423752188683
Validation loss = 0.01854369044303894
Validation loss = 0.017208263278007507
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11491791577444682
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11483594864479316
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11475409836065574
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11467236467236468
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11459074733096085
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11450924608819346
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11442786069651742
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11434659090909091
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11426543647977289
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11418439716312057
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11410347271438696
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11402266288951841
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11394196744515216
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11386138613861387
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1137809187279152
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1137005649717514
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11362032462949895
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11354019746121298
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11346018322762509
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11338028169014085
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11330049261083744
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11322081575246132
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11314125087842586
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11306179775280899
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11298245614035088
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00463 |
| Iteration     | 55       |
| MaximumReturn | -0.0032  |
| MinimumReturn | -0.00764 |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.017577920109033585
Validation loss = 0.016210773959755898
Validation loss = 0.017520848661661148
Validation loss = 0.016008352860808372
Validation loss = 0.015910139307379723
Validation loss = 0.016841337084770203
Validation loss = 0.01835155114531517
Validation loss = 0.01694241538643837
Validation loss = 0.016255952417850494
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017577236518263817
Validation loss = 0.01849658228456974
Validation loss = 0.016872428357601166
Validation loss = 0.018387336283922195
Validation loss = 0.01580815203487873
Validation loss = 0.016654647886753082
Validation loss = 0.015510492958128452
Validation loss = 0.01713172346353531
Validation loss = 0.019926805049180984
Validation loss = 0.017135517671704292
Validation loss = 0.018718183040618896
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0157663244754076
Validation loss = 0.015390402637422085
Validation loss = 0.01642368733882904
Validation loss = 0.01650884747505188
Validation loss = 0.015547463670372963
Validation loss = 0.015628423541784286
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016949854791164398
Validation loss = 0.018505312502384186
Validation loss = 0.016661180183291435
Validation loss = 0.016995418816804886
Validation loss = 0.01657557673752308
Validation loss = 0.01662861928343773
Validation loss = 0.01630590111017227
Validation loss = 0.017296766862273216
Validation loss = 0.017895255237817764
Validation loss = 0.01837713085114956
Validation loss = 0.016765890643000603
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017804425209760666
Validation loss = 0.01777588203549385
Validation loss = 0.018177548423409462
Validation loss = 0.017408933490514755
Validation loss = 0.01717142015695572
Validation loss = 0.018215011805295944
Validation loss = 0.017664184793829918
Validation loss = 0.01889885403215885
Validation loss = 0.01735604740679264
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11290322580645161
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11282410651716888
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11274509803921569
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11266620013995801
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11258741258741259
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11250873515024458
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11243016759776536
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11235170969993022
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11227336122733612
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11219512195121951
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11211699164345404
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11203897007654837
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11196105702364395
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11188325225851285
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11180555555555556
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11172796668979876
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11165048543689321
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11157311157311157
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11149584487534626
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11141868512110727
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11134163208852006
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11126468555632343
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1111878453038674
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1111111111111111
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11103448275862068
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000842 |
| Iteration     | 56        |
| MaximumReturn | -0.000657 |
| MinimumReturn | -0.00107  |
| TotalSamples  | 96628     |
-----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016884231939911842
Validation loss = 0.016833189874887466
Validation loss = 0.017772532999515533
Validation loss = 0.016000235453248024
Validation loss = 0.017717011272907257
Validation loss = 0.0169286597520113
Validation loss = 0.01680753566324711
Validation loss = 0.01699570193886757
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01625797338783741
Validation loss = 0.01607690006494522
Validation loss = 0.016434982419013977
Validation loss = 0.017075859010219574
Validation loss = 0.016888203099370003
Validation loss = 0.015735173597931862
Validation loss = 0.01637737825512886
Validation loss = 0.01884668506681919
Validation loss = 0.01738806627690792
Validation loss = 0.017188144847750664
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014098008163273335
Validation loss = 0.016637297347187996
Validation loss = 0.015910232439637184
Validation loss = 0.01595407910645008
Validation loss = 0.01588922180235386
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01688181422650814
Validation loss = 0.016027459874749184
Validation loss = 0.016684887930750847
Validation loss = 0.01755850575864315
Validation loss = 0.01637418381869793
Validation loss = 0.016696786507964134
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016536027193069458
Validation loss = 0.01665075123310089
Validation loss = 0.017237529158592224
Validation loss = 0.01868641935288906
Validation loss = 0.017950447276234627
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11095796002756719
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11088154269972451
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11080523055746731
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11072902338376892
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11065292096219931
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11057692307692307
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11050102951269733
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11042524005486969
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11034955448937629
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11027397260273973
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11019849418206708
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11012311901504789
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11004784688995216
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10997267759562841
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1098976109215017
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10982264665757162
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10974778459441036
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10967302452316076
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10959836623553437
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10952380952380952
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10944935418082936
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.109375
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10930074677528853
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10922659430122117
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10915254237288136
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00329 |
| Iteration     | 57       |
| MaximumReturn | -0.00183 |
| MinimumReturn | -0.00491 |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01699780859053135
Validation loss = 0.018005743622779846
Validation loss = 0.018287641927599907
Validation loss = 0.018188968300819397
Validation loss = 0.016475480049848557
Validation loss = 0.01632438227534294
Validation loss = 0.015660405158996582
Validation loss = 0.018726341426372528
Validation loss = 0.017775947228074074
Validation loss = 0.016837988048791885
Validation loss = 0.017546115443110466
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016902174800634384
Validation loss = 0.018699215725064278
Validation loss = 0.016658935695886612
Validation loss = 0.01581510901451111
Validation loss = 0.016555454581975937
Validation loss = 0.01587032712996006
Validation loss = 0.0163652915507555
Validation loss = 0.015595031902194023
Validation loss = 0.015965577214956284
Validation loss = 0.01733516901731491
Validation loss = 0.01669110171496868
Validation loss = 0.017112312838435173
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018180502578616142
Validation loss = 0.015430523082613945
Validation loss = 0.015541767701506615
Validation loss = 0.015065595507621765
Validation loss = 0.016321232542395592
Validation loss = 0.016164284199476242
Validation loss = 0.016397051513195038
Validation loss = 0.01574324071407318
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01716936007142067
Validation loss = 0.017097944393754005
Validation loss = 0.016846366226673126
Validation loss = 0.017340566962957382
Validation loss = 0.015729393810033798
Validation loss = 0.016887180507183075
Validation loss = 0.017720097675919533
Validation loss = 0.015936847776174545
Validation loss = 0.016766482964158058
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017067808657884598
Validation loss = 0.01731175184249878
Validation loss = 0.018551690503954887
Validation loss = 0.016622846946120262
Validation loss = 0.0178073737770319
Validation loss = 0.018272656947374344
Validation loss = 0.016490699723362923
Validation loss = 0.016586028039455414
Validation loss = 0.016412589699029922
Validation loss = 0.017703881487250328
Validation loss = 0.01735030487179756
Validation loss = 0.020780425518751144
Validation loss = 0.016455069184303284
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10907859078590786
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10900473933649289
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10893098782138025
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10885733603786342
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10878378378378378
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1087103308575287
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10863697705802969
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10856372218476062
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10849056603773585
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10841750841750841
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10834454912516824
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10827168796234028
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1081989247311828
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1081262592343855
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10805369127516778
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.107981220657277
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1079088471849866
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10783657066309445
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10776439089692101
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1076923076923077
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10762032085561497
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10754843019372078
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10747663551401869
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10740493662441628
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10733333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000901 |
| Iteration     | 58        |
| MaximumReturn | -0.000716 |
| MinimumReturn | -0.0013   |
| TotalSamples  | 99960     |
-----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016190724447369576
Validation loss = 0.017029719427227974
Validation loss = 0.016486212611198425
Validation loss = 0.015936145558953285
Validation loss = 0.016236819326877594
Validation loss = 0.015971504151821136
Validation loss = 0.015621921047568321
Validation loss = 0.015407124534249306
Validation loss = 0.017344661056995392
Validation loss = 0.015961267054080963
Validation loss = 0.01531736645847559
Validation loss = 0.016017403453588486
Validation loss = 0.016375228762626648
Validation loss = 0.016389865428209305
Validation loss = 0.01549895852804184
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017315154895186424
Validation loss = 0.016773110255599022
Validation loss = 0.016548072919249535
Validation loss = 0.017920708283782005
Validation loss = 0.016758671030402184
Validation loss = 0.015744628384709358
Validation loss = 0.016901681199669838
Validation loss = 0.017337042838335037
Validation loss = 0.015959827229380608
Validation loss = 0.016448384150862694
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015791980549693108
Validation loss = 0.01532004214823246
Validation loss = 0.015537257306277752
Validation loss = 0.015785064548254013
Validation loss = 0.01584377884864807
Validation loss = 0.016014814376831055
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016103239730000496
Validation loss = 0.017181485891342163
Validation loss = 0.017040161415934563
Validation loss = 0.016027500852942467
Validation loss = 0.01762465387582779
Validation loss = 0.015861690044403076
Validation loss = 0.019261999055743217
Validation loss = 0.016693394631147385
Validation loss = 0.01670147478580475
Validation loss = 0.01745036244392395
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016493961215019226
Validation loss = 0.01716730371117592
Validation loss = 0.01860485039651394
Validation loss = 0.01679103821516037
Validation loss = 0.01658429391682148
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1072618254497002
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10719041278295606
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10711909514304724
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10704787234042554
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10697674418604651
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10690571049136786
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10683477106834771
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10676392572944297
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10669317428760769
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10662251655629139
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10655195234943746
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10648148148148148
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10641110376734964
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10634081902245707
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10627062706270628
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10620052770448549
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1061305207646671
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10606060606060606
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10599078341013825
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10592105263157894
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10585141354372124
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10578186596583443
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1057124097176625
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10564304461942257
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10557377049180328
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0202  |
| Iteration     | 59       |
| MaximumReturn | -0.00993 |
| MinimumReturn | -0.0272  |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01619754359126091
Validation loss = 0.015537929721176624
Validation loss = 0.01526566781103611
Validation loss = 0.015501448884606361
Validation loss = 0.014935873448848724
Validation loss = 0.01747359335422516
Validation loss = 0.016132045537233353
Validation loss = 0.015635238960385323
Validation loss = 0.015855761244893074
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016747215762734413
Validation loss = 0.01712740771472454
Validation loss = 0.015788499265909195
Validation loss = 0.016663843765854836
Validation loss = 0.023055095225572586
Validation loss = 0.016507495194673538
Validation loss = 0.01663319766521454
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018071996048092842
Validation loss = 0.01598995365202427
Validation loss = 0.01590980589389801
Validation loss = 0.015312732197344303
Validation loss = 0.015096205286681652
Validation loss = 0.016139591112732887
Validation loss = 0.01693204790353775
Validation loss = 0.01760481484234333
Validation loss = 0.017010511830449104
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016267186030745506
Validation loss = 0.015898557379841805
Validation loss = 0.015489715151488781
Validation loss = 0.016619229689240456
Validation loss = 0.023801354691386223
Validation loss = 0.016156496480107307
Validation loss = 0.015213915146887302
Validation loss = 0.016550162807106972
Validation loss = 0.01631571352481842
Validation loss = 0.0205516554415226
Validation loss = 0.016071181744337082
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016142066568136215
Validation loss = 0.018279587849974632
Validation loss = 0.018475551158189774
Validation loss = 0.016859624534845352
Validation loss = 0.01677914336323738
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10550458715596331
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1054354944335298
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10536649214659685
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.105297580117724
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10522875816993464
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10516002612671456
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10509138381201044
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1050228310502283
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10495436766623208
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10488599348534201
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10481770833333333
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10474951203643461
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1046814044213264
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1046133853151397
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10454545454545454
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1044776119402985
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10440985732814527
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10434219053791316
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10427461139896373
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10420711974110032
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10413971539456662
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10407239819004525
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10400516795865633
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1039380245319561
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10387096774193548
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00903 |
| Iteration     | 60       |
| MaximumReturn | -0.00651 |
| MinimumReturn | -0.0128  |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016395652666687965
Validation loss = 0.019326483830809593
Validation loss = 0.014389229007065296
Validation loss = 0.015073657035827637
Validation loss = 0.014918598346412182
Validation loss = 0.01777905598282814
Validation loss = 0.01541772298514843
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015612181276082993
Validation loss = 0.015337878838181496
Validation loss = 0.01567043922841549
Validation loss = 0.016681930050253868
Validation loss = 0.015833819285035133
Validation loss = 0.016102682799100876
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01608899049460888
Validation loss = 0.015341979451477528
Validation loss = 0.015385105274617672
Validation loss = 0.015962984412908554
Validation loss = 0.015575836412608624
Validation loss = 0.01579585298895836
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015927445143461227
Validation loss = 0.016048643738031387
Validation loss = 0.018233919516205788
Validation loss = 0.016339706256985664
Validation loss = 0.016542071476578712
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018585382029414177
Validation loss = 0.019938908517360687
Validation loss = 0.016822511330246925
Validation loss = 0.016956817358732224
Validation loss = 0.015190175734460354
Validation loss = 0.015993107110261917
Validation loss = 0.018534308299422264
Validation loss = 0.015931155532598495
Validation loss = 0.016579562798142433
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1038039974210187
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10373711340206186
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10367031551835158
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1036036036036036
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10353697749196142
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10347043701799485
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10340398201669879
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10333761232349166
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10327132777421424
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1032051282051282
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1031390134529148
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10307298335467349
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10300703774792067
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10294117647058823
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10287539936102237
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10280970625798212
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10274409700063816
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10267857142857142
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10261312938177183
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10254777070063695
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10248249522597072
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10241730279898219
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10235219326128417
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.102287166454892
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10222222222222223
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000829 |
| Iteration     | 61        |
| MaximumReturn | -0.000643 |
| MinimumReturn | -0.00113  |
| TotalSamples  | 104958    |
-----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01674678735435009
Validation loss = 0.016016067937016487
Validation loss = 0.016385555267333984
Validation loss = 0.016446981579065323
Validation loss = 0.015025530941784382
Validation loss = 0.01609240658581257
Validation loss = 0.01471536885946989
Validation loss = 0.016386037692427635
Validation loss = 0.015196474269032478
Validation loss = 0.01608995907008648
Validation loss = 0.015313338488340378
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01678195782005787
Validation loss = 0.016888588666915894
Validation loss = 0.015559808351099491
Validation loss = 0.015343018807470798
Validation loss = 0.015155584551393986
Validation loss = 0.016783300787210464
Validation loss = 0.015596872195601463
Validation loss = 0.015550649724900723
Validation loss = 0.015977632254362106
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014849110506474972
Validation loss = 0.016090990975499153
Validation loss = 0.01474052481353283
Validation loss = 0.015431861393153667
Validation loss = 0.01874368265271187
Validation loss = 0.014750570990145206
Validation loss = 0.015289743430912495
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01567249558866024
Validation loss = 0.016172679141163826
Validation loss = 0.01652815379202366
Validation loss = 0.015837224200367928
Validation loss = 0.01723819598555565
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017634514719247818
Validation loss = 0.016672305762767792
Validation loss = 0.01605203002691269
Validation loss = 0.016193736344575882
Validation loss = 0.023345448076725006
Validation loss = 0.016224028542637825
Validation loss = 0.015757940709590912
Validation loss = 0.017121832817792892
Validation loss = 0.01541442796587944
Validation loss = 0.017320619896054268
Validation loss = 0.017689280211925507
Validation loss = 0.016790354624390602
Validation loss = 0.018436230719089508
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10215736040609137
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10209258084971465
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10202788339670468
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1019632678910703
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10189873417721519
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10183428209993675
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10176991150442478
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10170562223626027
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10164141414141414
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10157728706624605
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10151324085750316
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10144927536231885
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10138539042821158
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1013215859030837
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10125786163522013
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10119421747328725
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10113065326633165
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10106716886377903
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10100376411543287
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10094043887147336
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10087719298245613
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1008140262993112
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10075093867334167
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10068792995622264
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.100625
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00999 |
| Iteration     | 62       |
| MaximumReturn | -0.00629 |
| MinimumReturn | -0.015   |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01469667349010706
Validation loss = 0.015991808846592903
Validation loss = 0.014819607138633728
Validation loss = 0.014862380921840668
Validation loss = 0.017608538269996643
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016171082854270935
Validation loss = 0.016253123059868813
Validation loss = 0.015888303518295288
Validation loss = 0.015288504771888256
Validation loss = 0.015272104181349277
Validation loss = 0.014928956516087055
Validation loss = 0.015547391027212143
Validation loss = 0.014515356160700321
Validation loss = 0.015015890821814537
Validation loss = 0.01488678902387619
Validation loss = 0.01631462574005127
Validation loss = 0.01594678871333599
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014689077623188496
Validation loss = 0.014949334785342216
Validation loss = 0.015181263908743858
Validation loss = 0.014436213299632072
Validation loss = 0.015192091464996338
Validation loss = 0.015622344799339771
Validation loss = 0.014863366261124611
Validation loss = 0.01528021041303873
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018420299515128136
Validation loss = 0.01562752015888691
Validation loss = 0.016516927629709244
Validation loss = 0.015476001426577568
Validation loss = 0.014719707891345024
Validation loss = 0.015143810771405697
Validation loss = 0.015594839118421078
Validation loss = 0.01670507900416851
Validation loss = 0.015381609089672565
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015317069366574287
Validation loss = 0.016030551865696907
Validation loss = 0.020243003964424133
Validation loss = 0.016713691875338554
Validation loss = 0.01597040705382824
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10056214865708932
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10049937578027465
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10043668122270742
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10037406483790523
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10031152647975078
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10024906600249066
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1001866832607343
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10012437810945274
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10006215040397763
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09993792675356922
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09987593052109181
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09981401115933045
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09975216852540272
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09969040247678018
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09962871287128713
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09956709956709957
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09950556242274412
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09944410129709698
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09938271604938272
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09932140653917335
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09926017262638717
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09919901417128774
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09913793103448276
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09907692307692308
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000983 |
| Iteration     | 63        |
| MaximumReturn | -0.000785 |
| MinimumReturn | -0.00146  |
| TotalSamples  | 108290    |
-----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01655125804245472
Validation loss = 0.0153376879170537
Validation loss = 0.015783093869686127
Validation loss = 0.01554894633591175
Validation loss = 0.014830074273049831
Validation loss = 0.01612064242362976
Validation loss = 0.015174697153270245
Validation loss = 0.01475144736468792
Validation loss = 0.01728682778775692
Validation loss = 0.01547204703092575
Validation loss = 0.01659836433827877
Validation loss = 0.015138368122279644
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014342593029141426
Validation loss = 0.014452469535171986
Validation loss = 0.0170296598225832
Validation loss = 0.015198105946183205
Validation loss = 0.015374742448329926
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01553850993514061
Validation loss = 0.01493284571915865
Validation loss = 0.015703704208135605
Validation loss = 0.015712572261691093
Validation loss = 0.014439030550420284
Validation loss = 0.015774719417095184
Validation loss = 0.014877929352223873
Validation loss = 0.016302838921546936
Validation loss = 0.014872241765260696
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01580110564827919
Validation loss = 0.01633043773472309
Validation loss = 0.015793967992067337
Validation loss = 0.015661044046282768
Validation loss = 0.015254759229719639
Validation loss = 0.022618116810917854
Validation loss = 0.015520489774644375
Validation loss = 0.016506271436810493
Validation loss = 0.015012670308351517
Validation loss = 0.015774190425872803
Validation loss = 0.01576155796647072
Validation loss = 0.015101923607289791
Validation loss = 0.016072936356067657
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016386592760682106
Validation loss = 0.016573835164308548
Validation loss = 0.015346423722803593
Validation loss = 0.015347330830991268
Validation loss = 0.01571984589099884
Validation loss = 0.016993965953588486
Validation loss = 0.0166928730905056
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0990159901599016
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09895513214505225
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0988943488943489
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09883364027010436
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09877300613496932
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09871244635193133
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09865196078431372
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09859154929577464
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09853121175030599
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09847094801223241
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09841075794621026
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09835064141722663
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09829059829059829
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09823062843197071
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09817073170731708
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09811090798293723
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09805115712545676
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09799147900182593
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09793187347931874
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09787234042553192
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09781287970838397
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09775349119611415
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09769417475728155
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0976349302607641
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09757575757575758
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00344 |
| Iteration     | 64       |
| MaximumReturn | -0.00243 |
| MinimumReturn | -0.00564 |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014658622443675995
Validation loss = 0.015402569435536861
Validation loss = 0.014600045047700405
Validation loss = 0.014873082749545574
Validation loss = 0.014691590331494808
Validation loss = 0.016518067568540573
Validation loss = 0.01506806816905737
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015535041689872742
Validation loss = 0.01615174114704132
Validation loss = 0.015025402419269085
Validation loss = 0.015330314636230469
Validation loss = 0.01630488410592079
Validation loss = 0.015395098365843296
Validation loss = 0.01687685027718544
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015203510411083698
Validation loss = 0.015504708513617516
Validation loss = 0.014371387660503387
Validation loss = 0.015608344227075577
Validation loss = 0.01588151417672634
Validation loss = 0.015393082983791828
Validation loss = 0.01470912154763937
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014927411451935768
Validation loss = 0.015513578429818153
Validation loss = 0.015195668675005436
Validation loss = 0.016813402995467186
Validation loss = 0.017215771600604057
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016359493136405945
Validation loss = 0.01544683426618576
Validation loss = 0.015500077977776527
Validation loss = 0.01501262653619051
Validation loss = 0.016336018219590187
Validation loss = 0.019747089594602585
Validation loss = 0.015363534912467003
Validation loss = 0.01590866968035698
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09751665657177468
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09745762711864407
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09739866908650938
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0973397823458283
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09728096676737161
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09722222222222222
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0971635485817743
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0971049457177322
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0970464135021097
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09698795180722891
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09692956050571945
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09687123947051744
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0968129885748647
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0967548076923077
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0966966966966967
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09663865546218488
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09658068386322735
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09652278177458033
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09646494907130018
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09640718562874251
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09634949132256133
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09629186602870814
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09623430962343096
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0961768219832736
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09611940298507463
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00177 |
| Iteration     | 65       |
| MaximumReturn | -0.00104 |
| MinimumReturn | -0.0026  |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015922045335173607
Validation loss = 0.016574477776885033
Validation loss = 0.015303704887628555
Validation loss = 0.014769110828638077
Validation loss = 0.015162661671638489
Validation loss = 0.016458258032798767
Validation loss = 0.015174983069300652
Validation loss = 0.015827026218175888
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01526696141809225
Validation loss = 0.015282578766345978
Validation loss = 0.01777106709778309
Validation loss = 0.015350380912423134
Validation loss = 0.015437919646501541
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014875398948788643
Validation loss = 0.0162659902125597
Validation loss = 0.01417343970388174
Validation loss = 0.013934386894106865
Validation loss = 0.015368310734629631
Validation loss = 0.018100330606102943
Validation loss = 0.01642037369310856
Validation loss = 0.015826428309082985
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015056338161230087
Validation loss = 0.015193480998277664
Validation loss = 0.017473334446549416
Validation loss = 0.016253508627414703
Validation loss = 0.015139661729335785
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01643551141023636
Validation loss = 0.015889720991253853
Validation loss = 0.015892241150140762
Validation loss = 0.016366973519325256
Validation loss = 0.01580212078988552
Validation loss = 0.017412863671779633
Validation loss = 0.015703003853559494
Validation loss = 0.015985708683729172
Validation loss = 0.015494657680392265
Validation loss = 0.015762824565172195
Validation loss = 0.015254665166139603
Validation loss = 0.015557326376438141
Validation loss = 0.016627242788672447
Validation loss = 0.015488113276660442
Validation loss = 0.01574191264808178
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09606205250596658
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09600477042337507
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09594755661501787
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0958904109589041
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09583333333333334
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09577632361689471
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09571938168846611
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09566250742721331
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09560570071258907
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09554896142433235
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09549228944246738
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0954356846473029
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09537914691943128
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09532267613972766
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09526627218934912
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09520993494973388
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09515366430260047
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09509746012994684
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09504132231404959
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09498525073746313
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09492924528301887
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09487330583382439
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09481743227326266
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09476162448499117
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09470588235294118
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00108  |
| Iteration     | 66        |
| MaximumReturn | -0.000751 |
| MinimumReturn | -0.00189  |
| TotalSamples  | 113288    |
-----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015291749499738216
Validation loss = 0.015050232410430908
Validation loss = 0.01453597191721201
Validation loss = 0.016065102070569992
Validation loss = 0.015065163373947144
Validation loss = 0.015565783716738224
Validation loss = 0.016252029687166214
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01545522641390562
Validation loss = 0.015768010169267654
Validation loss = 0.017621973529458046
Validation loss = 0.01545681618154049
Validation loss = 0.014889041893184185
Validation loss = 0.015008299611508846
Validation loss = 0.014466396532952785
Validation loss = 0.01533083338290453
Validation loss = 0.015602043829858303
Validation loss = 0.016526000574231148
Validation loss = 0.016283560544252396
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014538859017193317
Validation loss = 0.015157458372414112
Validation loss = 0.016122836619615555
Validation loss = 0.014598755165934563
Validation loss = 0.014491809532046318
Validation loss = 0.015210737474262714
Validation loss = 0.01467932853847742
Validation loss = 0.015063473023474216
Validation loss = 0.01470671035349369
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015291531570255756
Validation loss = 0.015034993179142475
Validation loss = 0.015254361554980278
Validation loss = 0.01630132459104061
Validation loss = 0.01540954690426588
Validation loss = 0.016876498237252235
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01734158769249916
Validation loss = 0.016480086371302605
Validation loss = 0.01593814790248871
Validation loss = 0.017640257254242897
Validation loss = 0.01615912839770317
Validation loss = 0.01666256971657276
Validation loss = 0.015928905457258224
Validation loss = 0.016697045415639877
Validation loss = 0.017119066789746284
Validation loss = 0.015962084755301476
Validation loss = 0.016018621623516083
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09465020576131687
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0945945945945946
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09453904873752202
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09448356807511737
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09442815249266862
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09437280187573271
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09431751611013474
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0942622950819672
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09420713867758923
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09415204678362574
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09409701928696669
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09404205607476636
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09398715703444249
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09393232205367562
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09387755102040816
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09382284382284382
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0937682003494467
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09371362048894064
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09365910413030831
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0936046511627907
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09355026147588612
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09349593495934959
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0934416715031921
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09338747099767981
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00147 |
| Iteration     | 67       |
| MaximumReturn | -0.00109 |
| MinimumReturn | -0.00195 |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01604684442281723
Validation loss = 0.015272481366991997
Validation loss = 0.014735721051692963
Validation loss = 0.015278126113116741
Validation loss = 0.015162141993641853
Validation loss = 0.016330422833561897
Validation loss = 0.01545173954218626
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015279305167496204
Validation loss = 0.01446171011775732
Validation loss = 0.015508649870753288
Validation loss = 0.01590593159198761
Validation loss = 0.014798019081354141
Validation loss = 0.01614902913570404
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015733890235424042
Validation loss = 0.014245137572288513
Validation loss = 0.014284365810453892
Validation loss = 0.014467906206846237
Validation loss = 0.017343493178486824
Validation loss = 0.01691233366727829
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015012301504611969
Validation loss = 0.015728242695331573
Validation loss = 0.01540487352758646
Validation loss = 0.01549906563013792
Validation loss = 0.01527682226151228
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01571841910481453
Validation loss = 0.015888553112745285
Validation loss = 0.015461163595318794
Validation loss = 0.015421580523252487
Validation loss = 0.016797414049506187
Validation loss = 0.015218088403344154
Validation loss = 0.01590646244585514
Validation loss = 0.014998555183410645
Validation loss = 0.016498858109116554
Validation loss = 0.017103463411331177
Validation loss = 0.01574856787919998
Validation loss = 0.01591465063393116
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.093279258400927
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09322524609148813
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0931712962962963
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0931174089068826
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0930635838150289
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09300982091276719
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09295612009237875
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09290248124639354
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09284890426758939
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09279538904899136
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09274193548387097
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09268854346574554
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09263521288837745
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09258194364577343
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0925287356321839
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09247558874210224
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09242250287026406
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09236947791164658
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09231651376146789
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09226361031518625
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09221076746849943
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09215798511734402
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09210526315789473
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09205260148656375
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.092
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00157 |
| Iteration     | 68       |
| MaximumReturn | -0.00122 |
| MinimumReturn | -0.00215 |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015244163572788239
Validation loss = 0.014728337526321411
Validation loss = 0.014669734053313732
Validation loss = 0.014944497495889664
Validation loss = 0.014744325540959835
Validation loss = 0.015289872884750366
Validation loss = 0.014938629232347012
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016513284295797348
Validation loss = 0.015580205246806145
Validation loss = 0.0160597525537014
Validation loss = 0.015806641429662704
Validation loss = 0.015510933473706245
Validation loss = 0.01613105647265911
Validation loss = 0.01558669377118349
Validation loss = 0.015525808557868004
Validation loss = 0.01573343761265278
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014188754372298717
Validation loss = 0.0157074686139822
Validation loss = 0.013877594843506813
Validation loss = 0.01489773765206337
Validation loss = 0.014867383055388927
Validation loss = 0.014382094144821167
Validation loss = 0.014641287736594677
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01617397740483284
Validation loss = 0.01624486595392227
Validation loss = 0.014819693751633167
Validation loss = 0.016208883374929428
Validation loss = 0.014609684236347675
Validation loss = 0.015538445673882961
Validation loss = 0.016817158088088036
Validation loss = 0.0152831906452775
Validation loss = 0.0147338155657053
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015844831243157387
Validation loss = 0.015034949406981468
Validation loss = 0.016084203496575356
Validation loss = 0.01718493364751339
Validation loss = 0.015271563082933426
Validation loss = 0.015707096084952354
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09194745859508852
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09189497716894977
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09184255561893896
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09179019384264538
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09173789173789174
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09168564920273349
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09163346613545817
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09158134243458475
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09152927799886298
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09147727272727273
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09142532651902328
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09137343927355278
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09132161089052751
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09126984126984126
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09121813031161473
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0911664779161948
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09111488398415393
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09106334841628959
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09101187111362352
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09096045197740113
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09090909090909091
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09085778781038374
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09080654258319233
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09075535512965051
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09070422535211267
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00113  |
| Iteration     | 69        |
| MaximumReturn | -0.000839 |
| MinimumReturn | -0.00192  |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015200674533843994
Validation loss = 0.013567402958869934
Validation loss = 0.015697112306952477
Validation loss = 0.015350672416388988
Validation loss = 0.01829029619693756
Validation loss = 0.014440275728702545
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015286107547581196
Validation loss = 0.015164872631430626
Validation loss = 0.016156833618879318
Validation loss = 0.015435855835676193
Validation loss = 0.01535743847489357
Validation loss = 0.015245742164552212
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014106682501733303
Validation loss = 0.015079583041369915
Validation loss = 0.014587482437491417
Validation loss = 0.01436105277389288
Validation loss = 0.014013255015015602
Validation loss = 0.015419808216392994
Validation loss = 0.014315965585410595
Validation loss = 0.014446604065597057
Validation loss = 0.014166043139994144
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01525892224162817
Validation loss = 0.018359187990427017
Validation loss = 0.015015456825494766
Validation loss = 0.01624930091202259
Validation loss = 0.014857837930321693
Validation loss = 0.014955179765820503
Validation loss = 0.014888640493154526
Validation loss = 0.015444125980138779
Validation loss = 0.014918426051735878
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015527090057730675
Validation loss = 0.014968396164476871
Validation loss = 0.015961557626724243
Validation loss = 0.015445325523614883
Validation loss = 0.015523465350270271
Validation loss = 0.015911545604467392
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09065315315315316
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09060213843556555
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09055118110236221
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09050028105677346
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09044943820224718
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09039865244244806
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09034792368125702
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09029725182277061
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09024663677130045
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09019607843137255
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09014557670772676
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09009513150531617
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09004474272930649
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08999441028507546
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08994413407821229
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08989391401451703
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08984375
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0897936419408812
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08974358974358974
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08969359331476323
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08964365256124722
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0895937673900946
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08954393770856507
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08949416342412451
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08944444444444444
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000853 |
| Iteration     | 70        |
| MaximumReturn | -0.000578 |
| MinimumReturn | -0.00108  |
| TotalSamples  | 119952    |
-----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015548823401331902
Validation loss = 0.015183254145085812
Validation loss = 0.014788206666707993
Validation loss = 0.014834278263151646
Validation loss = 0.014538613148033619
Validation loss = 0.014922616071999073
Validation loss = 0.014977668412029743
Validation loss = 0.015325674787163734
Validation loss = 0.01635412685573101
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014925706200301647
Validation loss = 0.014706508256494999
Validation loss = 0.015049387700855732
Validation loss = 0.015237586572766304
Validation loss = 0.014536548405885696
Validation loss = 0.014920758083462715
Validation loss = 0.01637345366179943
Validation loss = 0.016786660999059677
Validation loss = 0.01538467314094305
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014856653288006783
Validation loss = 0.015031331218779087
Validation loss = 0.01578933373093605
Validation loss = 0.014745661057531834
Validation loss = 0.01406355481594801
Validation loss = 0.01747984252870083
Validation loss = 0.014357793144881725
Validation loss = 0.014810248278081417
Validation loss = 0.01502662431448698
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01504936907440424
Validation loss = 0.014440122991800308
Validation loss = 0.015273126773536205
Validation loss = 0.015971200540661812
Validation loss = 0.01637946628034115
Validation loss = 0.014819817617535591
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01618373952805996
Validation loss = 0.014912319369614124
Validation loss = 0.01506716012954712
Validation loss = 0.016164269298315048
Validation loss = 0.014760800637304783
Validation loss = 0.014780266210436821
Validation loss = 0.016236426308751106
Validation loss = 0.0147761981934309
Validation loss = 0.01674761064350605
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08939478067740145
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08934517203107659
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08929561841375486
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08924611973392461
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08919667590027701
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08914728682170543
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08909795240730492
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08904867256637168
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08899944720840243
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08895027624309393
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08890115958034235
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08885209713024282
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0888030888030888
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08875413450937156
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08870523415977961
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08865638766519823
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08860759493670886
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08855885588558855
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08851017042330951
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08846153846153847
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08841295991213619
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08836443468715698
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08831596269884805
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08826754385964912
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08821917808219178
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0155  |
| Iteration     | 71       |
| MaximumReturn | -0.00979 |
| MinimumReturn | -0.0323  |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01488319132477045
Validation loss = 0.014316634275019169
Validation loss = 0.014556323178112507
Validation loss = 0.013983605429530144
Validation loss = 0.015199920162558556
Validation loss = 0.01415095292031765
Validation loss = 0.014659788459539413
Validation loss = 0.01515286322683096
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015521408058702946
Validation loss = 0.015035322867333889
Validation loss = 0.01565643399953842
Validation loss = 0.015380892902612686
Validation loss = 0.014704977162182331
Validation loss = 0.015072839334607124
Validation loss = 0.015392537228763103
Validation loss = 0.015060439705848694
Validation loss = 0.014937451109290123
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015499447472393513
Validation loss = 0.01486622728407383
Validation loss = 0.015308093279600143
Validation loss = 0.014755976386368275
Validation loss = 0.013869418762624264
Validation loss = 0.01429236214607954
Validation loss = 0.015478983521461487
Validation loss = 0.014142944477498531
Validation loss = 0.014040502719581127
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015310922637581825
Validation loss = 0.015393859706819057
Validation loss = 0.015304277651011944
Validation loss = 0.014624889008700848
Validation loss = 0.015188934281468391
Validation loss = 0.014256876893341541
Validation loss = 0.01448023784905672
Validation loss = 0.0163730401545763
Validation loss = 0.014956207945942879
Validation loss = 0.016230126842856407
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022726748138666153
Validation loss = 0.01564192585647106
Validation loss = 0.014558925293385983
Validation loss = 0.015996936708688736
Validation loss = 0.014763880521059036
Validation loss = 0.01643459126353264
Validation loss = 0.016067514196038246
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08817086527929902
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08812260536398467
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08807439824945296
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08802624384909787
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08797814207650273
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08793009284543965
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08788209606986899
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0878341516639389
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08778625954198473
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08773841961852862
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08769063180827887
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08764289602612955
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08759521218715996
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08754758020663404
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0875
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08745247148288973
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08740499457111835
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08735756918068367
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08731019522776573
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08726287262872628
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08721560130010834
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08716838115863562
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08712121212121213
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08707409410492158
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08702702702702703
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0461  |
| Iteration     | 72       |
| MaximumReturn | -0.0262  |
| MinimumReturn | -0.0634  |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0145904291421175
Validation loss = 0.014048909768462181
Validation loss = 0.01438395120203495
Validation loss = 0.014675247482955456
Validation loss = 0.013860903680324554
Validation loss = 0.015520507469773293
Validation loss = 0.01366010494530201
Validation loss = 0.015055542811751366
Validation loss = 0.014204178005456924
Validation loss = 0.01376406755298376
Validation loss = 0.014742689207196236
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01546410471200943
Validation loss = 0.014715139754116535
Validation loss = 0.014481277205049992
Validation loss = 0.015268498100340366
Validation loss = 0.01504715345799923
Validation loss = 0.016149498522281647
Validation loss = 0.016291705891489983
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01410992443561554
Validation loss = 0.01449667103588581
Validation loss = 0.014251289889216423
Validation loss = 0.014372981153428555
Validation loss = 0.016070395708084106
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014983274042606354
Validation loss = 0.014447339810431004
Validation loss = 0.014355055056512356
Validation loss = 0.014392660930752754
Validation loss = 0.014427099376916885
Validation loss = 0.014967495575547218
Validation loss = 0.014982631430029869
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014523212797939777
Validation loss = 0.015037845820188522
Validation loss = 0.015569077804684639
Validation loss = 0.015698537230491638
Validation loss = 0.014948619529604912
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08698001080497028
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08693304535637149
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0868861305990286
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08683926645091694
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08679245283018867
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08674568965517242
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08669897684437264
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08665231431646932
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08660570199031738
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08655913978494624
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08651262761955937
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08646616541353383
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08641975308641975
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08637339055793991
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08632707774798927
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08628081457663452
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08623460096411355
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08618843683083512
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08614232209737828
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08609625668449197
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08605024051309461
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0860042735042735
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08595835557928456
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08591248665955176
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08586666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0258  |
| Iteration     | 73       |
| MaximumReturn | -0.0164  |
| MinimumReturn | -0.0414  |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014126523397862911
Validation loss = 0.013384928926825523
Validation loss = 0.014895863831043243
Validation loss = 0.013740547001361847
Validation loss = 0.01576894149184227
Validation loss = 0.01587923988699913
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01772606559097767
Validation loss = 0.015476998873054981
Validation loss = 0.015367264859378338
Validation loss = 0.014638789929449558
Validation loss = 0.014548094943165779
Validation loss = 0.014406169764697552
Validation loss = 0.014957776293158531
Validation loss = 0.014492920599877834
Validation loss = 0.015129001811146736
Validation loss = 0.01656792126595974
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01309871207922697
Validation loss = 0.015561781823635101
Validation loss = 0.014732356183230877
Validation loss = 0.013746583834290504
Validation loss = 0.012835501693189144
Validation loss = 0.015343199484050274
Validation loss = 0.013919615186750889
Validation loss = 0.013849235139787197
Validation loss = 0.015280474908649921
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015317262150347233
Validation loss = 0.013674318790435791
Validation loss = 0.014859655871987343
Validation loss = 0.014052902348339558
Validation loss = 0.015508218668401241
Validation loss = 0.014285518787801266
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014468490146100521
Validation loss = 0.01403980515897274
Validation loss = 0.014955082908272743
Validation loss = 0.016021059826016426
Validation loss = 0.014970246702432632
Validation loss = 0.014959822408854961
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08582089552238806
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08577517314864144
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08572949946751864
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08568387440127727
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08563829787234042
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08559276980329612
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08554729011689692
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08550185873605948
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08545647558386411
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08541114058355438
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08536585365853659
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08532061473237944
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08527542372881355
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08523028057173107
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08518518518518518
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08514013749338974
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08509513742071882
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08505018489170629
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0850052798310454
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08496042216358839
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.084915611814346
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08487084870848709
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08482613277133826
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08478146392838336
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08473684210526315
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00726 |
| Iteration     | 74       |
| MaximumReturn | -0.0046  |
| MinimumReturn | -0.013   |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014410979114472866
Validation loss = 0.013436920009553432
Validation loss = 0.014961929060518742
Validation loss = 0.013672004453837872
Validation loss = 0.01450858823955059
Validation loss = 0.014932607300579548
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013969824649393559
Validation loss = 0.0157365333288908
Validation loss = 0.014693339355289936
Validation loss = 0.014599520713090897
Validation loss = 0.01584901660680771
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01454324834048748
Validation loss = 0.014483357779681683
Validation loss = 0.013556216843426228
Validation loss = 0.013672666624188423
Validation loss = 0.015008351765573025
Validation loss = 0.0150971794500947
Validation loss = 0.015003945678472519
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015895025804638863
Validation loss = 0.014089488424360752
Validation loss = 0.014180055819451809
Validation loss = 0.01436232402920723
Validation loss = 0.014366216026246548
Validation loss = 0.015544953756034374
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016459230333566666
Validation loss = 0.014639255590736866
Validation loss = 0.01476630475372076
Validation loss = 0.01467745192348957
Validation loss = 0.015122223645448685
Validation loss = 0.015127869322896004
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08469226722777486
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08464773922187172
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08460325801366264
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08455882352941177
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08451443569553806
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0844700944386149
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08442579968536969
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08438155136268344
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08433734939759036
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08429319371727749
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08424908424908426
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0842050209205021
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08416100365917407
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08411703239289446
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08407310704960835
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08402922755741127
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08398539384454877
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08394160583941605
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08389786347055758
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08385416666666666
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08381051535658511
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0837669094693028
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08372334893395736
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08367983367983368
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08363636363636363
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00258 |
| Iteration     | 75       |
| MaximumReturn | -0.00197 |
| MinimumReturn | -0.00333 |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014003511518239975
Validation loss = 0.014221441932022572
Validation loss = 0.01566833257675171
Validation loss = 0.014242216944694519
Validation loss = 0.014665558934211731
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014457765966653824
Validation loss = 0.014407494105398655
Validation loss = 0.013935673981904984
Validation loss = 0.014208381995558739
Validation loss = 0.01472565159201622
Validation loss = 0.014938258565962315
Validation loss = 0.015244847163558006
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014970741234719753
Validation loss = 0.014353152364492416
Validation loss = 0.013506271876394749
Validation loss = 0.017160454764962196
Validation loss = 0.01379481703042984
Validation loss = 0.013509277254343033
Validation loss = 0.013502088375389576
Validation loss = 0.013443341478705406
Validation loss = 0.01377697940915823
Validation loss = 0.013897886499762535
Validation loss = 0.01540214940905571
Validation loss = 0.014112965203821659
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014061308465898037
Validation loss = 0.013901526108384132
Validation loss = 0.013874799013137817
Validation loss = 0.013816772028803825
Validation loss = 0.013544647954404354
Validation loss = 0.014409561641514301
Validation loss = 0.01428334042429924
Validation loss = 0.014402461238205433
Validation loss = 0.014697656035423279
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015842128545045853
Validation loss = 0.014359064400196075
Validation loss = 0.014389627613127232
Validation loss = 0.014288725331425667
Validation loss = 0.014308868907392025
Validation loss = 0.013985528610646725
Validation loss = 0.014372818171977997
Validation loss = 0.014248031191527843
Validation loss = 0.014502298086881638
Validation loss = 0.013764703646302223
Validation loss = 0.014274048618972301
Validation loss = 0.014958422631025314
Validation loss = 0.015329034999012947
Validation loss = 0.01458892785012722
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08359293873312565
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08354955889984432
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08350622406639004
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08346293416277864
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08341968911917098
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08337648886587261
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08333333333333333
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08329022245214693
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08324715615305067
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08320413436692506
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08316115702479339
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08311822405782138
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08307533539731682
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08303249097472924
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08298969072164948
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08294693456980938
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08290422245108137
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08286155429747813
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08281893004115226
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08277634961439588
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08273381294964029
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08269131997945557
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08264887063655031
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08260646485377117
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08256410256410257
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0102  |
| Iteration     | 76       |
| MaximumReturn | -0.00663 |
| MinimumReturn | -0.0152  |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013477962464094162
Validation loss = 0.013292484916746616
Validation loss = 0.014221204444766045
Validation loss = 0.014352128840982914
Validation loss = 0.013089990243315697
Validation loss = 0.014466185122728348
Validation loss = 0.01371805090457201
Validation loss = 0.013929101638495922
Validation loss = 0.013318506069481373
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014225456863641739
Validation loss = 0.014335878193378448
Validation loss = 0.014719382859766483
Validation loss = 0.01556970365345478
Validation loss = 0.015455690212547779
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014044632203876972
Validation loss = 0.01373946201056242
Validation loss = 0.013710210099816322
Validation loss = 0.01352710835635662
Validation loss = 0.013591901399195194
Validation loss = 0.015019908547401428
Validation loss = 0.013520377688109875
Validation loss = 0.01329182367771864
Validation loss = 0.013769102282822132
Validation loss = 0.013751411810517311
Validation loss = 0.013150754384696484
Validation loss = 0.014875822700560093
Validation loss = 0.013301236554980278
Validation loss = 0.01311854925006628
Validation loss = 0.012599905021488667
Validation loss = 0.013391741551458836
Validation loss = 0.014263669028878212
Validation loss = 0.014720169827342033
Validation loss = 0.012997552752494812
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014285741373896599
Validation loss = 0.014695162884891033
Validation loss = 0.01454695500433445
Validation loss = 0.01538072805851698
Validation loss = 0.014265626668930054
Validation loss = 0.015510659664869308
Validation loss = 0.014193649403750896
Validation loss = 0.013807187788188457
Validation loss = 0.013922957703471184
Validation loss = 0.013976369053125381
Validation loss = 0.014128654263913631
Validation loss = 0.014363829046487808
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015525531955063343
Validation loss = 0.014053457416594028
Validation loss = 0.015300030820071697
Validation loss = 0.014403879642486572
Validation loss = 0.014353800565004349
Validation loss = 0.015927046537399292
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08252178370066633
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08247950819672131
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08243727598566308
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08239508700102355
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08235294117647059
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08231083844580778
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08226877874297393
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0822267620020429
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08218478815722308
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08214285714285714
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08210096889342172
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08205912334352701
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08201732042791646
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08197556008146639
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08193384223918575
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08189216683621567
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08185053380782918
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08180894308943089
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08176739461655663
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0817258883248731
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08168442415017757
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08164300202839757
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08160162189559048
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08156028368794327
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08151898734177215
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000849 |
| Iteration     | 77        |
| MaximumReturn | -0.0006   |
| MinimumReturn | -0.00108  |
| TotalSamples  | 131614    |
-----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014083610847592354
Validation loss = 0.016234681010246277
Validation loss = 0.015479025430977345
Validation loss = 0.013826913200318813
Validation loss = 0.014770535752177238
Validation loss = 0.013686520047485828
Validation loss = 0.014891116879880428
Validation loss = 0.014569166116416454
Validation loss = 0.013818802312016487
Validation loss = 0.013478834182024002
Validation loss = 0.014046400785446167
Validation loss = 0.013775222934782505
Validation loss = 0.014188287779688835
Validation loss = 0.013607348315417767
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014359425753355026
Validation loss = 0.013532592914998531
Validation loss = 0.016926096752285957
Validation loss = 0.014185209758579731
Validation loss = 0.0155239999294281
Validation loss = 0.013853229582309723
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013274996541440487
Validation loss = 0.014000589959323406
Validation loss = 0.012918555177748203
Validation loss = 0.013368970714509487
Validation loss = 0.016655821353197098
Validation loss = 0.01326155848801136
Validation loss = 0.015304372645914555
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015421119518578053
Validation loss = 0.013580537401139736
Validation loss = 0.013913699425756931
Validation loss = 0.016226764768362045
Validation loss = 0.014824451878666878
Validation loss = 0.014825524762272835
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014377416111528873
Validation loss = 0.014464805833995342
Validation loss = 0.013982729986310005
Validation loss = 0.014246232807636261
Validation loss = 0.014981241896748543
Validation loss = 0.01363331824541092
Validation loss = 0.014358589425683022
Validation loss = 0.014958730898797512
Validation loss = 0.016654178500175476
Validation loss = 0.01483910996466875
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08147773279352227
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08143651997976732
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08139534883720931
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08135421930267812
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08131313131313131
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0812720848056537
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08123107971745712
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08119011598587998
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0811491935483871
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08110831234256927
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.081067472306143
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08102667337695017
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08098591549295775
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08094519859225742
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08090452261306533
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08086388749372175
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08082329317269077
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08078273958855996
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08074222668004012
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08070175438596491
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08066132264529058
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08062093139709564
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08058058058058058
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08054027013506754
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0805
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000956 |
| Iteration     | 78        |
| MaximumReturn | -0.00067  |
| MinimumReturn | -0.00142  |
| TotalSamples  | 133280    |
-----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013932853937149048
Validation loss = 0.01349661871790886
Validation loss = 0.013733997009694576
Validation loss = 0.015447882004082203
Validation loss = 0.013789830729365349
Validation loss = 0.013471877202391624
Validation loss = 0.014140646904706955
Validation loss = 0.013379122130572796
Validation loss = 0.013768373988568783
Validation loss = 0.014876001514494419
Validation loss = 0.014214807190001011
Validation loss = 0.013193655759096146
Validation loss = 0.014770091511309147
Validation loss = 0.013926302082836628
Validation loss = 0.013432345353066921
Validation loss = 0.013482569716870785
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015415688045322895
Validation loss = 0.014500229619443417
Validation loss = 0.014814124442636967
Validation loss = 0.014476246200501919
Validation loss = 0.014327709563076496
Validation loss = 0.015135133638978004
Validation loss = 0.01650230586528778
Validation loss = 0.014345263130962849
Validation loss = 0.015067743137478828
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012912863865494728
Validation loss = 0.013565286993980408
Validation loss = 0.014311593025922775
Validation loss = 0.013590211980044842
Validation loss = 0.013077454641461372
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014214812777936459
Validation loss = 0.014934990555047989
Validation loss = 0.014488142915070057
Validation loss = 0.01392944436520338
Validation loss = 0.013475604355335236
Validation loss = 0.01384461298584938
Validation loss = 0.014779875054955482
Validation loss = 0.013822183944284916
Validation loss = 0.013803145848214626
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01453301403671503
Validation loss = 0.014876559376716614
Validation loss = 0.018469760194420815
Validation loss = 0.013692695647478104
Validation loss = 0.013906081207096577
Validation loss = 0.014684410765767097
Validation loss = 0.01808956265449524
Validation loss = 0.014700056053698063
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08045977011494253
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08041958041958042
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08037943085371942
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08033932135728543
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08029925187032419
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08025922233300099
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0802192326856004
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0801792828685259
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08013937282229965
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08009950248756219
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0800596718050721
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08001988071570576
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07998012916045703
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07994041708043693
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07990074441687345
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0798611111111111
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07982151710461081
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07978196233894945
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07974244675581971
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0797029702970297
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07966353290450272
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07962413452027696
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07958477508650519
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07954545454545454
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07950617283950617
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000879 |
| Iteration     | 79        |
| MaximumReturn | -0.000697 |
| MinimumReturn | -0.00115  |
| TotalSamples  | 134946    |
-----------------------------
