Logging to experiments/invertedPendulum/nov1/IA01_w350e3_seed2231
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7672200202941895
Validation loss = 0.4426625072956085
Validation loss = 0.40240195393562317
Validation loss = 0.33640944957733154
Validation loss = 0.31836289167404175
Validation loss = 0.3140455186367035
Validation loss = 0.311240553855896
Validation loss = 0.3061043918132782
Validation loss = 0.285686194896698
Validation loss = 0.2617984712123871
Validation loss = 0.23453006148338318
Validation loss = 0.23370537161827087
Validation loss = 0.22458547353744507
Validation loss = 0.21156562864780426
Validation loss = 0.21391905844211578
Validation loss = 0.2029852569103241
Validation loss = 0.21870028972625732
Validation loss = 0.18632635474205017
Validation loss = 0.1987246721982956
Validation loss = 0.1830197274684906
Validation loss = 0.19321931898593903
Validation loss = 0.19371673464775085
Validation loss = 0.17195641994476318
Validation loss = 0.1787792295217514
Validation loss = 0.1630566418170929
Validation loss = 0.16518916189670563
Validation loss = 0.16316300630569458
Validation loss = 0.1433466374874115
Validation loss = 0.1403859704732895
Validation loss = 0.1349603533744812
Validation loss = 0.14057497680187225
Validation loss = 0.13419848680496216
Validation loss = 0.147586852312088
Validation loss = 0.13662683963775635
Validation loss = 0.13287295401096344
Validation loss = 0.11414369940757751
Validation loss = 0.11208432912826538
Validation loss = 0.11350535601377487
Validation loss = 0.11064648628234863
Validation loss = 0.11042893677949905
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.757938802242279
Validation loss = 0.39235836267471313
Validation loss = 0.37608981132507324
Validation loss = 0.33708301186561584
Validation loss = 0.31444576382637024
Validation loss = 0.3059474229812622
Validation loss = 0.28808414936065674
Validation loss = 0.27589523792266846
Validation loss = 0.2586642801761627
Validation loss = 0.2349066436290741
Validation loss = 0.23089207708835602
Validation loss = 0.21797771751880646
Validation loss = 0.21494734287261963
Validation loss = 0.2356245368719101
Validation loss = 0.20700883865356445
Validation loss = 0.18466304242610931
Validation loss = 0.19594483077526093
Validation loss = 0.17350129783153534
Validation loss = 0.1713719218969345
Validation loss = 0.17310847342014313
Validation loss = 0.19726479053497314
Validation loss = 0.1766582578420639
Validation loss = 0.15848883986473083
Validation loss = 0.16138562560081482
Validation loss = 0.1608300507068634
Validation loss = 0.14524732530117035
Validation loss = 0.13810119032859802
Validation loss = 0.13696488738059998
Validation loss = 0.13059380650520325
Validation loss = 0.14320993423461914
Validation loss = 0.13571926951408386
Validation loss = 0.1362602561712265
Validation loss = 0.12886719405651093
Validation loss = 0.11163421720266342
Validation loss = 0.13237892091274261
Validation loss = 0.12794817984104156
Validation loss = 0.1140374019742012
Validation loss = 0.11201339960098267
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7827231287956238
Validation loss = 0.4270602762699127
Validation loss = 0.34990519285202026
Validation loss = 0.3262460231781006
Validation loss = 0.3204435706138611
Validation loss = 0.2939431667327881
Validation loss = 0.27191686630249023
Validation loss = 0.25685468316078186
Validation loss = 0.2371373325586319
Validation loss = 0.23020917177200317
Validation loss = 0.20909659564495087
Validation loss = 0.19802671670913696
Validation loss = 0.1887424886226654
Validation loss = 0.1770627498626709
Validation loss = 0.1961119920015335
Validation loss = 0.17853271961212158
Validation loss = 0.16986113786697388
Validation loss = 0.1676328182220459
Validation loss = 0.1648045778274536
Validation loss = 0.1583622843027115
Validation loss = 0.16699452698230743
Validation loss = 0.16080215573310852
Validation loss = 0.15212343633174896
Validation loss = 0.1622595489025116
Validation loss = 0.14523638784885406
Validation loss = 0.12660418450832367
Validation loss = 0.14235155284404755
Validation loss = 0.13227635622024536
Validation loss = 0.14887629449367523
Validation loss = 0.12396060675382614
Validation loss = 0.13840749859809875
Validation loss = 0.13275638222694397
Validation loss = 0.13808171451091766
Validation loss = 0.11690977960824966
Validation loss = 0.11211303621530533
Validation loss = 0.10660461336374283
Validation loss = 0.1004706546664238
Validation loss = 0.11145476996898651
Validation loss = 0.10197322815656662
Validation loss = 0.11505044251680374
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7445565462112427
Validation loss = 0.38509783148765564
Validation loss = 0.345786452293396
Validation loss = 0.3224968910217285
Validation loss = 0.30883485078811646
Validation loss = 0.2855364680290222
Validation loss = 0.2577722370624542
Validation loss = 0.24918130040168762
Validation loss = 0.23554269969463348
Validation loss = 0.21495603024959564
Validation loss = 0.21375826001167297
Validation loss = 0.21040213108062744
Validation loss = 0.19499839842319489
Validation loss = 0.1897938996553421
Validation loss = 0.1760387420654297
Validation loss = 0.17793186008930206
Validation loss = 0.17462502419948578
Validation loss = 0.18350112438201904
Validation loss = 0.15682294964790344
Validation loss = 0.17825375497341156
Validation loss = 0.15338189899921417
Validation loss = 0.14795619249343872
Validation loss = 0.14735636115074158
Validation loss = 0.1373109668493271
Validation loss = 0.150765523314476
Validation loss = 0.138900026679039
Validation loss = 0.13428346812725067
Validation loss = 0.13235323131084442
Validation loss = 0.12665170431137085
Validation loss = 0.13178983330726624
Validation loss = 0.12887077033519745
Validation loss = 0.14063943922519684
Validation loss = 0.10865788906812668
Validation loss = 0.1198452040553093
Validation loss = 0.11206191033124924
Validation loss = 0.11973761767148972
Validation loss = 0.13690726459026337
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7764407992362976
Validation loss = 0.4315906763076782
Validation loss = 0.36995914578437805
Validation loss = 0.33605489134788513
Validation loss = 0.3095203936100006
Validation loss = 0.31506094336509705
Validation loss = 0.2832483947277069
Validation loss = 0.26654043793678284
Validation loss = 0.261650949716568
Validation loss = 0.27382615208625793
Validation loss = 0.2237802892923355
Validation loss = 0.22731097042560577
Validation loss = 0.2106384038925171
Validation loss = 0.22824789583683014
Validation loss = 0.18373258411884308
Validation loss = 0.1801244020462036
Validation loss = 0.17895814776420593
Validation loss = 0.19972679018974304
Validation loss = 0.19321981072425842
Validation loss = 0.1917853206396103
Validation loss = 0.16516058146953583
Validation loss = 0.17315055429935455
Validation loss = 0.17226138710975647
Validation loss = 0.20558343827724457
Validation loss = 0.17084364593029022
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -106     |
| Iteration     | 0        |
| MaximumReturn | -55.9    |
| MinimumReturn | -143     |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3345020115375519
Validation loss = 0.19757559895515442
Validation loss = 0.16847117245197296
Validation loss = 0.15321476757526398
Validation loss = 0.13853725790977478
Validation loss = 0.12534599006175995
Validation loss = 0.12663471698760986
Validation loss = 0.11698564141988754
Validation loss = 0.11377901583909988
Validation loss = 0.10301398485898972
Validation loss = 0.1026955395936966
Validation loss = 0.09577108174562454
Validation loss = 0.08907061815261841
Validation loss = 0.0903022289276123
Validation loss = 0.0911175087094307
Validation loss = 0.08593931049108505
Validation loss = 0.08958081156015396
Validation loss = 0.07908055186271667
Validation loss = 0.07808012515306473
Validation loss = 0.08578235656023026
Validation loss = 0.08172456175088882
Validation loss = 0.08340802043676376
Validation loss = 0.08225355297327042
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.28170472383499146
Validation loss = 0.19874809682369232
Validation loss = 0.1709109991788864
Validation loss = 0.1516239047050476
Validation loss = 0.1390472799539566
Validation loss = 0.1308731734752655
Validation loss = 0.12150904536247253
Validation loss = 0.11901795119047165
Validation loss = 0.11394423246383667
Validation loss = 0.09781661629676819
Validation loss = 0.09525848925113678
Validation loss = 0.08613388240337372
Validation loss = 0.08871658146381378
Validation loss = 0.10629280656576157
Validation loss = 0.09355802834033966
Validation loss = 0.09303836524486542
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.36468732357025146
Validation loss = 0.19833673536777496
Validation loss = 0.17310233414173126
Validation loss = 0.15625835955142975
Validation loss = 0.14283093810081482
Validation loss = 0.1337922364473343
Validation loss = 0.12003739923238754
Validation loss = 0.11923880130052567
Validation loss = 0.1195918619632721
Validation loss = 0.11443270742893219
Validation loss = 0.09592136740684509
Validation loss = 0.10392197966575623
Validation loss = 0.08416403084993362
Validation loss = 0.09961599856615067
Validation loss = 0.10320819914340973
Validation loss = 0.08012186735868454
Validation loss = 0.0835559219121933
Validation loss = 0.08596543967723846
Validation loss = 0.0853075385093689
Validation loss = 0.08068814128637314
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.37425360083580017
Validation loss = 0.1908440887928009
Validation loss = 0.1711525022983551
Validation loss = 0.15562476217746735
Validation loss = 0.15952616930007935
Validation loss = 0.13007625937461853
Validation loss = 0.12199093401432037
Validation loss = 0.1290205717086792
Validation loss = 0.09999777376651764
Validation loss = 0.09980890154838562
Validation loss = 0.10995128005743027
Validation loss = 0.10355029255151749
Validation loss = 0.08994216471910477
Validation loss = 0.0934358611702919
Validation loss = 0.08377746492624283
Validation loss = 0.09753713756799698
Validation loss = 0.07890573143959045
Validation loss = 0.08729299157857895
Validation loss = 0.08433207869529724
Validation loss = 0.08529407531023026
Validation loss = 0.07753964513540268
Validation loss = 0.07270851731300354
Validation loss = 0.06980043649673462
Validation loss = 0.07323820888996124
Validation loss = 0.06915276497602463
Validation loss = 0.07360757142305374
Validation loss = 0.0753941684961319
Validation loss = 0.08161798864603043
Validation loss = 0.08680453151464462
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.301821231842041
Validation loss = 0.19586825370788574
Validation loss = 0.174270361661911
Validation loss = 0.14860323071479797
Validation loss = 0.13758356869220734
Validation loss = 0.12372998148202896
Validation loss = 0.12911276519298553
Validation loss = 0.1187877207994461
Validation loss = 0.1098257452249527
Validation loss = 0.10816464573144913
Validation loss = 0.09291675686836243
Validation loss = 0.0984295904636383
Validation loss = 0.0992831140756607
Validation loss = 0.10112568736076355
Validation loss = 0.10270508378744125
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.0196078431372549
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.019230769230769232
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.018867924528301886
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.037037037037037035
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03636363636363636
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.03571428571428571
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.07017543859649122
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.08620689655172414
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1016949152542373
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.11666666666666667
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11475409836065574
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11290322580645161
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12698412698412698
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.15625
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16923076923076924
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16666666666666666
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.19402985074626866
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19117647058823528
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.21739130434782608
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.2571428571428571
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.28169014084507044
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3055555555555556
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3150684931506849
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.32432432432432434
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3466666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.2    |
| Iteration     | 1        |
| MaximumReturn | -0.104   |
| MinimumReturn | -43.9    |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1404435932636261
Validation loss = 0.10178729146718979
Validation loss = 0.08536030352115631
Validation loss = 0.07875150442123413
Validation loss = 0.08309830725193024
Validation loss = 0.08226729184389114
Validation loss = 0.0825473740696907
Validation loss = 0.07823633402585983
Validation loss = 0.07917971909046173
Validation loss = 0.07668643444776535
Validation loss = 0.07976878434419632
Validation loss = 0.07853268086910248
Validation loss = 0.07903739809989929
Validation loss = 0.0902690663933754
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14882084727287292
Validation loss = 0.10474221408367157
Validation loss = 0.09165680408477783
Validation loss = 0.08902439475059509
Validation loss = 0.08420410752296448
Validation loss = 0.09338335692882538
Validation loss = 0.0799933522939682
Validation loss = 0.08251477777957916
Validation loss = 0.0843517929315567
Validation loss = 0.09319135546684265
Validation loss = 0.07988952100276947
Validation loss = 0.09423983097076416
Validation loss = 0.08031317591667175
Validation loss = 0.07890966534614563
Validation loss = 0.08635857701301575
Validation loss = 0.08352486789226532
Validation loss = 0.08164070546627045
Validation loss = 0.09568831324577332
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.19011551141738892
Validation loss = 0.13739217817783356
Validation loss = 0.10957552492618561
Validation loss = 0.09909380972385406
Validation loss = 0.08664238452911377
Validation loss = 0.08227569609880447
Validation loss = 0.08474323153495789
Validation loss = 0.07998119294643402
Validation loss = 0.07831878960132599
Validation loss = 0.07955975085496902
Validation loss = 0.08455328643321991
Validation loss = 0.08356738090515137
Validation loss = 0.08525507897138596
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15612205862998962
Validation loss = 0.10599923133850098
Validation loss = 0.09507769346237183
Validation loss = 0.08500008285045624
Validation loss = 0.08294159919023514
Validation loss = 0.08746939897537231
Validation loss = 0.07805493474006653
Validation loss = 0.07577166706323624
Validation loss = 0.07791958749294281
Validation loss = 0.0798463374376297
Validation loss = 0.07625893503427505
Validation loss = 0.08177255839109421
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1654309630393982
Validation loss = 0.12778852880001068
Validation loss = 0.10395649075508118
Validation loss = 0.10532757639884949
Validation loss = 0.09621088951826096
Validation loss = 0.09146688133478165
Validation loss = 0.09605124592781067
Validation loss = 0.09538765251636505
Validation loss = 0.08518629521131516
Validation loss = 0.08548116683959961
Validation loss = 0.08539819717407227
Validation loss = 0.08115285634994507
Validation loss = 0.08720806241035461
Validation loss = 0.08331280201673508
Validation loss = 0.088111013174057
Validation loss = 0.08527179062366486
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.34210526315789475
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.33766233766233766
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3333333333333333
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3291139240506329
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.35
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.345679012345679
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.34146341463414637
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3373493975903614
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3333333333333333
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32941176470588235
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.32558139534883723
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3218390804597701
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3181818181818182
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3146067415730337
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3111111111111111
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3076923076923077
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.30434782608695654
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.3010752688172043
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2978723404255319
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29473684210526313
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3020833333333333
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29896907216494845
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29591836734693877
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29292929292929293
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.29
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.07    |
| Iteration     | 2        |
| MaximumReturn | -0.0357  |
| MinimumReturn | -12.5    |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07951372861862183
Validation loss = 0.0765933096408844
Validation loss = 0.08100565522909164
Validation loss = 0.07403936237096786
Validation loss = 0.0739905908703804
Validation loss = 0.0750265121459961
Validation loss = 0.09643501788377762
Validation loss = 0.08378481864929199
Validation loss = 0.07875408977270126
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10107306391000748
Validation loss = 0.07935584336519241
Validation loss = 0.08435923606157303
Validation loss = 0.08823081105947495
Validation loss = 0.08136364072561264
Validation loss = 0.07482457160949707
Validation loss = 0.0758182480931282
Validation loss = 0.07524639368057251
Validation loss = 0.08017676323652267
Validation loss = 0.0823076143860817
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08314458280801773
Validation loss = 0.08251673728227615
Validation loss = 0.08244375139474869
Validation loss = 0.08334868401288986
Validation loss = 0.08509305864572525
Validation loss = 0.08007428795099258
Validation loss = 0.08091678470373154
Validation loss = 0.07526282221078873
Validation loss = 0.07703033089637756
Validation loss = 0.07593593001365662
Validation loss = 0.07770850509405136
Validation loss = 0.07973405718803406
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10458114743232727
Validation loss = 0.0834399163722992
Validation loss = 0.07475673407316208
Validation loss = 0.08122814446687698
Validation loss = 0.08614780753850937
Validation loss = 0.08001987636089325
Validation loss = 0.07756362110376358
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.12388551235198975
Validation loss = 0.08255326747894287
Validation loss = 0.09531249850988388
Validation loss = 0.07935073226690292
Validation loss = 0.07802236825227737
Validation loss = 0.07813095301389694
Validation loss = 0.0861801877617836
Validation loss = 0.08420360833406448
Validation loss = 0.08264393359422684
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2871287128712871
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.28431372549019607
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2815533980582524
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27884615384615385
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2761904761904762
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27358490566037735
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.27102803738317754
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26851851851851855
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26605504587155965
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2636363636363636
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26126126126126126
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25892857142857145
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25663716814159293
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2543859649122807
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25217391304347825
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24786324786324787
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2457627118644068
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24369747899159663
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24166666666666667
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2396694214876033
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23770491803278687
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23577235772357724
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23387096774193547
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.232
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.017   |
| Iteration     | 3        |
| MaximumReturn | -0.0107  |
| MinimumReturn | -0.0239  |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11022484302520752
Validation loss = 0.0732022300362587
Validation loss = 0.07134151458740234
Validation loss = 0.06951818615198135
Validation loss = 0.07454482465982437
Validation loss = 0.0647532194852829
Validation loss = 0.07721369713544846
Validation loss = 0.06884291768074036
Validation loss = 0.060610584914684296
Validation loss = 0.06465178728103638
Validation loss = 0.06546653062105179
Validation loss = 0.0636269748210907
Validation loss = 0.06306485831737518
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09486963599920273
Validation loss = 0.08621405810117722
Validation loss = 0.07246636599302292
Validation loss = 0.0678124949336052
Validation loss = 0.07365910708904266
Validation loss = 0.062352199107408524
Validation loss = 0.0656917467713356
Validation loss = 0.06643795222043991
Validation loss = 0.06630024313926697
Validation loss = 0.06964854151010513
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11410935968160629
Validation loss = 0.08692333847284317
Validation loss = 0.08074577897787094
Validation loss = 0.06777064502239227
Validation loss = 0.06349059194326401
Validation loss = 0.06344622373580933
Validation loss = 0.0748966634273529
Validation loss = 0.06477800756692886
Validation loss = 0.06394654512405396
Validation loss = 0.06235288083553314
Validation loss = 0.06779579818248749
Validation loss = 0.06518108397722244
Validation loss = 0.06289754062891006
Validation loss = 0.06510201096534729
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11519681662321091
Validation loss = 0.08728786557912827
Validation loss = 0.08127407729625702
Validation loss = 0.07072502374649048
Validation loss = 0.08086901903152466
Validation loss = 0.0665089562535286
Validation loss = 0.07023143768310547
Validation loss = 0.06915751844644547
Validation loss = 0.06636130809783936
Validation loss = 0.06751681119203568
Validation loss = 0.07003825157880783
Validation loss = 0.06669837981462479
Validation loss = 0.06457974016666412
Validation loss = 0.06399642676115036
Validation loss = 0.067044697701931
Validation loss = 0.07769742608070374
Validation loss = 0.06790030002593994
Validation loss = 0.07406473904848099
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11346998065710068
Validation loss = 0.07339470833539963
Validation loss = 0.07030736654996872
Validation loss = 0.06510510295629501
Validation loss = 0.0689140260219574
Validation loss = 0.06671524047851562
Validation loss = 0.07127241045236588
Validation loss = 0.06973390281200409
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23015873015873015
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2283464566929134
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2265625
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2248062015503876
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2230769230769231
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22137404580152673
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2196969696969697
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21804511278195488
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21641791044776118
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21481481481481482
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21323529411764705
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2116788321167883
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21014492753623187
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20863309352517986
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20714285714285716
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20567375886524822
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20422535211267606
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20279720279720279
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2013888888888889
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19863013698630136
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19727891156462585
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19594594594594594
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19463087248322147
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19333333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.05    |
| Iteration     | 4        |
| MaximumReturn | -0.0304  |
| MinimumReturn | -0.213   |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06871148943901062
Validation loss = 0.055387504398822784
Validation loss = 0.05373489856719971
Validation loss = 0.05056710168719292
Validation loss = 0.05211299657821655
Validation loss = 0.04961498826742172
Validation loss = 0.05216703563928604
Validation loss = 0.05653146654367447
Validation loss = 0.053019531071186066
Validation loss = 0.04885237291455269
Validation loss = 0.04927052557468414
Validation loss = 0.04813828319311142
Validation loss = 0.05181562900543213
Validation loss = 0.05235232040286064
Validation loss = 0.05116572976112366
Validation loss = 0.04996616765856743
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07224559038877487
Validation loss = 0.05828180909156799
Validation loss = 0.06215570494532585
Validation loss = 0.0606154203414917
Validation loss = 0.05580337718129158
Validation loss = 0.05578659847378731
Validation loss = 0.05218476802110672
Validation loss = 0.05210225656628609
Validation loss = 0.053425855934619904
Validation loss = 0.053883619606494904
Validation loss = 0.05847351998090744
Validation loss = 0.04960920289158821
Validation loss = 0.05439462512731552
Validation loss = 0.05156952887773514
Validation loss = 0.04927723854780197
Validation loss = 0.055495280772447586
Validation loss = 0.056280843913555145
Validation loss = 0.05417464300990105
Validation loss = 0.055017340928316116
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07939431816339493
Validation loss = 0.0598980076611042
Validation loss = 0.0507645383477211
Validation loss = 0.050095848739147186
Validation loss = 0.05181838944554329
Validation loss = 0.04972941428422928
Validation loss = 0.052172500640153885
Validation loss = 0.04858933761715889
Validation loss = 0.052016694098711014
Validation loss = 0.05298949405550957
Validation loss = 0.0535275824368
Validation loss = 0.050664015114307404
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06867588311433792
Validation loss = 0.055740319192409515
Validation loss = 0.05635827034711838
Validation loss = 0.05384470149874687
Validation loss = 0.054694633930921555
Validation loss = 0.05547575280070305
Validation loss = 0.053999729454517365
Validation loss = 0.050311557948589325
Validation loss = 0.0534229651093483
Validation loss = 0.05677853897213936
Validation loss = 0.055049777030944824
Validation loss = 0.05251465365290642
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07970738410949707
Validation loss = 0.06477119773626328
Validation loss = 0.05766204744577408
Validation loss = 0.05287853628396988
Validation loss = 0.0525868721306324
Validation loss = 0.06112103536725044
Validation loss = 0.05430739372968674
Validation loss = 0.0567852184176445
Validation loss = 0.057387638837099075
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1986754966887417
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20394736842105263
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.20915032679738563
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2077922077922078
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2064516129032258
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20512820512820512
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.21019108280254778
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2088607594936709
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20754716981132076
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2125
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.21739130434782608
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2222222222222222
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22699386503067484
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23170731707317074
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23636363636363636
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24096385542168675
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23952095808383234
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24404761904761904
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2485207100591716
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2529411764705882
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2573099415204678
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2616279069767442
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26011560693641617
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.26436781609195403
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.26857142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.18    |
| Iteration     | 5        |
| MaximumReturn | -0.0613  |
| MinimumReturn | -29.4    |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05756942182779312
Validation loss = 0.05419325828552246
Validation loss = 0.049784913659095764
Validation loss = 0.05198393389582634
Validation loss = 0.055428631603717804
Validation loss = 0.053456634283065796
Validation loss = 0.05192321538925171
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07092893123626709
Validation loss = 0.055255819112062454
Validation loss = 0.05463016778230667
Validation loss = 0.05895993113517761
Validation loss = 0.05308659002184868
Validation loss = 0.05743204802274704
Validation loss = 0.06078442931175232
Validation loss = 0.053937774151563644
Validation loss = 0.05339808389544487
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06387373059988022
Validation loss = 0.049115754663944244
Validation loss = 0.054037876427173615
Validation loss = 0.05594531446695328
Validation loss = 0.06513066589832306
Validation loss = 0.06177131459116936
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06730973720550537
Validation loss = 0.058521170169115067
Validation loss = 0.05519900470972061
Validation loss = 0.0582672581076622
Validation loss = 0.06014154106378555
Validation loss = 0.05370693653821945
Validation loss = 0.05784207582473755
Validation loss = 0.056858908385038376
Validation loss = 0.05671491473913193
Validation loss = 0.055412255227565765
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07067456841468811
Validation loss = 0.05353974178433418
Validation loss = 0.05398793891072273
Validation loss = 0.06074875593185425
Validation loss = 0.056106019765138626
Validation loss = 0.05839868262410164
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26704545454545453
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2655367231638418
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2640449438202247
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.26256983240223464
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2611111111111111
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2596685082872928
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25824175824175827
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2568306010928962
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2554347826086957
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25405405405405407
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25268817204301075
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25133689839572193
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24867724867724866
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24736842105263157
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24607329842931938
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24479166666666666
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24352331606217617
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2422680412371134
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24102564102564103
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23979591836734693
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23857868020304568
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23737373737373738
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23618090452261306
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.235
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0564  |
| Iteration     | 6        |
| MaximumReturn | -0.0314  |
| MinimumReturn | -0.083   |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05837930738925934
Validation loss = 0.052424702793359756
Validation loss = 0.053766798228025436
Validation loss = 0.0563385896384716
Validation loss = 0.052889738231897354
Validation loss = 0.053301852196455
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06358324736356735
Validation loss = 0.06338456273078918
Validation loss = 0.05795466899871826
Validation loss = 0.06096480414271355
Validation loss = 0.0587332583963871
Validation loss = 0.05919525399804115
Validation loss = 0.06404704600572586
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06123858690261841
Validation loss = 0.0633392333984375
Validation loss = 0.055281687527894974
Validation loss = 0.05478600040078163
Validation loss = 0.05903221666812897
Validation loss = 0.059167712926864624
Validation loss = 0.05999079719185829
Validation loss = 0.05842398479580879
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0713125541806221
Validation loss = 0.05498775467276573
Validation loss = 0.05903824046254158
Validation loss = 0.05858514830470085
Validation loss = 0.05825766921043396
Validation loss = 0.0548812635242939
Validation loss = 0.06869089603424072
Validation loss = 0.06006007269024849
Validation loss = 0.05956282094120979
Validation loss = 0.060681771486997604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07176660746335983
Validation loss = 0.06115587055683136
Validation loss = 0.057279322296381
Validation loss = 0.05883382633328438
Validation loss = 0.05408843234181404
Validation loss = 0.056188106536865234
Validation loss = 0.05755390599370003
Validation loss = 0.06517647951841354
Validation loss = 0.06382756680250168
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23383084577114427
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23267326732673269
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2315270935960591
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23039215686274508
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22926829268292684
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22815533980582525
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22705314009661837
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22596153846153846
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22488038277511962
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22380952380952382
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22274881516587677
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22169811320754718
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22065727699530516
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21962616822429906
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2186046511627907
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2175925925925926
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21658986175115208
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21559633027522937
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2146118721461187
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21363636363636362
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21266968325791855
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21171171171171171
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21076233183856502
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20982142857142858
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2088888888888889
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0029  |
| Iteration     | 7        |
| MaximumReturn | -0.00201 |
| MinimumReturn | -0.00413 |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0717129036784172
Validation loss = 0.05468999966979027
Validation loss = 0.051667697727680206
Validation loss = 0.05187647417187691
Validation loss = 0.056533001363277435
Validation loss = 0.051727294921875
Validation loss = 0.052151430398225784
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05892297998070717
Validation loss = 0.05341976508498192
Validation loss = 0.05326242372393608
Validation loss = 0.0516325905919075
Validation loss = 0.057494137436151505
Validation loss = 0.054378289729356766
Validation loss = 0.06738290190696716
Validation loss = 0.05311775207519531
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06483321636915207
Validation loss = 0.05339515209197998
Validation loss = 0.05469048395752907
Validation loss = 0.06458400189876556
Validation loss = 0.051533762365579605
Validation loss = 0.05312775447964668
Validation loss = 0.050730932503938675
Validation loss = 0.05516837164759636
Validation loss = 0.05915943160653114
Validation loss = 0.051907140761613846
Validation loss = 0.058027394115924835
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06447307765483856
Validation loss = 0.05806233733892441
Validation loss = 0.058863721787929535
Validation loss = 0.056877873837947845
Validation loss = 0.053205590695142746
Validation loss = 0.05494477227330208
Validation loss = 0.056971244513988495
Validation loss = 0.05246904119849205
Validation loss = 0.05289251357316971
Validation loss = 0.050672322511672974
Validation loss = 0.05179649218916893
Validation loss = 0.056104887276887894
Validation loss = 0.05567038059234619
Validation loss = 0.05393998697400093
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06315197050571442
Validation loss = 0.0530986450612545
Validation loss = 0.06030559167265892
Validation loss = 0.05668298527598381
Validation loss = 0.05615628510713577
Validation loss = 0.05477555841207504
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.21238938053097345
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.21585903083700442
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2149122807017544
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21397379912663755
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.21739130434782608
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22077922077922077
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22413793103448276
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.22746781115879827
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23076923076923078
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2297872340425532
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2288135593220339
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2320675105485232
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23529411764705882
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23430962343096234
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2375
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24066390041493776
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24380165289256198
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.24691358024691357
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.25
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24897959183673468
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.25203252032520324
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2550607287449393
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2540322580645161
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.2570281124497992
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.26
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -54.2    |
| Iteration     | 8        |
| MaximumReturn | -1       |
| MinimumReturn | -122     |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06756401062011719
Validation loss = 0.05065443366765976
Validation loss = 0.048872627317905426
Validation loss = 0.04682958871126175
Validation loss = 0.048773813992738724
Validation loss = 0.04830683022737503
Validation loss = 0.048475444316864014
Validation loss = 0.0484810434281826
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05899249017238617
Validation loss = 0.049752794206142426
Validation loss = 0.04799972474575043
Validation loss = 0.049163974821567535
Validation loss = 0.05167577415704727
Validation loss = 0.04910951852798462
Validation loss = 0.04718581214547157
Validation loss = 0.04683277755975723
Validation loss = 0.0498243011534214
Validation loss = 0.05041591823101044
Validation loss = 0.04855932667851448
Validation loss = 0.046528447419404984
Validation loss = 0.048779238015413284
Validation loss = 0.0510312095284462
Validation loss = 0.048453547060489655
Validation loss = 0.047956228256225586
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.062249764800071716
Validation loss = 0.05230940505862236
Validation loss = 0.04779721796512604
Validation loss = 0.04607439041137695
Validation loss = 0.05051576718688011
Validation loss = 0.049068763852119446
Validation loss = 0.048947833478450775
Validation loss = 0.043084703385829926
Validation loss = 0.047939449548721313
Validation loss = 0.04807575047016144
Validation loss = 0.045383699238300323
Validation loss = 0.047731705009937286
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06213332712650299
Validation loss = 0.06109147518873215
Validation loss = 0.05297337472438812
Validation loss = 0.046611614525318146
Validation loss = 0.04911880940198898
Validation loss = 0.04749133065342903
Validation loss = 0.04601393640041351
Validation loss = 0.047343041747808456
Validation loss = 0.05091256648302078
Validation loss = 0.04606263339519501
Validation loss = 0.05038315802812576
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06977629661560059
Validation loss = 0.052469365298748016
Validation loss = 0.049550898373126984
Validation loss = 0.052648141980171204
Validation loss = 0.05226608365774155
Validation loss = 0.050609104335308075
Validation loss = 0.05048098415136337
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2589641434262948
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25793650793650796
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25691699604743085
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2559055118110236
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2549019607843137
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25390625
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2529182879377432
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25193798449612403
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25096525096525096
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.25
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24904214559386972
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2480916030534351
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24714828897338403
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24621212121212122
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24528301886792453
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24436090225563908
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24344569288389514
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24253731343283583
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.241635687732342
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.24074074074074073
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23985239852398524
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23897058823529413
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23809523809523808
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23722627737226276
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23636363636363636
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0476  |
| Iteration     | 9        |
| MaximumReturn | -0.0338  |
| MinimumReturn | -0.066   |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.056361060589551926
Validation loss = 0.04460244998335838
Validation loss = 0.044852182269096375
Validation loss = 0.04566648602485657
Validation loss = 0.03942595794796944
Validation loss = 0.04142547771334648
Validation loss = 0.03987489268183708
Validation loss = 0.03849571943283081
Validation loss = 0.044014737010002136
Validation loss = 0.04220491647720337
Validation loss = 0.040755558758974075
Validation loss = 0.04282168671488762
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05254284292459488
Validation loss = 0.045374490320682526
Validation loss = 0.0424702912569046
Validation loss = 0.043258532881736755
Validation loss = 0.045164987444877625
Validation loss = 0.04235496371984482
Validation loss = 0.042261745780706406
Validation loss = 0.04432719200849533
Validation loss = 0.04472722113132477
Validation loss = 0.04420379176735878
Validation loss = 0.046384699642658234
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.050256337970495224
Validation loss = 0.04357311874628067
Validation loss = 0.04233495518565178
Validation loss = 0.040129244327545166
Validation loss = 0.042206183075904846
Validation loss = 0.041532617062330246
Validation loss = 0.04330284520983696
Validation loss = 0.04108528420329094
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05076942220330238
Validation loss = 0.04473595321178436
Validation loss = 0.043962761759757996
Validation loss = 0.043445341289043427
Validation loss = 0.04495612904429436
Validation loss = 0.04579441249370575
Validation loss = 0.04497615993022919
Validation loss = 0.04262656718492508
Validation loss = 0.04297643527388573
Validation loss = 0.04223741590976715
Validation loss = 0.041792768985033035
Validation loss = 0.04353040084242821
Validation loss = 0.041389353573322296
Validation loss = 0.046736523509025574
Validation loss = 0.04505033791065216
Validation loss = 0.04333033785223961
Validation loss = 0.042856842279434204
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.053655464202165604
Validation loss = 0.048363201320171356
Validation loss = 0.04723219573497772
Validation loss = 0.045323193073272705
Validation loss = 0.04680407792329788
Validation loss = 0.04494297876954079
Validation loss = 0.047138385474681854
Validation loss = 0.042589351534843445
Validation loss = 0.04616881161928177
Validation loss = 0.050664037466049194
Validation loss = 0.04545667767524719
Validation loss = 0.04503270611166954
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23550724637681159
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23465703971119134
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23381294964028776
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23297491039426524
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23214285714285715
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2313167259786477
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.23049645390070922
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22968197879858657
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22887323943661972
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22807017543859648
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22727272727272727
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2264808362369338
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22569444444444445
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22491349480968859
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22413793103448276
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22336769759450173
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2226027397260274
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22184300341296928
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22108843537414966
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.22033898305084745
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2195945945945946
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21885521885521886
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2181208053691275
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21739130434782608
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21666666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0075  |
| Iteration     | 10       |
| MaximumReturn | -0.00532 |
| MinimumReturn | -0.00961 |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.045943938195705414
Validation loss = 0.04743192344903946
Validation loss = 0.047442931681871414
Validation loss = 0.040046390146017075
Validation loss = 0.03665659576654434
Validation loss = 0.035699449479579926
Validation loss = 0.03756057471036911
Validation loss = 0.037970222532749176
Validation loss = 0.04034093767404556
Validation loss = 0.04110613092780113
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.047653861343860626
Validation loss = 0.04126521944999695
Validation loss = 0.04202935844659805
Validation loss = 0.04495556280016899
Validation loss = 0.039904385805130005
Validation loss = 0.03918389230966568
Validation loss = 0.040338922291994095
Validation loss = 0.04058897867798805
Validation loss = 0.04125130921602249
Validation loss = 0.04349232465028763
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.048362910747528076
Validation loss = 0.047695472836494446
Validation loss = 0.03861905634403229
Validation loss = 0.040250230580568314
Validation loss = 0.03748132288455963
Validation loss = 0.04599897935986519
Validation loss = 0.03883083164691925
Validation loss = 0.04073743522167206
Validation loss = 0.04128672555088997
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.053064893931150436
Validation loss = 0.04076794907450676
Validation loss = 0.04064556583762169
Validation loss = 0.04370865970849991
Validation loss = 0.0399722158908844
Validation loss = 0.03914909064769745
Validation loss = 0.043438494205474854
Validation loss = 0.04079512134194374
Validation loss = 0.04013148695230484
Validation loss = 0.04115388169884682
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04447542876005173
Validation loss = 0.04698190093040466
Validation loss = 0.041918184608221054
Validation loss = 0.04927291348576546
Validation loss = 0.04102625325322151
Validation loss = 0.04685702174901962
Validation loss = 0.04649825021624565
Validation loss = 0.04725484177470207
Validation loss = 0.04162050038576126
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2159468438538206
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2152317880794702
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2145214521452145
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2138157894736842
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21311475409836064
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21241830065359477
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21172638436482086
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21103896103896103
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.21035598705501618
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20967741935483872
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2090032154340836
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20833333333333334
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20766773162939298
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2070063694267516
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20634920634920634
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20569620253164558
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20504731861198738
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20440251572327045
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20376175548589343
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.203125
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20249221183800623
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20186335403726707
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.20123839009287925
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2006172839506173
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.2
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0135  |
| Iteration     | 11       |
| MaximumReturn | -0.0107  |
| MinimumReturn | -0.0208  |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04316207021474838
Validation loss = 0.040877439081668854
Validation loss = 0.039794035255908966
Validation loss = 0.04099767655134201
Validation loss = 0.03673935681581497
Validation loss = 0.03783859312534332
Validation loss = 0.038022320717573166
Validation loss = 0.040430523455142975
Validation loss = 0.03808509185910225
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04809591919183731
Validation loss = 0.04786691814661026
Validation loss = 0.046456169337034225
Validation loss = 0.03946149721741676
Validation loss = 0.04013868421316147
Validation loss = 0.04322069138288498
Validation loss = 0.04378792271018028
Validation loss = 0.04116620868444443
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.047094039618968964
Validation loss = 0.04147021099925041
Validation loss = 0.04586886242032051
Validation loss = 0.043705444782972336
Validation loss = 0.040844254195690155
Validation loss = 0.039616674184799194
Validation loss = 0.041066404432058334
Validation loss = 0.04068606719374657
Validation loss = 0.04315979778766632
Validation loss = 0.040029630064964294
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0534229651093483
Validation loss = 0.0449150875210762
Validation loss = 0.039918117225170135
Validation loss = 0.045266516506671906
Validation loss = 0.04252339154481888
Validation loss = 0.0427219420671463
Validation loss = 0.03986131399869919
Validation loss = 0.03981522470712662
Validation loss = 0.040520042181015015
Validation loss = 0.04168814793229103
Validation loss = 0.043562181293964386
Validation loss = 0.0418393574655056
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04210629686713219
Validation loss = 0.044713594019412994
Validation loss = 0.04103265330195427
Validation loss = 0.04132835939526558
Validation loss = 0.04385966807603836
Validation loss = 0.04323301091790199
Validation loss = 0.042993199080228806
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19938650306748465
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19877675840978593
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19817073170731708
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19756838905775076
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19696969696969696
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19637462235649547
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19578313253012047
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19519519519519518
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19461077844311378
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19402985074626866
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19345238095238096
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19287833827893175
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19230769230769232
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19174041297935104
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19117647058823528
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1906158357771261
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19005847953216373
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18950437317784258
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18895348837209303
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18840579710144928
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18786127167630057
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1873198847262248
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1867816091954023
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18624641833810887
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18571428571428572
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0188  |
| Iteration     | 12       |
| MaximumReturn | -0.0126  |
| MinimumReturn | -0.0276  |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03761438652873039
Validation loss = 0.037138886749744415
Validation loss = 0.04015762731432915
Validation loss = 0.03512215241789818
Validation loss = 0.03722125664353371
Validation loss = 0.03530190885066986
Validation loss = 0.036799781024456024
Validation loss = 0.038131825625896454
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03899337351322174
Validation loss = 0.04042594134807587
Validation loss = 0.04007752239704132
Validation loss = 0.038428399711847305
Validation loss = 0.037908025085926056
Validation loss = 0.036954279989004135
Validation loss = 0.036301132291555405
Validation loss = 0.03858555108308792
Validation loss = 0.037810321897268295
Validation loss = 0.046753883361816406
Validation loss = 0.04300756752490997
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03832486271858215
Validation loss = 0.039651136845350266
Validation loss = 0.03963761404156685
Validation loss = 0.036721620708703995
Validation loss = 0.03979251906275749
Validation loss = 0.035428762435913086
Validation loss = 0.03842506557703018
Validation loss = 0.03815259784460068
Validation loss = 0.03708949312567711
Validation loss = 0.037973545491695404
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03969563543796539
Validation loss = 0.036940813064575195
Validation loss = 0.04117777943611145
Validation loss = 0.03903097286820412
Validation loss = 0.03873654827475548
Validation loss = 0.0425746776163578
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04931803047657013
Validation loss = 0.03843408450484276
Validation loss = 0.04377575218677521
Validation loss = 0.041660554707050323
Validation loss = 0.03903641924262047
Validation loss = 0.037796393036842346
Validation loss = 0.041662123054265976
Validation loss = 0.03959706798195839
Validation loss = 0.03774994984269142
Validation loss = 0.04014540836215019
Validation loss = 0.04780237376689911
Validation loss = 0.03806731477379799
Validation loss = 0.04173163324594498
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18518518518518517
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1846590909090909
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18413597733711048
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18361581920903955
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18309859154929578
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18258426966292135
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18207282913165265
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18156424581005587
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.181058495821727
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18055555555555555
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18005540166204986
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17955801104972377
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1790633608815427
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17857142857142858
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1780821917808219
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17759562841530055
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1771117166212534
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1766304347826087
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17615176151761516
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17567567567567569
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1752021563342318
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17473118279569894
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1742627345844504
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17379679144385027
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17333333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0019  |
| Iteration     | 13       |
| MaximumReturn | -0.00136 |
| MinimumReturn | -0.00269 |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.033464040607213974
Validation loss = 0.03410534933209419
Validation loss = 0.04659203812479973
Validation loss = 0.03429378196597099
Validation loss = 0.03506922349333763
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04053254798054695
Validation loss = 0.03713838383555412
Validation loss = 0.03595501556992531
Validation loss = 0.036321792751550674
Validation loss = 0.03501546382904053
Validation loss = 0.03848434239625931
Validation loss = 0.04118384048342705
Validation loss = 0.03784804791212082
Validation loss = 0.03892543539404869
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0368293859064579
Validation loss = 0.040361251682043076
Validation loss = 0.037585947662591934
Validation loss = 0.03861944377422333
Validation loss = 0.038886506110429764
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.041611943393945694
Validation loss = 0.042254701256752014
Validation loss = 0.03997479006648064
Validation loss = 0.03536508232355118
Validation loss = 0.03673139214515686
Validation loss = 0.037274282425642014
Validation loss = 0.037234190851449966
Validation loss = 0.03502940014004707
Validation loss = 0.0363764725625515
Validation loss = 0.035322271287441254
Validation loss = 0.03585376963019371
Validation loss = 0.04203442111611366
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04501224681735039
Validation loss = 0.03836442902684212
Validation loss = 0.03921434283256531
Validation loss = 0.036879681050777435
Validation loss = 0.039058834314346313
Validation loss = 0.03653419390320778
Validation loss = 0.03569183871150017
Validation loss = 0.036947883665561676
Validation loss = 0.03742748498916626
Validation loss = 0.03745470568537712
Validation loss = 0.04116606339812279
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17287234042553193
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1724137931034483
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17195767195767195
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17150395778364116
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17105263157894737
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17060367454068243
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17015706806282724
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16971279373368145
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16927083333333334
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16883116883116883
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16839378238341968
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16795865633074936
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16752577319587628
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16709511568123395
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16666666666666666
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16624040920716113
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1683673469387755
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16793893129770993
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16751269035532995
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1670886075949367
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16666666666666666
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16624685138539042
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1658291457286432
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16541353383458646
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.165
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.04    |
| Iteration     | 14       |
| MaximumReturn | -0.0273  |
| MinimumReturn | -24.9    |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04379422962665558
Validation loss = 0.03340603783726692
Validation loss = 0.03191205859184265
Validation loss = 0.035639140754938126
Validation loss = 0.03403058275580406
Validation loss = 0.03335549682378769
Validation loss = 0.03183657303452492
Validation loss = 0.030404511839151382
Validation loss = 0.034301601350307465
Validation loss = 0.03114132210612297
Validation loss = 0.03550751507282257
Validation loss = 0.031351856887340546
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03587121516466141
Validation loss = 0.03142738714814186
Validation loss = 0.040680669248104095
Validation loss = 0.03129781410098076
Validation loss = 0.03456715866923332
Validation loss = 0.03264280781149864
Validation loss = 0.03405478596687317
Validation loss = 0.03257962316274643
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.039650142192840576
Validation loss = 0.03531232103705406
Validation loss = 0.03970469534397125
Validation loss = 0.0324595607817173
Validation loss = 0.03310427442193031
Validation loss = 0.02976812794804573
Validation loss = 0.03873675689101219
Validation loss = 0.02998347021639347
Validation loss = 0.032310858368873596
Validation loss = 0.03198734298348427
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.036043159663677216
Validation loss = 0.04939891770482063
Validation loss = 0.0318087637424469
Validation loss = 0.036735206842422485
Validation loss = 0.03584642335772514
Validation loss = 0.03564203158020973
Validation loss = 0.04437024146318436
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04043736308813095
Validation loss = 0.03750517964363098
Validation loss = 0.033042799681425095
Validation loss = 0.03205409273505211
Validation loss = 0.03739871084690094
Validation loss = 0.037894539535045624
Validation loss = 0.03292711824178696
Validation loss = 0.03575295954942703
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16458852867830423
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16417910447761194
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16377171215880892
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16336633663366337
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16296296296296298
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1625615763546798
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16216216216216217
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16176470588235295
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16136919315403422
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16097560975609757
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16058394160583941
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16019417475728157
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15980629539951574
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15942028985507245
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15903614457831325
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15865384615384615
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15827338129496402
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15789473684210525
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1575178997613365
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15714285714285714
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15676959619952494
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15639810426540285
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15602836879432624
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15566037735849056
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15529411764705883
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00949 |
| Iteration     | 15       |
| MaximumReturn | -0.00778 |
| MinimumReturn | -0.011   |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029962819069623947
Validation loss = 0.02816573902964592
Validation loss = 0.036231689155101776
Validation loss = 0.03108646161854267
Validation loss = 0.03137747198343277
Validation loss = 0.02960350550711155
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.032616179436445236
Validation loss = 0.03193589672446251
Validation loss = 0.03245364502072334
Validation loss = 0.03034350275993347
Validation loss = 0.035373032093048096
Validation loss = 0.03494579717516899
Validation loss = 0.030712291598320007
Validation loss = 0.03139830753207207
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03227744251489639
Validation loss = 0.03367976099252701
Validation loss = 0.03213377669453621
Validation loss = 0.030926823616027832
Validation loss = 0.03156837448477745
Validation loss = 0.02844606526196003
Validation loss = 0.030174659565091133
Validation loss = 0.029556144028902054
Validation loss = 0.029641609638929367
Validation loss = 0.033505629748106
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.037833429872989655
Validation loss = 0.03200129419565201
Validation loss = 0.032239168882369995
Validation loss = 0.03265666961669922
Validation loss = 0.030319949612021446
Validation loss = 0.03199901059269905
Validation loss = 0.031168362125754356
Validation loss = 0.03132552653551102
Validation loss = 0.031242610886693
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03381853550672531
Validation loss = 0.031341470777988434
Validation loss = 0.03887903690338135
Validation loss = 0.03486766666173935
Validation loss = 0.03315161168575287
Validation loss = 0.033546674996614456
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15492957746478872
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15456674473067916
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1542056074766355
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15384615384615385
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15348837209302327
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1531322505800464
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1527777777777778
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15242494226327943
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15207373271889402
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15172413793103448
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15137614678899083
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15102974828375287
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1506849315068493
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15034168564920272
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14965986394557823
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1493212669683258
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1489841986455982
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14864864864864866
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14831460674157304
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14798206278026907
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1476510067114094
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14732142857142858
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14699331848552338
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14666666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0321  |
| Iteration     | 16       |
| MaximumReturn | -0.0213  |
| MinimumReturn | -0.0503  |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0303004439920187
Validation loss = 0.03259217366576195
Validation loss = 0.030118290334939957
Validation loss = 0.029589831829071045
Validation loss = 0.03832957521080971
Validation loss = 0.02961757592856884
Validation loss = 0.029709327965974808
Validation loss = 0.027901310473680496
Validation loss = 0.029585642740130424
Validation loss = 0.029421919956803322
Validation loss = 0.027880916371941566
Validation loss = 0.03230191767215729
Validation loss = 0.026392163708806038
Validation loss = 0.028253087773919106
Validation loss = 0.03239668160676956
Validation loss = 0.02920658513903618
Validation loss = 0.028884291648864746
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.030055753886699677
Validation loss = 0.033719372004270554
Validation loss = 0.03014512173831463
Validation loss = 0.030026568099856377
Validation loss = 0.028427479788661003
Validation loss = 0.029888607561588287
Validation loss = 0.03493871912360191
Validation loss = 0.03093254752457142
Validation loss = 0.029440628364682198
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.030947454273700714
Validation loss = 0.031933318823575974
Validation loss = 0.027928559109568596
Validation loss = 0.033258214592933655
Validation loss = 0.029276905581355095
Validation loss = 0.029970934614539146
Validation loss = 0.03154150769114494
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03520720452070236
Validation loss = 0.03329585865139961
Validation loss = 0.03028251975774765
Validation loss = 0.030663425102829933
Validation loss = 0.02986273169517517
Validation loss = 0.03245072439312935
Validation loss = 0.030497409403324127
Validation loss = 0.028421450406312943
Validation loss = 0.031413059681653976
Validation loss = 0.03128546103835106
Validation loss = 0.03199578821659088
Validation loss = 0.03144785016775131
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.032143186777830124
Validation loss = 0.030731355771422386
Validation loss = 0.029849767684936523
Validation loss = 0.03177895024418831
Validation loss = 0.03101954609155655
Validation loss = 0.031120063737034798
Validation loss = 0.03104913793504238
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14634146341463414
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14601769911504425
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1456953642384106
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14537444933920704
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14505494505494507
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14473684210526316
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14442013129102846
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14410480349344978
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1437908496732026
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14347826086956522
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14316702819956617
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14285714285714285
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14254859611231102
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14224137931034483
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14193548387096774
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14163090128755365
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14132762312633834
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14102564102564102
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14072494669509594
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14042553191489363
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14012738853503184
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13983050847457626
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13953488372093023
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13924050632911392
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13894736842105262
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00141  |
| Iteration     | 17        |
| MaximumReturn | -0.000857 |
| MinimumReturn | -0.00201  |
| TotalSamples  | 31654     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03256191313266754
Validation loss = 0.029383700340986252
Validation loss = 0.028258316218852997
Validation loss = 0.029333293437957764
Validation loss = 0.037207916378974915
Validation loss = 0.028927506878972054
Validation loss = 0.02939753420650959
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03496372327208519
Validation loss = 0.035616688430309296
Validation loss = 0.03317630663514137
Validation loss = 0.030127277597784996
Validation loss = 0.030320819467306137
Validation loss = 0.03210720047354698
Validation loss = 0.03388955816626549
Validation loss = 0.03211172670125961
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.033492736518383026
Validation loss = 0.03502536937594414
Validation loss = 0.02990126423537731
Validation loss = 0.030007177963852882
Validation loss = 0.029639648273587227
Validation loss = 0.02804974652826786
Validation loss = 0.028260255232453346
Validation loss = 0.028264125809073448
Validation loss = 0.03553781658411026
Validation loss = 0.03004901297390461
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03335634618997574
Validation loss = 0.032738763839006424
Validation loss = 0.03020459972321987
Validation loss = 0.030703693628311157
Validation loss = 0.030132818967103958
Validation loss = 0.0314536914229393
Validation loss = 0.03362418711185455
Validation loss = 0.038343511521816254
Validation loss = 0.03150508552789688
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.032678984105587006
Validation loss = 0.035921793431043625
Validation loss = 0.03207600489258766
Validation loss = 0.031166715547442436
Validation loss = 0.03537341207265854
Validation loss = 0.03309334069490433
Validation loss = 0.03593571484088898
Validation loss = 0.03340507298707962
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13865546218487396
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13836477987421383
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1401673640167364
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13987473903966596
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13958333333333334
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1392931392931393
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13900414937759337
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13871635610766045
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1384297520661157
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1402061855670103
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13991769547325103
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13963039014373715
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13934426229508196
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1411042944785276
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14081632653061224
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14052953156822812
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1402439024390244
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13995943204868155
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1417004048582996
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1414141414141414
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14112903225806453
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14084507042253522
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14056224899598393
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1402805611222445
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.4     |
| Iteration     | 18       |
| MaximumReturn | -0.0295  |
| MinimumReturn | -31.1    |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.027231644839048386
Validation loss = 0.02896297164261341
Validation loss = 0.027353616431355476
Validation loss = 0.029292264953255653
Validation loss = 0.026703208684921265
Validation loss = 0.02796352468430996
Validation loss = 0.03065164014697075
Validation loss = 0.02678774856030941
Validation loss = 0.028954843059182167
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.029915904626250267
Validation loss = 0.026754846796393394
Validation loss = 0.028239447623491287
Validation loss = 0.027025021612644196
Validation loss = 0.03442421555519104
Validation loss = 0.02926594577729702
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.029375802725553513
Validation loss = 0.027214977890253067
Validation loss = 0.02809591218829155
Validation loss = 0.029065456241369247
Validation loss = 0.029013292863965034
Validation loss = 0.02917720377445221
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03277651220560074
Validation loss = 0.030206363648176193
Validation loss = 0.026980068534612656
Validation loss = 0.026924187317490578
Validation loss = 0.029477590695023537
Validation loss = 0.03093620017170906
Validation loss = 0.029385162517428398
Validation loss = 0.02724038064479828
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.028021497651934624
Validation loss = 0.030837101861834526
Validation loss = 0.028637666255235672
Validation loss = 0.029314465820789337
Validation loss = 0.03393646329641342
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13972055888223553
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1394422310756972
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13916500994035785
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1388888888888889
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13861386138613863
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1383399209486166
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13806706114398423
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1377952755905512
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.137524557956778
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13725490196078433
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.136986301369863
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13671875
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1364522417153996
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13618677042801555
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13592233009708737
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13565891472868216
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13539651837524178
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13513513513513514
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1348747591522158
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1346153846153846
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1343570057581574
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13409961685823754
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1338432122370937
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13358778625954199
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13333333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00301 |
| Iteration     | 19       |
| MaximumReturn | -0.00222 |
| MinimumReturn | -0.00465 |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02893855608999729
Validation loss = 0.025353454053401947
Validation loss = 0.026910027489066124
Validation loss = 0.02993045188486576
Validation loss = 0.02688765525817871
Validation loss = 0.0276202242821455
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02830377407371998
Validation loss = 0.03215821832418442
Validation loss = 0.028118738904595375
Validation loss = 0.028605572879314423
Validation loss = 0.02824479714035988
Validation loss = 0.026683714240789413
Validation loss = 0.028716295957565308
Validation loss = 0.028043853119015694
Validation loss = 0.026674989610910416
Validation loss = 0.03148232772946358
Validation loss = 0.027536669746041298
Validation loss = 0.02760516107082367
Validation loss = 0.02922588400542736
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02795475348830223
Validation loss = 0.03328344225883484
Validation loss = 0.03066747635602951
Validation loss = 0.029923835769295692
Validation loss = 0.028310565277934074
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.035423748195171356
Validation loss = 0.03765227273106575
Validation loss = 0.02892163209617138
Validation loss = 0.02851073257625103
Validation loss = 0.027317427098751068
Validation loss = 0.027876587584614754
Validation loss = 0.029310449957847595
Validation loss = 0.02870194986462593
Validation loss = 0.029354482889175415
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03099813126027584
Validation loss = 0.029618045315146446
Validation loss = 0.03365350142121315
Validation loss = 0.02912665531039238
Validation loss = 0.030732719227671623
Validation loss = 0.030062150210142136
Validation loss = 0.02978215552866459
Validation loss = 0.029829617589712143
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13307984790874525
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13282732447817835
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13257575757575757
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1323251417769376
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1320754716981132
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1318267419962335
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13157894736842105
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13133208255159476
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13108614232209737
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1308411214953271
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13059701492537312
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1303538175046555
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13011152416356878
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12987012987012986
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12962962962962962
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12939001848428835
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12915129151291513
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1289134438305709
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12867647058823528
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12844036697247707
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1282051282051282
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12797074954296161
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12773722627737227
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12750455373406194
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12727272727272726
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0176  |
| Iteration     | 20       |
| MaximumReturn | -0.0143  |
| MinimumReturn | -0.0216  |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.028581729158759117
Validation loss = 0.030645625665783882
Validation loss = 0.02708221971988678
Validation loss = 0.025935420766472816
Validation loss = 0.025405580177903175
Validation loss = 0.026080526411533356
Validation loss = 0.026287386193871498
Validation loss = 0.025365181267261505
Validation loss = 0.02960108034312725
Validation loss = 0.026659470051527023
Validation loss = 0.026703959330916405
Validation loss = 0.028285423293709755
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03509937599301338
Validation loss = 0.03186808526515961
Validation loss = 0.02802371047437191
Validation loss = 0.02644124999642372
Validation loss = 0.027090271934866905
Validation loss = 0.02635164000093937
Validation loss = 0.02644018828868866
Validation loss = 0.027864178642630577
Validation loss = 0.026713822036981583
Validation loss = 0.02758420817553997
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025948235765099525
Validation loss = 0.025939369574189186
Validation loss = 0.03332896530628204
Validation loss = 0.02616136707365513
Validation loss = 0.02735915035009384
Validation loss = 0.028228292241692543
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028381291776895523
Validation loss = 0.029643191024661064
Validation loss = 0.028481297194957733
Validation loss = 0.027366627007722855
Validation loss = 0.02751193195581436
Validation loss = 0.02715456858277321
Validation loss = 0.026432188227772713
Validation loss = 0.029853995889425278
Validation loss = 0.02722492441534996
Validation loss = 0.02705574408173561
Validation loss = 0.028638839721679688
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02876325137913227
Validation loss = 0.03263640031218529
Validation loss = 0.02787940762937069
Validation loss = 0.02839554287493229
Validation loss = 0.02772168070077896
Validation loss = 0.029581086710095406
Validation loss = 0.02833646908402443
Validation loss = 0.0293290838599205
Validation loss = 0.02947942353785038
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12704174228675136
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12681159420289856
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12658227848101267
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1263537906137184
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12612612612612611
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12589928057553956
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12567324955116696
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12544802867383512
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1252236135957066
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12477718360071301
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12455516014234876
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12433392539964476
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12411347517730496
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12389380530973451
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12367491166077739
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12345679012345678
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12323943661971831
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12302284710017575
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12280701754385964
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12259194395796848
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12237762237762238
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12216404886561955
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12195121951219512
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12173913043478261
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0386  |
| Iteration     | 21       |
| MaximumReturn | -0.0168  |
| MinimumReturn | -0.201   |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029691580682992935
Validation loss = 0.026274655014276505
Validation loss = 0.026785975322127342
Validation loss = 0.02843841351568699
Validation loss = 0.025711795315146446
Validation loss = 0.026485254988074303
Validation loss = 0.026257093995809555
Validation loss = 0.02404048666357994
Validation loss = 0.03587591275572777
Validation loss = 0.02690598927438259
Validation loss = 0.029764078557491302
Validation loss = 0.02793785370886326
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026125390082597733
Validation loss = 0.02678915299475193
Validation loss = 0.028709355741739273
Validation loss = 0.0275125652551651
Validation loss = 0.02737257443368435
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02671688236296177
Validation loss = 0.02931983768939972
Validation loss = 0.03352121263742447
Validation loss = 0.0400705449283123
Validation loss = 0.02706051804125309
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028757814317941666
Validation loss = 0.032562505453825
Validation loss = 0.027060890570282936
Validation loss = 0.02918230928480625
Validation loss = 0.03302187845110893
Validation loss = 0.026778079569339752
Validation loss = 0.025702491402626038
Validation loss = 0.028380755335092545
Validation loss = 0.026691580191254616
Validation loss = 0.02765592373907566
Validation loss = 0.03244442865252495
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.030088817700743675
Validation loss = 0.02899284102022648
Validation loss = 0.02892334572970867
Validation loss = 0.02813871204853058
Validation loss = 0.026567788794636726
Validation loss = 0.026724370196461678
Validation loss = 0.029637562111020088
Validation loss = 0.028612250462174416
Validation loss = 0.03053136169910431
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1232638888888889
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12478336221837089
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12629757785467127
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12780656303972365
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12931034482758622
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12908777969018934
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.13058419243986255
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1320754716981132
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13184931506849315
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.13333333333333333
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1348122866894198
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1362862010221465
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1360544217687075
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.14431239388794567
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.14576271186440679
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1455160744500846
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.14695945945945946
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.14839797639123103
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.14983164983164984
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15126050420168066
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15268456375838926
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.152428810720268
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15217391304347827
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15358931552587646
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15333333333333332
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.3    |
| Iteration     | 22       |
| MaximumReturn | -0.0635  |
| MinimumReturn | -70.4    |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03500569611787796
Validation loss = 0.032477617263793945
Validation loss = 0.031300436705350876
Validation loss = 0.03007526695728302
Validation loss = 0.03215063363313675
Validation loss = 0.03641367703676224
Validation loss = 0.03326486051082611
Validation loss = 0.029201209545135498
Validation loss = 0.02960827946662903
Validation loss = 0.035662777721881866
Validation loss = 0.029816359281539917
Validation loss = 0.03120540641248226
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03892877697944641
Validation loss = 0.03267473354935646
Validation loss = 0.0349273644387722
Validation loss = 0.03129886835813522
Validation loss = 0.0326903834939003
Validation loss = 0.03171439841389656
Validation loss = 0.032075900584459305
Validation loss = 0.03266207128763199
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.049638740718364716
Validation loss = 0.03131170943379402
Validation loss = 0.03315109387040138
Validation loss = 0.03297960013151169
Validation loss = 0.03565972298383713
Validation loss = 0.033472102135419846
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03393173962831497
Validation loss = 0.038599200546741486
Validation loss = 0.034504808485507965
Validation loss = 0.032568223774433136
Validation loss = 0.029364025220274925
Validation loss = 0.034819990396499634
Validation loss = 0.03387878090143204
Validation loss = 0.031835950911045074
Validation loss = 0.030693506821990013
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03584892302751541
Validation loss = 0.03592818230390549
Validation loss = 0.03415331244468689
Validation loss = 0.0342029444873333
Validation loss = 0.04418833181262016
Validation loss = 0.03454434871673584
Validation loss = 0.03328043222427368
Validation loss = 0.03560469672083855
Validation loss = 0.033139217644929886
Validation loss = 0.03744512051343918
Validation loss = 0.033752940595149994
Validation loss = 0.033444844186306
Validation loss = 0.03735867887735367
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15307820299500832
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15282392026578073
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15257048092868988
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.152317880794702
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15206611570247933
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15181518151815182
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15321252059308071
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15296052631578946
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15270935960591134
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1540983606557377
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15384615384615385
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15359477124183007
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1533442088091354
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15309446254071662
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15284552845528454
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1525974025974026
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1539708265802269
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15372168284789645
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15347334410339256
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1532258064516129
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1529790660225443
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1527331189710611
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15248796147672553
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15224358974358973
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1536
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.45    |
| Iteration     | 23       |
| MaximumReturn | -0.0445  |
| MinimumReturn | -44.7    |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03146477788686752
Validation loss = 0.030901124700903893
Validation loss = 0.03222423046827316
Validation loss = 0.031117742881178856
Validation loss = 0.031173815950751305
Validation loss = 0.035648707300424576
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.032340649515390396
Validation loss = 0.03409846872091293
Validation loss = 0.031911663711071014
Validation loss = 0.02968599833548069
Validation loss = 0.03189356252551079
Validation loss = 0.03463473916053772
Validation loss = 0.031035277992486954
Validation loss = 0.03392172232270241
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.031236082315444946
Validation loss = 0.03334718197584152
Validation loss = 0.031302131712436676
Validation loss = 0.032872963696718216
Validation loss = 0.034489233046770096
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030367081984877586
Validation loss = 0.030740540474653244
Validation loss = 0.030064860358834267
Validation loss = 0.030475804582238197
Validation loss = 0.029895257204771042
Validation loss = 0.0300733745098114
Validation loss = 0.030528182163834572
Validation loss = 0.030562972649931908
Validation loss = 0.029055286198854446
Validation loss = 0.03147204965353012
Validation loss = 0.03535584360361099
Validation loss = 0.035674192011356354
Validation loss = 0.028813239187002182
Validation loss = 0.029181912541389465
Validation loss = 0.037060488015413284
Validation loss = 0.02964821830391884
Validation loss = 0.028691569343209267
Validation loss = 0.03289955109357834
Validation loss = 0.030573204159736633
Validation loss = 0.029123034328222275
Validation loss = 0.029940957203507423
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03474045544862747
Validation loss = 0.03339911997318268
Validation loss = 0.03315207362174988
Validation loss = 0.03240654617547989
Validation loss = 0.03164279833436012
Validation loss = 0.03323085233569145
Validation loss = 0.031486816704273224
Validation loss = 0.03270236402750015
Validation loss = 0.03125471621751785
Validation loss = 0.031605035066604614
Validation loss = 0.03080996870994568
Validation loss = 0.03325987607240677
Validation loss = 0.03052196279168129
Validation loss = 0.03408776968717575
Validation loss = 0.03249179199337959
Validation loss = 0.030956517904996872
Validation loss = 0.03270281106233597
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15495207667731628
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15629984051036683
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15605095541401273
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15580286168521462
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15714285714285714
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15847860538827258
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.15981012658227847
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15955766192733017
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1608832807570978
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.16377952755905512
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1650943396226415
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16483516483516483
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.164576802507837
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.16588419405320814
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1671875
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1669266770670827
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.16978193146417445
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17107309486780714
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17080745341614906
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.17364341085271318
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17492260061919504
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.1839258114374034
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.18518518518518517
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1864406779661017
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.19076923076923077
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.5    |
| Iteration     | 24       |
| MaximumReturn | -0.125   |
| MinimumReturn | -57.3    |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0325467512011528
Validation loss = 0.030586501583456993
Validation loss = 0.031107475981116295
Validation loss = 0.027180377393960953
Validation loss = 0.03114374540746212
Validation loss = 0.0281534306704998
Validation loss = 0.028903387486934662
Validation loss = 0.030035052448511124
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.032164067029953
Validation loss = 0.0347389280796051
Validation loss = 0.028499603271484375
Validation loss = 0.02599509246647358
Validation loss = 0.02818084880709648
Validation loss = 0.029087433591485023
Validation loss = 0.029601266607642174
Validation loss = 0.027417078614234924
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02930246666073799
Validation loss = 0.029124347493052483
Validation loss = 0.028626132756471634
Validation loss = 0.02666460908949375
Validation loss = 0.02965722233057022
Validation loss = 0.031884606927633286
Validation loss = 0.03791394829750061
Validation loss = 0.026680584996938705
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03234804794192314
Validation loss = 0.028920743614435196
Validation loss = 0.0273897685110569
Validation loss = 0.029977843165397644
Validation loss = 0.033567897975444794
Validation loss = 0.0341334193944931
Validation loss = 0.027563277631998062
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.032715365290641785
Validation loss = 0.03171619772911072
Validation loss = 0.030282866209745407
Validation loss = 0.027530785650014877
Validation loss = 0.02783942222595215
Validation loss = 0.026040557771921158
Validation loss = 0.02845107391476631
Validation loss = 0.02612101100385189
Validation loss = 0.0315115824341774
Validation loss = 0.02848755195736885
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.19047619047619047
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1901840490797546
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18989280245022971
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18960244648318042
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18931297709923664
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18902439024390244
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1887366818873668
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1884498480243161
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18816388467374812
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18787878787878787
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1875945537065053
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18731117824773413
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1870286576168929
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18674698795180722
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18646616541353384
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18618618618618618
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18590704647676162
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18562874251497005
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18535127055306427
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18507462686567164
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18479880774962743
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18452380952380953
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18424962852897475
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18397626112759644
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1837037037037037
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0185  |
| Iteration     | 25       |
| MaximumReturn | -0.0129  |
| MinimumReturn | -0.0232  |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03158826008439064
Validation loss = 0.03469850867986679
Validation loss = 0.032349053770303726
Validation loss = 0.03250369057059288
Validation loss = 0.03193351998925209
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.032209690660238266
Validation loss = 0.03375609219074249
Validation loss = 0.03324097767472267
Validation loss = 0.03293728455901146
Validation loss = 0.031774744391441345
Validation loss = 0.03022642247378826
Validation loss = 0.03519999608397484
Validation loss = 0.0317663811147213
Validation loss = 0.03122699446976185
Validation loss = 0.032609984278678894
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.028942761942744255
Validation loss = 0.031038010492920876
Validation loss = 0.02971644327044487
Validation loss = 0.030803905799984932
Validation loss = 0.033827587962150574
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02980414777994156
Validation loss = 0.030391113832592964
Validation loss = 0.03174656629562378
Validation loss = 0.035412486642599106
Validation loss = 0.034861013293266296
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03096146695315838
Validation loss = 0.03248440846800804
Validation loss = 0.033501628786325455
Validation loss = 0.035294875502586365
Validation loss = 0.030262691900134087
Validation loss = 0.0318821482360363
Validation loss = 0.03132588788866997
Validation loss = 0.031127439811825752
Validation loss = 0.03252151235938072
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1834319526627219
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1831610044313146
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18289085545722714
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18262150220913106
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18235294117647058
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18208516886930984
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18181818181818182
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1815519765739385
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18128654970760233
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.181021897810219
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18075801749271136
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1804949053857351
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.18023255813953487
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1799709724238026
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17971014492753623
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17945007235890015
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1791907514450867
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17893217893217894
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1786743515850144
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17841726618705037
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1781609195402299
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17790530846484937
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17765042979942694
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17739628040057226
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17714285714285713
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0202  |
| Iteration     | 26       |
| MaximumReturn | -0.0108  |
| MinimumReturn | -0.0461  |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02992267534136772
Validation loss = 0.030431589111685753
Validation loss = 0.030795609578490257
Validation loss = 0.03154253587126732
Validation loss = 0.03254265338182449
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.031554047018289566
Validation loss = 0.03237259387969971
Validation loss = 0.03163177892565727
Validation loss = 0.030911821871995926
Validation loss = 0.038267187774181366
Validation loss = 0.032414697110652924
Validation loss = 0.031725578010082245
Validation loss = 0.030539149418473244
Validation loss = 0.032524339854717255
Validation loss = 0.030402284115552902
Validation loss = 0.031140705570578575
Validation loss = 0.03043210320174694
Validation loss = 0.030160581693053246
Validation loss = 0.02998368814587593
Validation loss = 0.029958726838231087
Validation loss = 0.031503431499004364
Validation loss = 0.03270841762423515
Validation loss = 0.03212472051382065
Validation loss = 0.035565335303545
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.032794199883937836
Validation loss = 0.03315144032239914
Validation loss = 0.0299909058958292
Validation loss = 0.030183473601937294
Validation loss = 0.030815355479717255
Validation loss = 0.029838714748620987
Validation loss = 0.03227829188108444
Validation loss = 0.031010549515485764
Validation loss = 0.02986827678978443
Validation loss = 0.032131776213645935
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02919795736670494
Validation loss = 0.03050053119659424
Validation loss = 0.03188977763056755
Validation loss = 0.035585422068834305
Validation loss = 0.029520757496356964
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03089805878698826
Validation loss = 0.033610787242650986
Validation loss = 0.03196018561720848
Validation loss = 0.03162465617060661
Validation loss = 0.03275207802653313
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1768901569186876
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17663817663817663
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1763869132290185
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17613636363636365
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17588652482269504
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17563739376770537
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1753889674681754
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1751412429378531
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17489421720733428
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17464788732394365
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17440225035161744
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17415730337078653
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17391304347826086
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17366946778711484
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17342657342657342
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17318435754189945
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17294281729428174
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17270194986072424
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17246175243393602
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17222222222222222
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17198335644937587
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17174515235457063
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1715076071922545
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1712707182320442
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17103448275862068
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0011   |
| Iteration     | 27        |
| MaximumReturn | -0.000718 |
| MinimumReturn | -0.00167  |
| TotalSamples  | 48314     |
-----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03259442374110222
Validation loss = 0.03151145949959755
Validation loss = 0.02978888712823391
Validation loss = 0.03239603340625763
Validation loss = 0.029974671080708504
Validation loss = 0.03099868632853031
Validation loss = 0.02997053973376751
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03502483293414116
Validation loss = 0.030631033703684807
Validation loss = 0.030100295320153236
Validation loss = 0.03395547717809677
Validation loss = 0.03217761591076851
Validation loss = 0.030004888772964478
Validation loss = 0.028946643695235252
Validation loss = 0.029645420610904694
Validation loss = 0.030438443645834923
Validation loss = 0.029314318671822548
Validation loss = 0.037444110959768295
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03293276205658913
Validation loss = 0.03133461996912956
Validation loss = 0.028548160567879677
Validation loss = 0.029916150495409966
Validation loss = 0.030074650421738625
Validation loss = 0.02987518347799778
Validation loss = 0.030154531821608543
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030172666534781456
Validation loss = 0.030447514727711678
Validation loss = 0.030309466645121574
Validation loss = 0.029656657949090004
Validation loss = 0.03083009086549282
Validation loss = 0.028781050816178322
Validation loss = 0.031487736850976944
Validation loss = 0.02954012341797352
Validation loss = 0.0405728705227375
Validation loss = 0.030291011556982994
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02976844646036625
Validation loss = 0.03483464568853378
Validation loss = 0.03280913829803467
Validation loss = 0.030645692721009254
Validation loss = 0.03659503161907196
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17079889807162535
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17056396148555708
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.17032967032967034
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1700960219478738
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16986301369863013
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16963064295485636
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16939890710382513
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16916780354706684
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16893732970027248
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16870748299319727
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16847826086956522
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16824966078697423
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16802168021680217
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16779431664411368
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16756756756756758
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16734143049932523
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16711590296495957
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16689098250336473
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16666666666666666
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16644295302013423
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16621983914209115
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16599732262382866
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1657754010695187
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16555407209612816
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16533333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000933 |
| Iteration     | 28        |
| MaximumReturn | -0.000656 |
| MinimumReturn | -0.00137  |
| TotalSamples  | 49980     |
-----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03130758926272392
Validation loss = 0.029414238408207893
Validation loss = 0.0321316234767437
Validation loss = 0.03400394320487976
Validation loss = 0.03283393010497093
Validation loss = 0.029549485072493553
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03569478541612625
Validation loss = 0.04093093052506447
Validation loss = 0.029606861993670464
Validation loss = 0.02903745323419571
Validation loss = 0.02963162027299404
Validation loss = 0.029951265081763268
Validation loss = 0.029236510396003723
Validation loss = 0.02936614118516445
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03473962843418121
Validation loss = 0.03014504164457321
Validation loss = 0.02857947163283825
Validation loss = 0.029851699247956276
Validation loss = 0.03090311586856842
Validation loss = 0.02912472002208233
Validation loss = 0.035559430718421936
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.031210334971547127
Validation loss = 0.030033690854907036
Validation loss = 0.030456749722361565
Validation loss = 0.02898584119975567
Validation loss = 0.030802089720964432
Validation loss = 0.02971971035003662
Validation loss = 0.029489466920495033
Validation loss = 0.030026191845536232
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.029103051871061325
Validation loss = 0.029673445969820023
Validation loss = 0.030208008363842964
Validation loss = 0.029512012377381325
Validation loss = 0.030033402144908905
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16511318242343542
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16489361702127658
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1646746347941567
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16445623342175067
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16423841059602648
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.164021164021164
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16380449141347425
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16358839050131926
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16337285902503293
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1631578947368421
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16294349540078842
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16272965879265092
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16251638269986893
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16230366492146597
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16209150326797386
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1618798955613577
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16166883963494133
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16145833333333334
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16124837451235371
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16103896103896104
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1608300907911803
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16062176165803108
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16041397153945666
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16020671834625322
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.16
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00105  |
| Iteration     | 29        |
| MaximumReturn | -0.000768 |
| MinimumReturn | -0.00171  |
| TotalSamples  | 51646     |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03022078052163124
Validation loss = 0.029218334704637527
Validation loss = 0.03154639154672623
Validation loss = 0.03043636493384838
Validation loss = 0.03300309181213379
Validation loss = 0.03204931318759918
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03161131218075752
Validation loss = 0.03267009183764458
Validation loss = 0.029867257922887802
Validation loss = 0.030031679198145866
Validation loss = 0.031029870733618736
Validation loss = 0.03117801435291767
Validation loss = 0.03167957440018654
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.036641355603933334
Validation loss = 0.030169134959578514
Validation loss = 0.029504649341106415
Validation loss = 0.02930392697453499
Validation loss = 0.034809935837984085
Validation loss = 0.033247705549001694
Validation loss = 0.028831465169787407
Validation loss = 0.030075356364250183
Validation loss = 0.0285937562584877
Validation loss = 0.033120013773441315
Validation loss = 0.029797933995723724
Validation loss = 0.029911093413829803
Validation loss = 0.03194492682814598
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03582344949245453
Validation loss = 0.03541494533419609
Validation loss = 0.030653005465865135
Validation loss = 0.030609197914600372
Validation loss = 0.029839321970939636
Validation loss = 0.029961049556732178
Validation loss = 0.03032294102013111
Validation loss = 0.029694676399230957
Validation loss = 0.028628528118133545
Validation loss = 0.029690774157643318
Validation loss = 0.029546702280640602
Validation loss = 0.030490241944789886
Validation loss = 0.02936696447432041
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03160380199551582
Validation loss = 0.03583158552646637
Validation loss = 0.02954128570854664
Validation loss = 0.030980154871940613
Validation loss = 0.03146879002451897
Validation loss = 0.030978111550211906
Validation loss = 0.04116718843579292
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15979381443298968
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15958815958815958
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15938303341902313
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15917843388960207
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15897435897435896
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1587708066581306
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1585677749360614
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1583652618135377
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15816326530612246
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15796178343949044
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15776081424936386
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15756035578144853
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15736040609137056
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15716096324461343
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1569620253164557
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15676359039190899
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15656565656565657
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15636822194199243
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1561712846347607
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1559748427672956
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15577889447236182
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15558343789209536
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15538847117794485
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15519399249061328
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.155
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0342  |
| Iteration     | 30       |
| MaximumReturn | -0.019   |
| MinimumReturn | -0.096   |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.032898858189582825
Validation loss = 0.03143614903092384
Validation loss = 0.028568435460329056
Validation loss = 0.02875358611345291
Validation loss = 0.02811889722943306
Validation loss = 0.028591511771082878
Validation loss = 0.02823444828391075
Validation loss = 0.0314420610666275
Validation loss = 0.031449224799871445
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03257779777050018
Validation loss = 0.02947000414133072
Validation loss = 0.028267236426472664
Validation loss = 0.03005150891840458
Validation loss = 0.029462097212672234
Validation loss = 0.0316159762442112
Validation loss = 0.02912694588303566
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03100140579044819
Validation loss = 0.028475217521190643
Validation loss = 0.03653741627931595
Validation loss = 0.030223358422517776
Validation loss = 0.028932081535458565
Validation loss = 0.028694191947579384
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028834810480475426
Validation loss = 0.029656486585736275
Validation loss = 0.029733462259173393
Validation loss = 0.03155205771327019
Validation loss = 0.02895570546388626
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.033926431089639664
Validation loss = 0.03272667154669762
Validation loss = 0.028521357104182243
Validation loss = 0.029093794524669647
Validation loss = 0.027936965227127075
Validation loss = 0.031122831627726555
Validation loss = 0.03045390360057354
Validation loss = 0.029163893312215805
Validation loss = 0.040289439260959625
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15480649188514356
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1546134663341646
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15442092154420922
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15422885572139303
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15403726708074533
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15384615384615385
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1536555142503098
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15346534653465346
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15327564894932014
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15308641975308643
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15289765721331688
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15270935960591134
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15252152521525214
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15233415233415235
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1521472392638037
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15196078431372548
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15177478580171358
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15158924205378974
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1514041514041514
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15121951219512195
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1510353227771011
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15085158150851583
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15066828675577157
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15048543689320387
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1503030303030303
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00183 |
| Iteration     | 31       |
| MaximumReturn | -0.00138 |
| MinimumReturn | -0.0025  |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029156669974327087
Validation loss = 0.03048817068338394
Validation loss = 0.028095651417970657
Validation loss = 0.03023463673889637
Validation loss = 0.03220571205019951
Validation loss = 0.029410187155008316
Validation loss = 0.02880554087460041
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.029582126066088676
Validation loss = 0.030647067353129387
Validation loss = 0.031583353877067566
Validation loss = 0.029950959607958794
Validation loss = 0.02972777746617794
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02896455116569996
Validation loss = 0.02900116890668869
Validation loss = 0.028852466493844986
Validation loss = 0.026877883821725845
Validation loss = 0.02836594544351101
Validation loss = 0.03416300565004349
Validation loss = 0.027881599962711334
Validation loss = 0.027603797614574432
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028612693771719933
Validation loss = 0.03177835792303085
Validation loss = 0.02822950668632984
Validation loss = 0.02925957553088665
Validation loss = 0.028798578307032585
Validation loss = 0.028907790780067444
Validation loss = 0.03348896652460098
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.034655313938856125
Validation loss = 0.029241438955068588
Validation loss = 0.02981664426624775
Validation loss = 0.03071204386651516
Validation loss = 0.029407083988189697
Validation loss = 0.0283004529774189
Validation loss = 0.029782060533761978
Validation loss = 0.030411435291171074
Validation loss = 0.03845467418432236
Validation loss = 0.030672462657094002
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.15012106537530268
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14993954050785974
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1497584541062802
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14957780458383596
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1493975903614458
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14921780986762936
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14903846153846154
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.148859543817527
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1486810551558753
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14850299401197606
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14832535885167464
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14814814814814814
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14797136038186157
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14779499404052443
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14761904761904762
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14744351961950058
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14726840855106887
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14709371293001186
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14691943127962084
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1467455621301775
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14657210401891252
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14639905548996457
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14622641509433962
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14605418138987045
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14588235294117646
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00105  |
| Iteration     | 32        |
| MaximumReturn | -0.000733 |
| MinimumReturn | -0.00176  |
| TotalSamples  | 56644     |
-----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.031733058393001556
Validation loss = 0.02800413966178894
Validation loss = 0.028840960934758186
Validation loss = 0.029953446239233017
Validation loss = 0.030159637331962585
Validation loss = 0.029865367338061333
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.028372760862112045
Validation loss = 0.028802987188100815
Validation loss = 0.0308549702167511
Validation loss = 0.03383671119809151
Validation loss = 0.02918964810669422
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03077094629406929
Validation loss = 0.028053786605596542
Validation loss = 0.02710113860666752
Validation loss = 0.029339522123336792
Validation loss = 0.03038233518600464
Validation loss = 0.026842910796403885
Validation loss = 0.03330304101109505
Validation loss = 0.03349229320883751
Validation loss = 0.027480438351631165
Validation loss = 0.02723805420100689
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03883253410458565
Validation loss = 0.029388781636953354
Validation loss = 0.028493190184235573
Validation loss = 0.028985613957047462
Validation loss = 0.028384534642100334
Validation loss = 0.02853180095553398
Validation loss = 0.028271188959479332
Validation loss = 0.028965996578335762
Validation loss = 0.030133545398712158
Validation loss = 0.03762999176979065
Validation loss = 0.02855934016406536
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02935958467423916
Validation loss = 0.03291492909193039
Validation loss = 0.030194850638508797
Validation loss = 0.029586970806121826
Validation loss = 0.028635937720537186
Validation loss = 0.02857552282512188
Validation loss = 0.028714437037706375
Validation loss = 0.029067302122712135
Validation loss = 0.028896326199173927
Validation loss = 0.02875461056828499
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14571092831962398
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14553990610328638
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14536928487690504
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1451990632318501
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14502923976608187
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14485981308411214
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14469078179696615
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1445221445221445
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14435389988358557
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14418604651162792
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1440185830429733
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14385150812064965
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1436848203939745
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14351851851851852
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14335260115606938
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14318706697459585
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14302191464821223
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14285714285714285
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.142692750287687
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1425287356321839
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1423650975889782
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14220183486238533
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1420389461626575
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14187643020594964
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1417142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00942 |
| Iteration     | 33       |
| MaximumReturn | -0.00596 |
| MinimumReturn | -0.0138  |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029374711215496063
Validation loss = 0.028132228180766106
Validation loss = 0.028937844559550285
Validation loss = 0.028172578662633896
Validation loss = 0.02807912789285183
Validation loss = 0.026937291026115417
Validation loss = 0.029368644580245018
Validation loss = 0.04142705351114273
Validation loss = 0.02964271418750286
Validation loss = 0.02810932882130146
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.028129788115620613
Validation loss = 0.026940086856484413
Validation loss = 0.02909776009619236
Validation loss = 0.02795501984655857
Validation loss = 0.027444258332252502
Validation loss = 0.028296208009123802
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02584492415189743
Validation loss = 0.026165949180722237
Validation loss = 0.02761986292898655
Validation loss = 0.03181543946266174
Validation loss = 0.028383975848555565
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027637209743261337
Validation loss = 0.027713004499673843
Validation loss = 0.02824985794723034
Validation loss = 0.02799585834145546
Validation loss = 0.028023920953273773
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02962951548397541
Validation loss = 0.029075268656015396
Validation loss = 0.027977705001831055
Validation loss = 0.03899028152227402
Validation loss = 0.033537592738866806
Validation loss = 0.029781758785247803
Validation loss = 0.02808907814323902
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1415525114155251
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14139110604332952
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14123006833712984
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1410693970420933
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1409090909090909
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14074914869466515
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14058956916099774
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1404303510758777
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.14027149321266968
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1401129943502825
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1399548532731377
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13979706877113868
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13963963963963963
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13948256467941508
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1393258426966292
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13916947250280584
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13901345291479822
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13885778275475924
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13870246085011187
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13854748603351955
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13839285714285715
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13823857302118173
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13808463251670378
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13793103448275862
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13777777777777778
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00117  |
| Iteration     | 34        |
| MaximumReturn | -0.000702 |
| MinimumReturn | -0.002    |
| TotalSamples  | 59976     |
-----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03264901041984558
Validation loss = 0.0282851941883564
Validation loss = 0.02719208039343357
Validation loss = 0.025850705802440643
Validation loss = 0.028004834428429604
Validation loss = 0.0306499432772398
Validation loss = 0.027467073872685432
Validation loss = 0.027943473309278488
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.029572751373052597
Validation loss = 0.02804306335747242
Validation loss = 0.027283618226647377
Validation loss = 0.027608897536993027
Validation loss = 0.027280062437057495
Validation loss = 0.02699265256524086
Validation loss = 0.027299366891384125
Validation loss = 0.028137005865573883
Validation loss = 0.026927847415208817
Validation loss = 0.02810928411781788
Validation loss = 0.027321433648467064
Validation loss = 0.030395589768886566
Validation loss = 0.03208915516734123
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02813899889588356
Validation loss = 0.02679183892905712
Validation loss = 0.028625642880797386
Validation loss = 0.025916414335370064
Validation loss = 0.028578322380781174
Validation loss = 0.02750377543270588
Validation loss = 0.029700107872486115
Validation loss = 0.026877278462052345
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02795436419546604
Validation loss = 0.029991943389177322
Validation loss = 0.027248701080679893
Validation loss = 0.03298565000295639
Validation loss = 0.02901155687868595
Validation loss = 0.02781798504292965
Validation loss = 0.02917836420238018
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.027665285393595695
Validation loss = 0.027257535606622696
Validation loss = 0.028103752061724663
Validation loss = 0.03086863085627556
Validation loss = 0.028731169179081917
Validation loss = 0.03448629006743431
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13762486126526083
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13747228381374724
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13732004429678848
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13716814159292035
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13701657458563535
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1368653421633554
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13671444321940462
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13656387665198239
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13641364136413642
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13626373626373625
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13611416026344675
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13596491228070176
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13581599123767799
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13566739606126915
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1355191256830601
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13537117903930132
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13522355507088332
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13507625272331156
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13492927094668117
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13478260869565217
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13463626492942454
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13449023861171366
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13434452871072589
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1341991341991342
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13405405405405404
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00549 |
| Iteration     | 35       |
| MaximumReturn | -0.00338 |
| MinimumReturn | -0.00719 |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.028855716809630394
Validation loss = 0.027154147624969482
Validation loss = 0.027523184195160866
Validation loss = 0.035025861114263535
Validation loss = 0.028746740892529488
Validation loss = 0.0278695710003376
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02971164509654045
Validation loss = 0.03030017577111721
Validation loss = 0.02778131701052189
Validation loss = 0.028045499697327614
Validation loss = 0.02788540907204151
Validation loss = 0.026769690215587616
Validation loss = 0.027160944417119026
Validation loss = 0.028905624523758888
Validation loss = 0.02898390218615532
Validation loss = 0.028045324608683586
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.027047056704759598
Validation loss = 0.027331404387950897
Validation loss = 0.028082169592380524
Validation loss = 0.030474871397018433
Validation loss = 0.028439249843358994
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028425294905900955
Validation loss = 0.027208678424358368
Validation loss = 0.02951657585799694
Validation loss = 0.02755550853908062
Validation loss = 0.028384506702423096
Validation loss = 0.03018428012728691
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.033129457384347916
Validation loss = 0.030164668336510658
Validation loss = 0.027299629524350166
Validation loss = 0.02788875810801983
Validation loss = 0.02839176543056965
Validation loss = 0.028317755088210106
Validation loss = 0.02920748107135296
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13390928725701945
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.133764832793959
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1336206896551724
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13347685683530677
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13333333333333333
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13319011815252416
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13304721030042918
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13290460878885316
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13276231263383298
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13262032085561498
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13247863247863248
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13233724653148346
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13219616204690832
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13205537806176784
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13191489361702127
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13177470775770456
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1316348195329087
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13149522799575822
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13135593220338984
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1312169312169312
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13107822410147993
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13093980992608237
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1308016877637131
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13066385669125394
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13052631578947368
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0163  |
| Iteration     | 36       |
| MaximumReturn | -0.00812 |
| MinimumReturn | -0.0247  |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.027202587574720383
Validation loss = 0.02687455713748932
Validation loss = 0.027004247531294823
Validation loss = 0.026677364483475685
Validation loss = 0.028901146724820137
Validation loss = 0.027981851249933243
Validation loss = 0.02683209255337715
Validation loss = 0.028464892879128456
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02743947133421898
Validation loss = 0.02660514786839485
Validation loss = 0.03383072838187218
Validation loss = 0.027771400287747383
Validation loss = 0.024970779195427895
Validation loss = 0.027300439774990082
Validation loss = 0.028087178245186806
Validation loss = 0.026252729818224907
Validation loss = 0.035731855779886246
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03172009438276291
Validation loss = 0.027628174051642418
Validation loss = 0.025657599791884422
Validation loss = 0.027639437466859818
Validation loss = 0.029529912397265434
Validation loss = 0.025371115654706955
Validation loss = 0.026441173627972603
Validation loss = 0.02680078335106373
Validation loss = 0.026953857392072678
Validation loss = 0.027827316895127296
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02721736766397953
Validation loss = 0.02852742187678814
Validation loss = 0.027547407895326614
Validation loss = 0.028680987656116486
Validation loss = 0.02782847173511982
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.027875319123268127
Validation loss = 0.027391714975237846
Validation loss = 0.026911020278930664
Validation loss = 0.028044912964105606
Validation loss = 0.029504695907235146
Validation loss = 0.027949750423431396
Validation loss = 0.03046342544257641
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13038906414300735
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13025210084033614
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13011542497376705
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.129979035639413
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12984293193717278
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1297071129707113
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12957157784743992
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12943632567849686
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12930135557872785
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12916666666666668
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12903225806451613
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1288981288981289
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12876427829698858
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12863070539419086
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12849740932642487
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12836438923395446
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1282316442605998
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.128099173553719
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12796697626418987
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12783505154639174
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12770339855818744
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12757201646090535
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12744090441932168
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1273100616016427
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12717948717948718
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00509 |
| Iteration     | 37       |
| MaximumReturn | -0.00395 |
| MinimumReturn | -0.00665 |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.027506889775395393
Validation loss = 0.026597648859024048
Validation loss = 0.029666757211089134
Validation loss = 0.028055667877197266
Validation loss = 0.027226630598306656
Validation loss = 0.026443857699632645
Validation loss = 0.029040053486824036
Validation loss = 0.0316237136721611
Validation loss = 0.029349859803915024
Validation loss = 0.02708682417869568
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02707243338227272
Validation loss = 0.026905816048383713
Validation loss = 0.02464265376329422
Validation loss = 0.025344662368297577
Validation loss = 0.02578260749578476
Validation loss = 0.027296751737594604
Validation loss = 0.02586953714489937
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.028625598177313805
Validation loss = 0.027115199714899063
Validation loss = 0.025456702336668968
Validation loss = 0.02594141662120819
Validation loss = 0.027706243097782135
Validation loss = 0.027633588761091232
Validation loss = 0.027590904384851456
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02763824164867401
Validation loss = 0.03414201736450195
Validation loss = 0.02747969701886177
Validation loss = 0.02937241457402706
Validation loss = 0.028029216453433037
Validation loss = 0.026897264644503593
Validation loss = 0.027551282197237015
Validation loss = 0.026022793725132942
Validation loss = 0.028182968497276306
Validation loss = 0.026227500289678574
Validation loss = 0.029127534478902817
Validation loss = 0.02797836996614933
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02822018787264824
Validation loss = 0.026406634598970413
Validation loss = 0.02654751017689705
Validation loss = 0.02766416408121586
Validation loss = 0.027077507227659225
Validation loss = 0.026853863149881363
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12704918032786885
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1269191402251791
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12678936605316973
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12665985699693566
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12653061224489795
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12640163098878696
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12627291242362526
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12614445574771108
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12601626016260162
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12588832487309645
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1257606490872211
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12563323201621074
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12550607287449392
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12537917087967643
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12525252525252525
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12512613521695257
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12487411883182276
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12474849094567404
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12462311557788945
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12449799196787148
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12437311935807423
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12424849699398798
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12412412412412413
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.124
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0034  |
| Iteration     | 38       |
| MaximumReturn | -0.00244 |
| MinimumReturn | -0.00492 |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02606610395014286
Validation loss = 0.02583131194114685
Validation loss = 0.025622088462114334
Validation loss = 0.026327868923544884
Validation loss = 0.026067305356264114
Validation loss = 0.02599358931183815
Validation loss = 0.030247770249843597
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.028823290020227432
Validation loss = 0.030247481539845467
Validation loss = 0.025602828711271286
Validation loss = 0.025621891021728516
Validation loss = 0.03254229947924614
Validation loss = 0.02684704400599003
Validation loss = 0.029155241325497627
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.028799494728446007
Validation loss = 0.028264988213777542
Validation loss = 0.02622257173061371
Validation loss = 0.026039019227027893
Validation loss = 0.025295842438936234
Validation loss = 0.026336640119552612
Validation loss = 0.02855779230594635
Validation loss = 0.029383551329374313
Validation loss = 0.026759950444102287
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026464303955435753
Validation loss = 0.036066122353076935
Validation loss = 0.028013799339532852
Validation loss = 0.02615722268819809
Validation loss = 0.026321612298488617
Validation loss = 0.026857826858758926
Validation loss = 0.0253833569586277
Validation loss = 0.027988268062472343
Validation loss = 0.025831421837210655
Validation loss = 0.02708454616367817
Validation loss = 0.026948455721139908
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02974451705813408
Validation loss = 0.027494890615344048
Validation loss = 0.028392093256115913
Validation loss = 0.025503255426883698
Validation loss = 0.03254345431923866
Validation loss = 0.029902081936597824
Validation loss = 0.029743796214461327
Validation loss = 0.0256902314722538
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12387612387612387
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12375249500998003
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12362911266201396
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12350597609561753
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12338308457711443
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12326043737574553
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12313803376365443
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12301587301587301
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12289395441030723
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12277227722772277
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12265084075173097
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1225296442687747
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12240868706811452
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1222879684418146
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12216748768472907
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1220472440944882
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12192723697148476
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12180746561886051
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12168792934249265
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12156862745098039
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12144955925563174
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12133072407045009
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12121212121212122
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12109375
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12097560975609756
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00117  |
| Iteration     | 39        |
| MaximumReturn | -0.000665 |
| MinimumReturn | -0.00177  |
| TotalSamples  | 68306     |
-----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02956189215183258
Validation loss = 0.025875400751829147
Validation loss = 0.026090525090694427
Validation loss = 0.026784593239426613
Validation loss = 0.026116156950592995
Validation loss = 0.0267449002712965
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02629067748785019
Validation loss = 0.02653300017118454
Validation loss = 0.027970781549811363
Validation loss = 0.027201922610402107
Validation loss = 0.026012668386101723
Validation loss = 0.024952542036771774
Validation loss = 0.025383181869983673
Validation loss = 0.02845441922545433
Validation loss = 0.02691713161766529
Validation loss = 0.02661001868546009
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.027055976912379265
Validation loss = 0.026858415454626083
Validation loss = 0.026103461161255836
Validation loss = 0.026526354253292084
Validation loss = 0.026315517723560333
Validation loss = 0.026713142171502113
Validation loss = 0.03086433745920658
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026428837329149246
Validation loss = 0.026868442073464394
Validation loss = 0.026758844032883644
Validation loss = 0.026793844997882843
Validation loss = 0.02779337950050831
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026682104915380478
Validation loss = 0.025715403258800507
Validation loss = 0.026594607159495354
Validation loss = 0.027524620294570923
Validation loss = 0.03414918854832649
Validation loss = 0.026386922225356102
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12183235867446393
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12268743914313535
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.122568093385214
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.12342079689018465
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.1262135922330097
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12609117361784675
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12596899224806202
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12584704743465633
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12572533849129594
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12560386473429952
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12548262548262548
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1253616200578592
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1252408477842004
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12512030798845045
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12487992315081652
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12476007677543186
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12464046021093
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12452107279693486
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12440191387559808
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.124282982791587
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.12607449856733524
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12595419847328243
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.1306005719733079
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13047619047619047
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.11    |
| Iteration     | 40       |
| MaximumReturn | -0.0235  |
| MinimumReturn | -19.5    |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03610823675990105
Validation loss = 0.028284020721912384
Validation loss = 0.027234172448515892
Validation loss = 0.02692219614982605
Validation loss = 0.026658151298761368
Validation loss = 0.026551418006420135
Validation loss = 0.027172166854143143
Validation loss = 0.027943963184952736
Validation loss = 0.031870417296886444
Validation loss = 0.02782854624092579
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026377970352768898
Validation loss = 0.025905976071953773
Validation loss = 0.02614487148821354
Validation loss = 0.02700844034552574
Validation loss = 0.025948626920580864
Validation loss = 0.02641422674059868
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.028536390513181686
Validation loss = 0.027331307530403137
Validation loss = 0.025739986449480057
Validation loss = 0.026132384315133095
Validation loss = 0.028120556846261024
Validation loss = 0.02606472373008728
Validation loss = 0.025588857010006905
Validation loss = 0.02745385654270649
Validation loss = 0.025172842666506767
Validation loss = 0.026010463014245033
Validation loss = 0.025948313996195793
Validation loss = 0.026679521426558495
Validation loss = 0.02756343223154545
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027409574016928673
Validation loss = 0.026730265468358994
Validation loss = 0.03325073793530464
Validation loss = 0.026542747393250465
Validation loss = 0.027232185006141663
Validation loss = 0.026611708104610443
Validation loss = 0.02716425061225891
Validation loss = 0.026162192225456238
Validation loss = 0.026505721732974052
Validation loss = 0.02729848586022854
Validation loss = 0.026272648945450783
Validation loss = 0.026199106127023697
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02996949478983879
Validation loss = 0.02772802859544754
Validation loss = 0.02663400210440159
Validation loss = 0.025839755311608315
Validation loss = 0.027356673032045364
Validation loss = 0.030168894678354263
Validation loss = 0.027671560645103455
Validation loss = 0.025956930592656136
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13035204567078973
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13022813688212928
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.13010446343779677
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12998102466793168
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12985781990521328
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12973484848484848
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12961210974456008
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12948960302457466
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12936732766761094
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12924528301886792
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1291234684260132
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12900188323917136
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1288805268109125
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1287593984962406
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12863849765258217
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12851782363977485
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12839737582005623
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12827715355805244
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12815715622076707
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1280373831775701
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12791783380018673
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12779850746268656
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1276794035414725
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12756052141527002
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12744186046511627
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00576 |
| Iteration     | 41       |
| MaximumReturn | -0.00322 |
| MinimumReturn | -0.00807 |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.027048924937844276
Validation loss = 0.025669235736131668
Validation loss = 0.029611798003315926
Validation loss = 0.0269921962171793
Validation loss = 0.026878366246819496
Validation loss = 0.027748309075832367
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02663607709109783
Validation loss = 0.026749400421977043
Validation loss = 0.031136304140090942
Validation loss = 0.026538994163274765
Validation loss = 0.02664792165160179
Validation loss = 0.025621401146054268
Validation loss = 0.026314707472920418
Validation loss = 0.02744118869304657
Validation loss = 0.031176185235381126
Validation loss = 0.028570011258125305
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03163500130176544
Validation loss = 0.025649897754192352
Validation loss = 0.029252545908093452
Validation loss = 0.02663968876004219
Validation loss = 0.026311637833714485
Validation loss = 0.027684442698955536
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02575266920030117
Validation loss = 0.026540806517004967
Validation loss = 0.026155104860663414
Validation loss = 0.027721593156456947
Validation loss = 0.034498464316129684
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026320388540625572
Validation loss = 0.028414182364940643
Validation loss = 0.02687249518930912
Validation loss = 0.029217971488833427
Validation loss = 0.02650543861091137
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12732342007434944
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12720519962859797
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12708719851576994
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12696941612604262
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12685185185185185
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1267345050878816
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1266173752310536
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1265004616805171
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12638376383763839
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12626728110599078
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1261510128913444
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12603495860165592
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12591911764705882
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1258034894398531
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12568807339449542
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12557286892758937
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12545787545787546
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1253430924062214
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12522851919561243
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12511415525114156
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.125
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12488605287146765
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12477231329690346
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12465878070973613
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12454545454545454
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00133  |
| Iteration     | 42        |
| MaximumReturn | -0.000817 |
| MinimumReturn | -0.00194  |
| TotalSamples  | 73304     |
-----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02708037942647934
Validation loss = 0.028274592012166977
Validation loss = 0.026920260861516
Validation loss = 0.02762211672961712
Validation loss = 0.026781346648931503
Validation loss = 0.026877691969275475
Validation loss = 0.0365331806242466
Validation loss = 0.027320045977830887
Validation loss = 0.02778698317706585
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.029062939807772636
Validation loss = 0.02902253530919552
Validation loss = 0.026478581130504608
Validation loss = 0.025468941777944565
Validation loss = 0.026975305750966072
Validation loss = 0.025026656687259674
Validation loss = 0.02683980017900467
Validation loss = 0.02629552036523819
Validation loss = 0.026764631271362305
Validation loss = 0.025738604366779327
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.026253195479512215
Validation loss = 0.026730723679065704
Validation loss = 0.02752450294792652
Validation loss = 0.026688365265727043
Validation loss = 0.02741716243326664
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.032323483377695084
Validation loss = 0.029648294672369957
Validation loss = 0.026815664023160934
Validation loss = 0.02720080129802227
Validation loss = 0.025524096563458443
Validation loss = 0.026090774685144424
Validation loss = 0.029330607503652573
Validation loss = 0.026556607335805893
Validation loss = 0.02554774284362793
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02613835409283638
Validation loss = 0.02675870805978775
Validation loss = 0.025898657739162445
Validation loss = 0.027670232579112053
Validation loss = 0.026036901399493217
Validation loss = 0.02696191519498825
Validation loss = 0.028540654107928276
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12443233424159855
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12431941923774954
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1242067089755213
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12409420289855072
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12398190045248869
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12386980108499096
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12375790424570912
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12364620938628158
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12353471596032461
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12342342342342343
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12331233123312331
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12320143884892086
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12309074573225516
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1229802513464991
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12286995515695068
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12275985663082438
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12264995523724262
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12254025044722719
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1224307417336908
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12232142857142857
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12221231043710973
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12210338680926916
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1219946571682992
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12188612099644128
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12177777777777778
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00327 |
| Iteration     | 43       |
| MaximumReturn | -0.00222 |
| MinimumReturn | -0.00408 |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.027434878051280975
Validation loss = 0.02669653296470642
Validation loss = 0.027365224435925484
Validation loss = 0.02712550386786461
Validation loss = 0.02650231122970581
Validation loss = 0.02631809003651142
Validation loss = 0.026487721130251884
Validation loss = 0.02892444282770157
Validation loss = 0.02609938383102417
Validation loss = 0.026436548680067062
Validation loss = 0.026505300775170326
Validation loss = 0.02616560086607933
Validation loss = 0.028498874977231026
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02656577154994011
Validation loss = 0.02596423588693142
Validation loss = 0.025883350521326065
Validation loss = 0.026147015392780304
Validation loss = 0.025019090622663498
Validation loss = 0.025957342237234116
Validation loss = 0.025164823979139328
Validation loss = 0.02642623707652092
Validation loss = 0.03068849816918373
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.027340753003954887
Validation loss = 0.025890382006764412
Validation loss = 0.026103751733899117
Validation loss = 0.030423853546380997
Validation loss = 0.025720329955220222
Validation loss = 0.0326177179813385
Validation loss = 0.026684032753109932
Validation loss = 0.02490813098847866
Validation loss = 0.026927191764116287
Validation loss = 0.02740401215851307
Validation loss = 0.025851314887404442
Validation loss = 0.028772013261914253
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02649739943444729
Validation loss = 0.02618747390806675
Validation loss = 0.026050880551338196
Validation loss = 0.027022521942853928
Validation loss = 0.026126530021429062
Validation loss = 0.02728542685508728
Validation loss = 0.026954909786581993
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026519309729337692
Validation loss = 0.02855592779815197
Validation loss = 0.031302548944950104
Validation loss = 0.02635740302503109
Validation loss = 0.026548506692051888
Validation loss = 0.02693323604762554
Validation loss = 0.028262870386242867
Validation loss = 0.027710946276783943
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1216696269982238
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12156166814551908
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12145390070921985
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12134632418069087
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12123893805309735
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.121131741821397
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12102473498233215
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1209179170344219
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12081128747795414
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12070484581497798
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12059859154929578
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12049252418645559
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12038664323374342
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1202809482001756
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.12017543859649123
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1200701139351446
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11996497373029773
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11986001749781278
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11975524475524475
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11965065502183406
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11954624781849912
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11944202266782912
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11933797909407666
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11923411662315056
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1191304347826087
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00112  |
| Iteration     | 44        |
| MaximumReturn | -0.000773 |
| MinimumReturn | -0.002    |
| TotalSamples  | 76636     |
-----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026897145435214043
Validation loss = 0.026457127183675766
Validation loss = 0.025267908349633217
Validation loss = 0.026156282052397728
Validation loss = 0.027069078758358955
Validation loss = 0.028664372861385345
Validation loss = 0.02733207680284977
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.030812304466962814
Validation loss = 0.03374810144305229
Validation loss = 0.027422014623880386
Validation loss = 0.024967776611447334
Validation loss = 0.026626499369740486
Validation loss = 0.026158228516578674
Validation loss = 0.025394899770617485
Validation loss = 0.024991219863295555
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02615470439195633
Validation loss = 0.025588760152459145
Validation loss = 0.030223814770579338
Validation loss = 0.02644338645040989
Validation loss = 0.02648543380200863
Validation loss = 0.026058346033096313
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028281770646572113
Validation loss = 0.02743908204138279
Validation loss = 0.02675641141831875
Validation loss = 0.025441940873861313
Validation loss = 0.02883514203131199
Validation loss = 0.02649020217359066
Validation loss = 0.027160901576280594
Validation loss = 0.025928683578968048
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026826733723282814
Validation loss = 0.02615312859416008
Validation loss = 0.02576625719666481
Validation loss = 0.02573576010763645
Validation loss = 0.02727869339287281
Validation loss = 0.02668502740561962
Validation loss = 0.026406893506646156
Validation loss = 0.027354594320058823
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11902693310165074
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1189236111111111
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11882046834345186
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11871750433275563
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11861471861471862
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1185121107266436
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11840968020743302
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11830742659758203
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1182053494391717
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11810344827586207
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11800172265288544
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11790017211703958
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.117798796216681
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11769759450171821
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11759656652360514
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11749571183533447
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11739502999143102
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1172945205479452
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11719418306244654
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1170940170940171
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11699402220324509
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11689419795221843
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11679454390451834
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11669505962521294
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11659574468085106
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00407 |
| Iteration     | 45       |
| MaximumReturn | -0.00288 |
| MinimumReturn | -0.00554 |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.028132256120443344
Validation loss = 0.025110727176070213
Validation loss = 0.02669523097574711
Validation loss = 0.028365517035126686
Validation loss = 0.028911808505654335
Validation loss = 0.02792901173233986
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026955118402838707
Validation loss = 0.02984435483813286
Validation loss = 0.0243667159229517
Validation loss = 0.024256285279989243
Validation loss = 0.02477182447910309
Validation loss = 0.026245737448334694
Validation loss = 0.025085631757974625
Validation loss = 0.025595903396606445
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02550564706325531
Validation loss = 0.02536487579345703
Validation loss = 0.025720924139022827
Validation loss = 0.025768114253878593
Validation loss = 0.02640514262020588
Validation loss = 0.02520792745053768
Validation loss = 0.027147850021719933
Validation loss = 0.02610192820429802
Validation loss = 0.025699978694319725
Validation loss = 0.027190083637833595
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024965180084109306
Validation loss = 0.025674544274806976
Validation loss = 0.024956688284873962
Validation loss = 0.025357583537697792
Validation loss = 0.03376780077815056
Validation loss = 0.026904186233878136
Validation loss = 0.026562048122286797
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025263754650950432
Validation loss = 0.025107841938734055
Validation loss = 0.02617775835096836
Validation loss = 0.025760825723409653
Validation loss = 0.02538352645933628
Validation loss = 0.027690542861819267
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11649659863945579
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11639762107051826
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11629881154499151
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11620016963528414
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11610169491525424
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11600338696020322
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11590524534686972
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1158072696534235
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11570945945945946
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11561181434599156
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11551433389544688
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11541701769165964
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11531986531986532
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1152228763666947
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11512605042016806
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11502938706968933
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11493288590604027
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11483654652137469
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11474036850921274
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11464435146443515
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11454849498327759
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11445279866332497
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11435726210350584
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11426188490408674
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11416666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00117  |
| Iteration     | 46        |
| MaximumReturn | -0.000729 |
| MinimumReturn | -0.00186  |
| TotalSamples  | 79968     |
-----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02659139595925808
Validation loss = 0.02613602951169014
Validation loss = 0.027176151052117348
Validation loss = 0.02636551856994629
Validation loss = 0.0251859612762928
Validation loss = 0.029908183962106705
Validation loss = 0.027715612202882767
Validation loss = 0.02689039148390293
Validation loss = 0.02743658982217312
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025121018290519714
Validation loss = 0.033536653965711594
Validation loss = 0.02697870507836342
Validation loss = 0.0253796074539423
Validation loss = 0.025442903861403465
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025922060012817383
Validation loss = 0.025486517697572708
Validation loss = 0.024928677827119827
Validation loss = 0.028314882889389992
Validation loss = 0.024375353008508682
Validation loss = 0.02513815090060234
Validation loss = 0.026314979419112206
Validation loss = 0.026738207787275314
Validation loss = 0.02580922283232212
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025607366114854813
Validation loss = 0.024833915755152702
Validation loss = 0.02692735567688942
Validation loss = 0.025676894932985306
Validation loss = 0.027064602822065353
Validation loss = 0.026517048478126526
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026095876470208168
Validation loss = 0.02647055685520172
Validation loss = 0.025772839784622192
Validation loss = 0.02549855038523674
Validation loss = 0.02531425654888153
Validation loss = 0.02581494115293026
Validation loss = 0.029597660526633263
Validation loss = 0.026925314217805862
Validation loss = 0.025624457746744156
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11407160699417153
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11397670549084858
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11388196176226101
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11378737541528239
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11369294605809128
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11359867330016583
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11350455675227837
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11341059602649006
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11331679073614558
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11322314049586776
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11312964492155243
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11303630363036303
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11294311624072548
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1128500823723229
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11275720164609053
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11266447368421052
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11257189811010682
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11247947454844007
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11238720262510254
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11229508196721312
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1122031122031122
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11211129296235679
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11201962387571546
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1119281045751634
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11183673469387755
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00114  |
| Iteration     | 47        |
| MaximumReturn | -0.000805 |
| MinimumReturn | -0.00203  |
| TotalSamples  | 81634     |
-----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.028846686705946922
Validation loss = 0.026497656479477882
Validation loss = 0.026044830679893494
Validation loss = 0.03215381130576134
Validation loss = 0.02575380727648735
Validation loss = 0.026675477623939514
Validation loss = 0.02713250182569027
Validation loss = 0.025867560878396034
Validation loss = 0.02590709552168846
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02635730803012848
Validation loss = 0.025851760059595108
Validation loss = 0.025934970006346703
Validation loss = 0.0258778128772974
Validation loss = 0.026937734335660934
Validation loss = 0.025011103600263596
Validation loss = 0.02579014003276825
Validation loss = 0.025630449876189232
Validation loss = 0.02526354230940342
Validation loss = 0.03154230862855911
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025416379794478416
Validation loss = 0.027327746152877808
Validation loss = 0.026254927739501
Validation loss = 0.02714446187019348
Validation loss = 0.028387650847434998
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0261913500726223
Validation loss = 0.026258444413542747
Validation loss = 0.026417672634124756
Validation loss = 0.028799105435609818
Validation loss = 0.02508961595594883
Validation loss = 0.026108231395483017
Validation loss = 0.025510484352707863
Validation loss = 0.02863939478993416
Validation loss = 0.02995801530778408
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026903793215751648
Validation loss = 0.027659619227051735
Validation loss = 0.02657337859272957
Validation loss = 0.026342999190092087
Validation loss = 0.027258548885583878
Validation loss = 0.028455320745706558
Validation loss = 0.026196956634521484
Validation loss = 0.02620450221002102
Validation loss = 0.035530347377061844
Validation loss = 0.0284593403339386
Validation loss = 0.025921832770109177
Validation loss = 0.027271728962659836
Validation loss = 0.026954416185617447
Validation loss = 0.025878023356199265
Validation loss = 0.02590728923678398
Validation loss = 0.02652447298169136
Validation loss = 0.026571940630674362
Validation loss = 0.025836581364274025
Validation loss = 0.02823466621339321
Validation loss = 0.02608320489525795
Validation loss = 0.027425449341535568
Validation loss = 0.027586359530687332
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11174551386623165
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11165444172779136
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11156351791530944
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1114727420667209
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11138211382113822
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11129163281884646
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1112012987012987
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1111111111111111
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11102106969205834
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11093117408906883
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11084142394822007
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11075181891673404
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11066235864297254
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11057304277643261
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11048387096774194
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11039484286865431
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11030595813204509
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11021721641190668
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11012861736334405
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.11004016064257029
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10995184590690209
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10986367281475541
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10977564102564102
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10968775020016013
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1096
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00308 |
| Iteration     | 48       |
| MaximumReturn | -0.00194 |
| MinimumReturn | -0.00539 |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026091929525136948
Validation loss = 0.02512967959046364
Validation loss = 0.027095021679997444
Validation loss = 0.024744484573602676
Validation loss = 0.02663784846663475
Validation loss = 0.025525905191898346
Validation loss = 0.0262453556060791
Validation loss = 0.025583863258361816
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02783937007188797
Validation loss = 0.02565336599946022
Validation loss = 0.025204379111528397
Validation loss = 0.02692907303571701
Validation loss = 0.025428542867302895
Validation loss = 0.03107307292521
Validation loss = 0.026760568842291832
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02570386417210102
Validation loss = 0.024159660562872887
Validation loss = 0.02503923885524273
Validation loss = 0.02505364641547203
Validation loss = 0.02543354593217373
Validation loss = 0.026417305693030357
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026131965219974518
Validation loss = 0.0266098715364933
Validation loss = 0.025917913764715195
Validation loss = 0.028321251273155212
Validation loss = 0.025019267573952675
Validation loss = 0.0260265301913023
Validation loss = 0.026515096426010132
Validation loss = 0.025012949481606483
Validation loss = 0.025326626375317574
Validation loss = 0.02604595012962818
Validation loss = 0.025491677224636078
Validation loss = 0.02586521953344345
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.027786534279584885
Validation loss = 0.025963719934225082
Validation loss = 0.030285192653536797
Validation loss = 0.026165060698986053
Validation loss = 0.02602066658437252
Validation loss = 0.02588590420782566
Validation loss = 0.027170736342668533
Validation loss = 0.026140011847019196
Validation loss = 0.027279041707515717
Validation loss = 0.028671694919466972
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10951239008792965
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10942492012779553
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10933758978451716
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10925039872408293
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10916334661354582
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10907643312101911
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10898965791567224
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10890302066772654
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10881652104845115
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10873015873015873
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10864393338620143
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10855784469096671
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10847189231987332
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10838607594936708
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.108300395256917
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10821484992101106
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10812943962115233
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10804416403785488
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10795902285263988
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1078740157480315
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1077891424075531
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10770440251572327
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10761979575805185
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1075353218210361
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10745098039215686
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00148 |
| Iteration     | 49       |
| MaximumReturn | -0.00105 |
| MinimumReturn | -0.00219 |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025138726457953453
Validation loss = 0.025610756129026413
Validation loss = 0.03029666095972061
Validation loss = 0.02996319904923439
Validation loss = 0.02514691650867462
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025987911969423294
Validation loss = 0.025216057896614075
Validation loss = 0.025158336386084557
Validation loss = 0.025351444259285927
Validation loss = 0.025082966312766075
Validation loss = 0.02582635171711445
Validation loss = 0.026029720902442932
Validation loss = 0.025252094492316246
Validation loss = 0.02483174577355385
Validation loss = 0.026068057864904404
Validation loss = 0.025467589497566223
Validation loss = 0.02723253332078457
Validation loss = 0.02634083852171898
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025811240077018738
Validation loss = 0.025272265076637268
Validation loss = 0.026986034587025642
Validation loss = 0.026212086901068687
Validation loss = 0.025922201573848724
Validation loss = 0.028368739411234856
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025662383064627647
Validation loss = 0.02493778057396412
Validation loss = 0.026768630370497704
Validation loss = 0.02701796218752861
Validation loss = 0.02531437948346138
Validation loss = 0.025888366624712944
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.029786961153149605
Validation loss = 0.025111768394708633
Validation loss = 0.025915488600730896
Validation loss = 0.025355413556098938
Validation loss = 0.02649732679128647
Validation loss = 0.025777701288461685
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10736677115987461
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10728269381362568
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10719874804381847
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10711493354182955
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10703125
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10694769711163153
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10686427457098284
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10678098207326578
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10669781931464174
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1066147859922179
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10653188180404355
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10644910644910645
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1063664596273292
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10628394103956555
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1062015503875969
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10611928737412858
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10603715170278638
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10595514307811292
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10587326120556415
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10579150579150579
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10570987654320987
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1056283731688512
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10554699537750385
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10546574287913779
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10538461538461538
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00222 |
| Iteration     | 50       |
| MaximumReturn | -0.00152 |
| MinimumReturn | -0.00322 |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02571483887732029
Validation loss = 0.026436999440193176
Validation loss = 0.02555195428431034
Validation loss = 0.025927476584911346
Validation loss = 0.026141483336687088
Validation loss = 0.025601720437407494
Validation loss = 0.026229381561279297
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025258423760533333
Validation loss = 0.02547866478562355
Validation loss = 0.02707098051905632
Validation loss = 0.0273600984364748
Validation loss = 0.024916421622037888
Validation loss = 0.027673546224832535
Validation loss = 0.023742618039250374
Validation loss = 0.02550278790295124
Validation loss = 0.025247476994991302
Validation loss = 0.025512484833598137
Validation loss = 0.02480448968708515
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02557966858148575
Validation loss = 0.02475244551897049
Validation loss = 0.02519095689058304
Validation loss = 0.02443590760231018
Validation loss = 0.02442801184952259
Validation loss = 0.02721664309501648
Validation loss = 0.025143908336758614
Validation loss = 0.025624677538871765
Validation loss = 0.024936256930232048
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02511953003704548
Validation loss = 0.024247102439403534
Validation loss = 0.02526191994547844
Validation loss = 0.024911047890782356
Validation loss = 0.02550734207034111
Validation loss = 0.024717915803194046
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025617599487304688
Validation loss = 0.028284557163715363
Validation loss = 0.025434136390686035
Validation loss = 0.026495404541492462
Validation loss = 0.02634591795504093
Validation loss = 0.026196299120783806
Validation loss = 0.025333324447274208
Validation loss = 0.028582250699400902
Validation loss = 0.025061149150133133
Validation loss = 0.025873614475131035
Validation loss = 0.027040788903832436
Validation loss = 0.0255767609924078
Validation loss = 0.028138695284724236
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10530361260568794
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10522273425499232
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10514198004604758
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10506134969325154
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1049808429118774
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10490045941807044
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10482019892884469
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10474006116207951
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10466004583651642
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10458015267175573
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10450038138825324
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10442073170731707
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10434120335110435
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10426179604261795
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10418250950570342
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10410334346504559
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10402429764616553
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1039453717754173
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10386656557998483
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10378787878787879
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10370931112793338
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10363086232980333
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10355253212396069
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10347432024169184
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10339622641509434
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00107  |
| Iteration     | 51        |
| MaximumReturn | -0.000654 |
| MinimumReturn | -0.00159  |
| TotalSamples  | 88298     |
-----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025144100189208984
Validation loss = 0.029458554461598396
Validation loss = 0.027296215295791626
Validation loss = 0.027304530143737793
Validation loss = 0.026471497491002083
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025095755234360695
Validation loss = 0.025141077116131783
Validation loss = 0.024936042726039886
Validation loss = 0.023816384375095367
Validation loss = 0.025071222335100174
Validation loss = 0.026097940281033516
Validation loss = 0.025936126708984375
Validation loss = 0.025189608335494995
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02470734901726246
Validation loss = 0.0265995804220438
Validation loss = 0.02486370876431465
Validation loss = 0.026520775631070137
Validation loss = 0.02488703280687332
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0258571095764637
Validation loss = 0.02543366514146328
Validation loss = 0.02468465454876423
Validation loss = 0.0244308989495039
Validation loss = 0.02554105408489704
Validation loss = 0.02685398980975151
Validation loss = 0.028154345229268074
Validation loss = 0.02400301769375801
Validation loss = 0.025354618206620216
Validation loss = 0.024573540315032005
Validation loss = 0.028634293004870415
Validation loss = 0.025681016966700554
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026871513575315475
Validation loss = 0.024496400728821754
Validation loss = 0.02589443325996399
Validation loss = 0.025262348353862762
Validation loss = 0.025724099949002266
Validation loss = 0.026386678218841553
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1033182503770739
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10324039186134137
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10316265060240964
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10308502633559068
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10300751879699248
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10293012772351616
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10285285285285285
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10277569392348088
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10269865067466268
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10262172284644194
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10254491017964072
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1024682124158564
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10239162929745889
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10231516056758776
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10223880597014925
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10216256524981357
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10208643815201192
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10201042442293373
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10193452380952381
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10185873605947955
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10178306092124814
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10170749814402376
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1016320474777448
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10155670867309118
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10148148148148148
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00119  |
| Iteration     | 52        |
| MaximumReturn | -0.000785 |
| MinimumReturn | -0.00157  |
| TotalSamples  | 89964     |
-----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02591869607567787
Validation loss = 0.02494487538933754
Validation loss = 0.0248792115598917
Validation loss = 0.024738702923059464
Validation loss = 0.026015281677246094
Validation loss = 0.02619253844022751
Validation loss = 0.025454211980104446
Validation loss = 0.02881474420428276
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024728208780288696
Validation loss = 0.02916143462061882
Validation loss = 0.024539023637771606
Validation loss = 0.026030652225017548
Validation loss = 0.025265242904424667
Validation loss = 0.030762730166316032
Validation loss = 0.024553924798965454
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025121627375483513
Validation loss = 0.0247791800647974
Validation loss = 0.025045784190297127
Validation loss = 0.025033580139279366
Validation loss = 0.02610020712018013
Validation loss = 0.02443816140294075
Validation loss = 0.028297601267695427
Validation loss = 0.0258407574146986
Validation loss = 0.02517872117459774
Validation loss = 0.024827372282743454
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026950974017381668
Validation loss = 0.02455967850983143
Validation loss = 0.02519693598151207
Validation loss = 0.028382249176502228
Validation loss = 0.02444848045706749
Validation loss = 0.025678904727101326
Validation loss = 0.025317253544926643
Validation loss = 0.0257905013859272
Validation loss = 0.023755857720971107
Validation loss = 0.024089913815259933
Validation loss = 0.02437860704958439
Validation loss = 0.025945080444216728
Validation loss = 0.025986772030591965
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02555024065077305
Validation loss = 0.025004876777529716
Validation loss = 0.026403352618217468
Validation loss = 0.030062958598136902
Validation loss = 0.026085754856467247
Validation loss = 0.02540280856192112
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10140636565507032
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10133136094674557
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10125646711012565
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10118168389955687
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1011070110701107
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10103244837758112
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10095799557848195
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10088365243004419
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1008094186902134
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10073529411764706
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10066127847171198
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10058737151248165
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10051357300073367
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10043988269794721
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10036630036630037
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10029282576866765
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10021945866861741
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10014619883040936
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.10007304601899196
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.1
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.099927060539752
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09985422740524781
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09978150036416605
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09970887918486172
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09963636363636363
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00172 |
| Iteration     | 53       |
| MaximumReturn | -0.00121 |
| MinimumReturn | -0.00285 |
| TotalSamples  | 91630    |
----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025215912610292435
Validation loss = 0.025840220972895622
Validation loss = 0.025028202682733536
Validation loss = 0.02700001746416092
Validation loss = 0.026073087006807327
Validation loss = 0.025043969973921776
Validation loss = 0.031279973685741425
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024461660534143448
Validation loss = 0.024731872603297234
Validation loss = 0.024563005194067955
Validation loss = 0.02513338439166546
Validation loss = 0.025767270475625992
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03257483243942261
Validation loss = 0.025492293760180473
Validation loss = 0.024544086307287216
Validation loss = 0.02751760743558407
Validation loss = 0.02652011252939701
Validation loss = 0.02577909454703331
Validation loss = 0.025342615321278572
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025135142728686333
Validation loss = 0.024462612345814705
Validation loss = 0.0265942569822073
Validation loss = 0.02668839320540428
Validation loss = 0.02522575855255127
Validation loss = 0.024884650483727455
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026425356045365334
Validation loss = 0.03330395370721817
Validation loss = 0.024968594312667847
Validation loss = 0.025146063417196274
Validation loss = 0.024766646325588226
Validation loss = 0.024835171177983284
Validation loss = 0.027093948796391487
Validation loss = 0.025293191894888878
Validation loss = 0.0253701601177454
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09956395348837209
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09949164851125636
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09941944847605225
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09934735315445975
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09927536231884058
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09920347574221579
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09913169319826338
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09906001446131597
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09898843930635838
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09891696750902527
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09884559884559885
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09877433309300648
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09870317002881844
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09863210943124551
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0985611510791367
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.098490294751977
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09841954022988506
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09834888729361091
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09827833572453372
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0982078853046595
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09813753581661891
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09806728704366499
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09799713876967096
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09792709077912795
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09785714285714285
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00217 |
| Iteration     | 54       |
| MaximumReturn | -0.00159 |
| MinimumReturn | -0.00383 |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026708101853728294
Validation loss = 0.026226110756397247
Validation loss = 0.03344964236021042
Validation loss = 0.025289857760071754
Validation loss = 0.024725424125790596
Validation loss = 0.025690833106637
Validation loss = 0.025343332439661026
Validation loss = 0.03310154750943184
Validation loss = 0.024265224114060402
Validation loss = 0.025099754333496094
Validation loss = 0.02511576935648918
Validation loss = 0.024629810824990273
Validation loss = 0.025008654221892357
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025566399097442627
Validation loss = 0.027107929810881615
Validation loss = 0.026398135349154472
Validation loss = 0.0248750988394022
Validation loss = 0.024205539375543594
Validation loss = 0.027520930394530296
Validation loss = 0.02484842576086521
Validation loss = 0.024690885096788406
Validation loss = 0.025828314945101738
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024651452898979187
Validation loss = 0.024769460782408714
Validation loss = 0.025649601593613625
Validation loss = 0.026140913367271423
Validation loss = 0.02487611025571823
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025239247828722
Validation loss = 0.02452016994357109
Validation loss = 0.031569406390190125
Validation loss = 0.024500496685504913
Validation loss = 0.02551191858947277
Validation loss = 0.025088198482990265
Validation loss = 0.027216188609600067
Validation loss = 0.02504439279437065
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0251893512904644
Validation loss = 0.02442248724400997
Validation loss = 0.02865525148808956
Validation loss = 0.02664303220808506
Validation loss = 0.024546144530177116
Validation loss = 0.025274528190493584
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09778729478943611
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09771754636233952
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09764789736279401
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09757834757834757
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09750889679715302
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09743954480796586
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09737029140014215
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09730113636363637
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09723207948899928
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09716312056737589
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09709425939050319
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09702549575070822
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09695682944090588
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09688826025459689
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09681978798586573
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09675141242937853
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09668313338038109
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09661495063469676
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09654686398872446
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09647887323943662
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0964109781843772
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09634317862165963
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09627547434996486
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09620786516853932
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09614035087719298
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00103  |
| Iteration     | 55        |
| MaximumReturn | -0.000699 |
| MinimumReturn | -0.00135  |
| TotalSamples  | 94962     |
-----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025925783440470695
Validation loss = 0.026220019906759262
Validation loss = 0.025083262473344803
Validation loss = 0.028646335005760193
Validation loss = 0.024208512157201767
Validation loss = 0.025692107155919075
Validation loss = 0.025176160037517548
Validation loss = 0.027410361915826797
Validation loss = 0.024517152458429337
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0252132136374712
Validation loss = 0.024013973772525787
Validation loss = 0.025049736723303795
Validation loss = 0.02509405091404915
Validation loss = 0.025600800290703773
Validation loss = 0.024708816781640053
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02541135437786579
Validation loss = 0.024337438866496086
Validation loss = 0.025016499683260918
Validation loss = 0.02580259181559086
Validation loss = 0.02708817459642887
Validation loss = 0.027752548456192017
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025137510150671005
Validation loss = 0.024886224418878555
Validation loss = 0.026262493804097176
Validation loss = 0.024991201236844063
Validation loss = 0.02532019279897213
Validation loss = 0.024114277213811874
Validation loss = 0.024260492995381355
Validation loss = 0.027138588950037956
Validation loss = 0.02809986099600792
Validation loss = 0.02434314228594303
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02516263909637928
Validation loss = 0.025105582550168037
Validation loss = 0.024924833327531815
Validation loss = 0.024844931438565254
Validation loss = 0.024799782782793045
Validation loss = 0.026370026171207428
Validation loss = 0.028789807111024857
Validation loss = 0.02650318294763565
Validation loss = 0.025862520560622215
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09607293127629733
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09600560616678346
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09593837535014006
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09587123862841147
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0958041958041958
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0957372466806429
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09567039106145252
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09560362875087229
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09553695955369595
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09547038327526132
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09540389972144847
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0953375086986778
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0952712100139082
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09520500347463516
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09513888888888888
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09507286606523248
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09500693481276005
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09494109494109494
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09487534626038781
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09480968858131487
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09474412171507607
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09467864547339323
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09461325966850828
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0945479641131815
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09448275862068965
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0011   |
| Iteration     | 56        |
| MaximumReturn | -0.000744 |
| MinimumReturn | -0.00194  |
| TotalSamples  | 96628     |
-----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024289652705192566
Validation loss = 0.027211690321564674
Validation loss = 0.02593798004090786
Validation loss = 0.025068029761314392
Validation loss = 0.024521013721823692
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02960427664220333
Validation loss = 0.02594061754643917
Validation loss = 0.024226153269410133
Validation loss = 0.029876215383410454
Validation loss = 0.02473016269505024
Validation loss = 0.024448001757264137
Validation loss = 0.024962222203612328
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02448940835893154
Validation loss = 0.025058036670088768
Validation loss = 0.02700360305607319
Validation loss = 0.02529958449304104
Validation loss = 0.024396071210503578
Validation loss = 0.024425411596894264
Validation loss = 0.02567489631474018
Validation loss = 0.024317992851138115
Validation loss = 0.025273039937019348
Validation loss = 0.02458123117685318
Validation loss = 0.02431090734899044
Validation loss = 0.026377858594059944
Validation loss = 0.028877122327685356
Validation loss = 0.02458612062036991
Validation loss = 0.02496930956840515
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025097696110606194
Validation loss = 0.024824151769280434
Validation loss = 0.02510327659547329
Validation loss = 0.026506418362259865
Validation loss = 0.024550989270210266
Validation loss = 0.025120005011558533
Validation loss = 0.026996565982699394
Validation loss = 0.02429269813001156
Validation loss = 0.025105394423007965
Validation loss = 0.025303935632109642
Validation loss = 0.026744389906525612
Validation loss = 0.02460377663373947
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025084486231207848
Validation loss = 0.025585735216736794
Validation loss = 0.025120293721556664
Validation loss = 0.025727448984980583
Validation loss = 0.024789512157440186
Validation loss = 0.025833090767264366
Validation loss = 0.024670517072081566
Validation loss = 0.02358269691467285
Validation loss = 0.02487623691558838
Validation loss = 0.025863992050290108
Validation loss = 0.02461268939077854
Validation loss = 0.024727433919906616
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09441764300482426
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0943526170798898
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09428768066070199
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09422283356258597
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09415807560137457
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09409340659340659
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09402882635552505
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09396433470507544
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09389993145990404
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09383561643835617
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09377138945927448
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09370725034199727
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0936431989063568
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0935792349726776
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09351535836177474
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09345156889495225
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09338786639400136
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09332425068119891
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09326072157930565
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09319727891156462
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09313392250169952
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09307065217391304
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09300746775288526
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09294436906377204
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0928813559322034
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00285 |
| Iteration     | 57       |
| MaximumReturn | -0.0018  |
| MinimumReturn | -0.00395 |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02527630887925625
Validation loss = 0.027209805324673653
Validation loss = 0.026772819459438324
Validation loss = 0.024877823889255524
Validation loss = 0.02522922120988369
Validation loss = 0.023997778072953224
Validation loss = 0.02404996007680893
Validation loss = 0.024928921833634377
Validation loss = 0.024505574256181717
Validation loss = 0.023687101900577545
Validation loss = 0.02409435249865055
Validation loss = 0.026019347831606865
Validation loss = 0.026136908680200577
Validation loss = 0.02368771843612194
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02440129965543747
Validation loss = 0.028846660628914833
Validation loss = 0.024500546976923943
Validation loss = 0.025660112500190735
Validation loss = 0.023820770904421806
Validation loss = 0.024849994108080864
Validation loss = 0.024718135595321655
Validation loss = 0.02462613582611084
Validation loss = 0.0238485187292099
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02508816123008728
Validation loss = 0.025123070925474167
Validation loss = 0.024061666801571846
Validation loss = 0.027752744033932686
Validation loss = 0.02467215806245804
Validation loss = 0.02443956583738327
Validation loss = 0.023745805025100708
Validation loss = 0.024231722578406334
Validation loss = 0.02494111657142639
Validation loss = 0.02509680576622486
Validation loss = 0.02490987628698349
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02472095750272274
Validation loss = 0.025001516565680504
Validation loss = 0.024607526138424873
Validation loss = 0.02469227835536003
Validation loss = 0.024342266842722893
Validation loss = 0.030911047011613846
Validation loss = 0.025235336273908615
Validation loss = 0.02492590993642807
Validation loss = 0.026466356590390205
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024601895362138748
Validation loss = 0.024640049785375595
Validation loss = 0.024359721690416336
Validation loss = 0.024832099676132202
Validation loss = 0.027031103149056435
Validation loss = 0.02679351717233658
Validation loss = 0.02504180558025837
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09281842818428185
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0927555856465809
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09269282814614344
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09263015551048005
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09256756756756757
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0925050641458474
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09244264507422402
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09238031018206339
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09231805929919137
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09225589225589226
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09219380888290714
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09213180901143242
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09206989247311828
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09200805910006717
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09194630872483221
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09188464118041582
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0918230563002681
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09176155391828533
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09170013386880857
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09163879598662207
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09157754010695188
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09151636606546426
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09145527369826435
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0913942628418946
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09133333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00338 |
| Iteration     | 58       |
| MaximumReturn | -0.00194 |
| MinimumReturn | -0.00455 |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023753328248858452
Validation loss = 0.024837948381900787
Validation loss = 0.02477671578526497
Validation loss = 0.024314381182193756
Validation loss = 0.028575383126735687
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022997422143816948
Validation loss = 0.025872759521007538
Validation loss = 0.024078430607914925
Validation loss = 0.024594247341156006
Validation loss = 0.0269436314702034
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025708692148327827
Validation loss = 0.024137599393725395
Validation loss = 0.02365025132894516
Validation loss = 0.024909375235438347
Validation loss = 0.024167735129594803
Validation loss = 0.02575591765344143
Validation loss = 0.02376409061253071
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024570941925048828
Validation loss = 0.024934614077210426
Validation loss = 0.026949705556035042
Validation loss = 0.02444646507501602
Validation loss = 0.024364113807678223
Validation loss = 0.028111949563026428
Validation loss = 0.026289550587534904
Validation loss = 0.023852067068219185
Validation loss = 0.02416924014687538
Validation loss = 0.02413526363670826
Validation loss = 0.027156665921211243
Validation loss = 0.02384616620838642
Validation loss = 0.024085430428385735
Validation loss = 0.023607872426509857
Validation loss = 0.02380184642970562
Validation loss = 0.026557141914963722
Validation loss = 0.02436654269695282
Validation loss = 0.026546400040388107
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025105025619268417
Validation loss = 0.024347612634301186
Validation loss = 0.025121329352259636
Validation loss = 0.027091186493635178
Validation loss = 0.02488209493458271
Validation loss = 0.024310654029250145
Validation loss = 0.02536502294242382
Validation loss = 0.0245918408036232
Validation loss = 0.027103234082460403
Validation loss = 0.02469494566321373
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09127248500999334
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09121171770972038
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09115103127079174
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09109042553191489
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09102990033222591
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09096945551128818
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09090909090909091
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09084880636604775
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09078860172299535
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09072847682119205
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09066843150231635
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09060846560846561
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09054857898215465
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0904887714663144
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09042904290429044
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09036939313984169
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0903098220171391
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09025032938076416
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0901909150757077
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09013157894736842
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.09007232084155162
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0900131406044678
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08995403808273145
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08989501312335958
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0898360655737705
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00351 |
| Iteration     | 59       |
| MaximumReturn | -0.00203 |
| MinimumReturn | -0.0055  |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02502937987446785
Validation loss = 0.024782922118902206
Validation loss = 0.024072740226984024
Validation loss = 0.029626725241541862
Validation loss = 0.02512490190565586
Validation loss = 0.024789026007056236
Validation loss = 0.024912118911743164
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0255537461489439
Validation loss = 0.025078417733311653
Validation loss = 0.025842072442173958
Validation loss = 0.026876142248511314
Validation loss = 0.024114158004522324
Validation loss = 0.02407991886138916
Validation loss = 0.025047801434993744
Validation loss = 0.02793937362730503
Validation loss = 0.02459334395825863
Validation loss = 0.024627653881907463
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.027765071019530296
Validation loss = 0.02391408383846283
Validation loss = 0.025225749239325523
Validation loss = 0.026810694485902786
Validation loss = 0.02503717876970768
Validation loss = 0.024178054183721542
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026838254183530807
Validation loss = 0.024486737325787544
Validation loss = 0.02503749169409275
Validation loss = 0.027445927262306213
Validation loss = 0.02420903369784355
Validation loss = 0.023974131792783737
Validation loss = 0.02580578252673149
Validation loss = 0.024619536474347115
Validation loss = 0.02697008103132248
Validation loss = 0.024055115878582
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025402767583727837
Validation loss = 0.025979436933994293
Validation loss = 0.024956779554486275
Validation loss = 0.02447991631925106
Validation loss = 0.024514101445674896
Validation loss = 0.024463403970003128
Validation loss = 0.030267691239714622
Validation loss = 0.02382628619670868
Validation loss = 0.024043235927820206
Validation loss = 0.02494114637374878
Validation loss = 0.024032916873693466
Validation loss = 0.024830687791109085
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08977719528178243
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08971840209561231
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08965968586387435
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08960104643557881
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08954248366013072
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08948399738732854
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08942558746736293
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08936725375081539
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08930899608865711
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08925081433224756
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08919270833333333
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08913467794404685
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08907672301690507
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08901884340480831
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08896103896103896
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08890330953926022
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08884565499351492
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08878807517822424
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08873056994818652
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08867313915857605
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08861578266494179
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08855850032320621
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08850129198966408
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08844415752098128
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08838709677419355
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00112  |
| Iteration     | 60        |
| MaximumReturn | -0.000716 |
| MinimumReturn | -0.00159  |
| TotalSamples  | 103292    |
-----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02394251339137554
Validation loss = 0.02617681585252285
Validation loss = 0.024260319769382477
Validation loss = 0.025471989065408707
Validation loss = 0.024978289380669594
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024057850241661072
Validation loss = 0.025646785274147987
Validation loss = 0.026202240958809853
Validation loss = 0.0239787045866251
Validation loss = 0.024503426626324654
Validation loss = 0.024396760389208794
Validation loss = 0.02861657179892063
Validation loss = 0.02400052361190319
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024905655533075333
Validation loss = 0.029279330745339394
Validation loss = 0.02387532964348793
Validation loss = 0.023988358676433563
Validation loss = 0.02457861602306366
Validation loss = 0.02354048192501068
Validation loss = 0.024441976100206375
Validation loss = 0.026932936161756516
Validation loss = 0.025286301970481873
Validation loss = 0.02460443414747715
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026054605841636658
Validation loss = 0.02421439439058304
Validation loss = 0.026478931307792664
Validation loss = 0.025321125984191895
Validation loss = 0.0250953771173954
Validation loss = 0.02533104456961155
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025205334648489952
Validation loss = 0.024106768891215324
Validation loss = 0.025250239297747612
Validation loss = 0.024564677849411964
Validation loss = 0.024753643199801445
Validation loss = 0.026156077161431313
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08833010960670536
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08827319587628867
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08821635544108178
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08815958815958816
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08810289389067524
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08804627249357326
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08798972382787412
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08793324775353016
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08787684413085312
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08782051282051281
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0877642536835362
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08770806658130602
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08765195137555983
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08759590792838874
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08753993610223643
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08748403575989783
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0874282067645182
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08737244897959184
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08731676226896112
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08726114649681528
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08720560152768937
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0871501272264631
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08709472345835982
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08703939008894536
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08698412698412698
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00104  |
| Iteration     | 61        |
| MaximumReturn | -0.000769 |
| MinimumReturn | -0.00135  |
| TotalSamples  | 104958    |
-----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024926112964749336
Validation loss = 0.025522548705339432
Validation loss = 0.024065367877483368
Validation loss = 0.02546675316989422
Validation loss = 0.024194767698645592
Validation loss = 0.023793131113052368
Validation loss = 0.02530486136674881
Validation loss = 0.025840263813734055
Validation loss = 0.02891921065747738
Validation loss = 0.024638723582029343
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02398519776761532
Validation loss = 0.025775205343961716
Validation loss = 0.027225706726312637
Validation loss = 0.024047335609793663
Validation loss = 0.024670898914337158
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024419555440545082
Validation loss = 0.023931147530674934
Validation loss = 0.025757670402526855
Validation loss = 0.02435445599257946
Validation loss = 0.025127951055765152
Validation loss = 0.02465824969112873
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023946847766637802
Validation loss = 0.02435363456606865
Validation loss = 0.02484055422246456
Validation loss = 0.025415809825062752
Validation loss = 0.02442809008061886
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.028355039656162262
Validation loss = 0.024101940914988518
Validation loss = 0.025170447304844856
Validation loss = 0.024592511355876923
Validation loss = 0.023615332320332527
Validation loss = 0.02485302835702896
Validation loss = 0.026667704805731773
Validation loss = 0.02552901953458786
Validation loss = 0.025363769382238388
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08692893401015228
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08687381103360811
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08681875792141952
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08676377454084863
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08670886075949367
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0866540164452878
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0865992414664981
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08654453569172457
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08648989898989899
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08643533123028391
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08638083228247162
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08632640201638311
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.086272040302267
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08621774701069855
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08616352201257861
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08610936517913262
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08605527638190955
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08600125549278091
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08594730238393977
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08589341692789969
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08583959899749373
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08578584846587352
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08573216520650813
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08567854909318325
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.085625
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00119  |
| Iteration     | 62        |
| MaximumReturn | -0.000846 |
| MinimumReturn | -0.00162  |
| TotalSamples  | 106624    |
-----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023797009140253067
Validation loss = 0.024301258847117424
Validation loss = 0.024883413687348366
Validation loss = 0.02505957894027233
Validation loss = 0.024089878425002098
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024024859070777893
Validation loss = 0.024893688037991524
Validation loss = 0.03032241202890873
Validation loss = 0.024215470999479294
Validation loss = 0.02560160495340824
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02405252493917942
Validation loss = 0.02399405837059021
Validation loss = 0.024454064667224884
Validation loss = 0.024662991985678673
Validation loss = 0.025271276012063026
Validation loss = 0.02436872012913227
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02547690086066723
Validation loss = 0.026382893323898315
Validation loss = 0.025302663445472717
Validation loss = 0.02375750243663788
Validation loss = 0.02513163909316063
Validation loss = 0.024901017546653748
Validation loss = 0.026139385998249054
Validation loss = 0.023937979713082314
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02458946406841278
Validation loss = 0.024171944707632065
Validation loss = 0.023234430700540543
Validation loss = 0.025120580568909645
Validation loss = 0.02507971227169037
Validation loss = 0.023610122501850128
Validation loss = 0.02572719380259514
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08557151780137415
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08551810237203496
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08546475358702432
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08541147132169576
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0853582554517134
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08530510585305105
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08525202240199128
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08519900497512438
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08514605344934742
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08509316770186336
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08504034761018
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08498759305210918
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08493490390576565
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0848822800495663
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0848297213622291
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08477722772277228
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0847247990105133
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08467243510506799
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0846201358863496
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08456790123456791
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08451573103022826
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08446362515413071
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08441158348736907
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08435960591133004
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0843076923076923
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00123  |
| Iteration     | 63        |
| MaximumReturn | -0.000775 |
| MinimumReturn | -0.00174  |
| TotalSamples  | 108290    |
-----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024226557463407516
Validation loss = 0.024524379521608353
Validation loss = 0.02797343023121357
Validation loss = 0.024230526760220528
Validation loss = 0.024993538856506348
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02483237534761429
Validation loss = 0.023427022621035576
Validation loss = 0.023719286546111107
Validation loss = 0.02316645346581936
Validation loss = 0.023384757339954376
Validation loss = 0.024533269926905632
Validation loss = 0.02563605085015297
Validation loss = 0.024854222312569618
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02342809922993183
Validation loss = 0.023858366534113884
Validation loss = 0.02350839041173458
Validation loss = 0.023932309821248055
Validation loss = 0.024693794548511505
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02390800602734089
Validation loss = 0.024860097095370293
Validation loss = 0.024307670071721077
Validation loss = 0.026175521314144135
Validation loss = 0.024469174444675446
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024195406585931778
Validation loss = 0.02482500486075878
Validation loss = 0.024625340476632118
Validation loss = 0.0239571463316679
Validation loss = 0.025436531752347946
Validation loss = 0.023992739617824554
Validation loss = 0.025065526366233826
Validation loss = 0.024644067510962486
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08425584255842558
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0842040565457898
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08415233415233415
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08410067526089625
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08404907975460123
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08399754751686082
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08394607843137254
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0838946723821188
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08384332925336598
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0837920489296636
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08374083129584352
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08368967623701894
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08363858363858363
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0835875533862111
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08353658536585366
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08348567946374162
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08343483556638245
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08338405356055995
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08333333333333333
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08328267477203648
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08323207776427703
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08318154219793564
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08313106796116505
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08308065494238932
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08303030303030302
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00112  |
| Iteration     | 64        |
| MaximumReturn | -0.000786 |
| MinimumReturn | -0.0019   |
| TotalSamples  | 109956    |
-----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02511821873486042
Validation loss = 0.023808889091014862
Validation loss = 0.02627469412982464
Validation loss = 0.02953724004328251
Validation loss = 0.026647008955478668
Validation loss = 0.023552181199193
Validation loss = 0.02379394695162773
Validation loss = 0.02363310381770134
Validation loss = 0.025108693167567253
Validation loss = 0.026761626824736595
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02470209449529648
Validation loss = 0.02378964237868786
Validation loss = 0.0240750964730978
Validation loss = 0.0240382868796587
Validation loss = 0.02627190761268139
Validation loss = 0.025193994864821434
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02789040096104145
Validation loss = 0.025499368086457253
Validation loss = 0.023036599159240723
Validation loss = 0.024570556357502937
Validation loss = 0.02379387803375721
Validation loss = 0.024716705083847046
Validation loss = 0.02946244552731514
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025237690657377243
Validation loss = 0.024747701361775398
Validation loss = 0.02787601388990879
Validation loss = 0.024968689307570457
Validation loss = 0.023029912263154984
Validation loss = 0.024012111127376556
Validation loss = 0.02583976276218891
Validation loss = 0.023903047665953636
Validation loss = 0.024689480662345886
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024893907830119133
Validation loss = 0.026805048808455467
Validation loss = 0.024401351809501648
Validation loss = 0.024303413927555084
Validation loss = 0.02409243769943714
Validation loss = 0.02560018002986908
Validation loss = 0.0236971378326416
Validation loss = 0.02352045848965645
Validation loss = 0.023852700367569923
Validation loss = 0.025116680189967155
Validation loss = 0.023411644622683525
Validation loss = 0.025506550446152687
Validation loss = 0.026253584772348404
Validation loss = 0.02366463653743267
Validation loss = 0.023346833884716034
Validation loss = 0.026230216026306152
Validation loss = 0.025460069999098778
Validation loss = 0.023832667618989944
Validation loss = 0.024663787335157394
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08298001211387038
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08292978208232446
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08287961282516637
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08282950423216445
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08277945619335347
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08272946859903382
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08267954133977067
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08262967430639324
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08257986738999397
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0825301204819277
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08248043347381095
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08243080625752106
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08238123872519543
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08233173076923077
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08228228228228228
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0822328931572629
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08218356328734253
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08213429256594725
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08208508088675853
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08203592814371258
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0819868342309994
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0819377990430622
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08188882247459653
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08183990442054959
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0817910447761194
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00108  |
| Iteration     | 65        |
| MaximumReturn | -0.000846 |
| MinimumReturn | -0.00145  |
| TotalSamples  | 111622    |
-----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.027124125510454178
Validation loss = 0.024225939065217972
Validation loss = 0.02512088045477867
Validation loss = 0.025175543501973152
Validation loss = 0.024029862135648727
Validation loss = 0.02430519461631775
Validation loss = 0.03196198120713234
Validation loss = 0.02452477626502514
Validation loss = 0.02895425632596016
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02546130307018757
Validation loss = 0.02590840309858322
Validation loss = 0.02429025247693062
Validation loss = 0.03169749304652214
Validation loss = 0.024285942316055298
Validation loss = 0.02419402450323105
Validation loss = 0.024207549169659615
Validation loss = 0.02511505037546158
Validation loss = 0.024021895602345467
Validation loss = 0.025958191603422165
Validation loss = 0.026181496679782867
Validation loss = 0.023898785933852196
Validation loss = 0.024981118738651276
Validation loss = 0.02529091015458107
Validation loss = 0.025201624259352684
Validation loss = 0.02376045659184456
Validation loss = 0.02440599910914898
Validation loss = 0.02913864329457283
Validation loss = 0.02438570372760296
Validation loss = 0.024681616574525833
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.024392513558268547
Validation loss = 0.024086015298962593
Validation loss = 0.024025993421673775
Validation loss = 0.02391918934881687
Validation loss = 0.02998991683125496
Validation loss = 0.02439652942121029
Validation loss = 0.02546883001923561
Validation loss = 0.023703986778855324
Validation loss = 0.023415913805365562
Validation loss = 0.026266098022460938
Validation loss = 0.024548491463065147
Validation loss = 0.023517511785030365
Validation loss = 0.025749674066901207
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02540571428835392
Validation loss = 0.026107527315616608
Validation loss = 0.02524688094854355
Validation loss = 0.024882076308131218
Validation loss = 0.02428717166185379
Validation loss = 0.024197475984692574
Validation loss = 0.023276986554265022
Validation loss = 0.03074950911104679
Validation loss = 0.024624377489089966
Validation loss = 0.024263130500912666
Validation loss = 0.025591015815734863
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023666774854063988
Validation loss = 0.024819383397698402
Validation loss = 0.025401445105671883
Validation loss = 0.024269385263323784
Validation loss = 0.02574584074318409
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08174224343675418
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08169350029815146
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08164481525625746
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08159618820726623
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08154761904761905
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08149910767400356
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08145065398335315
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0814022578728461
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08135391923990498
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08130563798219585
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08125741399762752
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08120924718435092
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0811611374407583
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08111308466548253
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08106508875739644
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08101714961561206
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08096926713947991
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08092144122858831
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0808736717827627
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0808259587020649
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08077830188679246
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0807307012374779
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08068315665488811
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08063566804002355
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08058823529411764
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00368 |
| Iteration     | 66       |
| MaximumReturn | -0.00271 |
| MinimumReturn | -0.00601 |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025882402434945107
Validation loss = 0.02524552121758461
Validation loss = 0.02457561157643795
Validation loss = 0.02475583553314209
Validation loss = 0.02324083261191845
Validation loss = 0.024527868255972862
Validation loss = 0.024080948904156685
Validation loss = 0.024301502853631973
Validation loss = 0.02707226574420929
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023994257673621178
Validation loss = 0.023986870422959328
Validation loss = 0.024798188358545303
Validation loss = 0.024168094620108604
Validation loss = 0.025453107431530952
Validation loss = 0.024066032841801643
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.026566041633486748
Validation loss = 0.02400456741452217
Validation loss = 0.023229466751217842
Validation loss = 0.023734750226140022
Validation loss = 0.023598849773406982
Validation loss = 0.02425393834710121
Validation loss = 0.023959284648299217
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02487535960972309
Validation loss = 0.02470780722796917
Validation loss = 0.02433539368212223
Validation loss = 0.025017771869897842
Validation loss = 0.02403276227414608
Validation loss = 0.02471007965505123
Validation loss = 0.02639632299542427
Validation loss = 0.02365156076848507
Validation loss = 0.02515062876045704
Validation loss = 0.024872610345482826
Validation loss = 0.0237573254853487
Validation loss = 0.023588543757796288
Validation loss = 0.025886254385113716
Validation loss = 0.025935089215636253
Validation loss = 0.02346937544643879
Validation loss = 0.023555813357234
Validation loss = 0.024386273697018623
Validation loss = 0.02380126714706421
Validation loss = 0.024530794471502304
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024523193016648293
Validation loss = 0.02413191832602024
Validation loss = 0.026122460141777992
Validation loss = 0.023845741525292397
Validation loss = 0.02545560710132122
Validation loss = 0.024532997980713844
Validation loss = 0.025323856621980667
Validation loss = 0.02440890669822693
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0805408583186361
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08049353701527615
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08044627128596595
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08039906103286384
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08035190615835777
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08030480656506447
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08025776215582894
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08021077283372366
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08016383850204799
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08011695906432749
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08007013442431327
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.08002336448598131
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07997664915353181
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07992998833138856
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07988338192419825
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07983682983682984
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07979033197437391
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07974388824214203
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07969749854566609
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07965116279069767
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07960488088320744
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07955865272938444
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07951247823563552
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07946635730858469
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07942028985507246
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00171 |
| Iteration     | 67       |
| MaximumReturn | -0.00117 |
| MinimumReturn | -0.00291 |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02318078652024269
Validation loss = 0.02352720871567726
Validation loss = 0.024350972846150398
Validation loss = 0.036355528980493546
Validation loss = 0.02443959377706051
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024389589205384254
Validation loss = 0.026932254433631897
Validation loss = 0.02332247421145439
Validation loss = 0.023550963029265404
Validation loss = 0.02478785626590252
Validation loss = 0.02408982813358307
Validation loss = 0.024162085726857185
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02318326011300087
Validation loss = 0.024590786546468735
Validation loss = 0.02662571705877781
Validation loss = 0.02340928465127945
Validation loss = 0.025759520009160042
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025349341332912445
Validation loss = 0.02419394813477993
Validation loss = 0.024008721113204956
Validation loss = 0.02507276087999344
Validation loss = 0.023982008919119835
Validation loss = 0.02606986090540886
Validation loss = 0.023431669920682907
Validation loss = 0.024747317656874657
Validation loss = 0.02355688437819481
Validation loss = 0.025239909067749977
Validation loss = 0.02329873852431774
Validation loss = 0.023459209129214287
Validation loss = 0.025708984583616257
Validation loss = 0.02371254563331604
Validation loss = 0.02457752265036106
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024549150839447975
Validation loss = 0.023619186133146286
Validation loss = 0.024781180545687675
Validation loss = 0.023749593645334244
Validation loss = 0.027431081980466843
Validation loss = 0.02407844364643097
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07937427578215528
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07932831499710481
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07928240740740741
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07923655292076345
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07919075144508671
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07914500288850375
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07909930715935334
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07905366416618581
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0790080738177624
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07896253602305475
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07891705069124424
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07887161773172136
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07882623705408516
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07878090856814261
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07873563218390804
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07869040781160253
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07864523536165327
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07860011474469306
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07855504587155963
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07851002865329514
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07846506300114547
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07842014882655982
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07837528604118993
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07833047455688966
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07828571428571429
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00506 |
| Iteration     | 68       |
| MaximumReturn | -0.00388 |
| MinimumReturn | -0.00663 |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02468928135931492
Validation loss = 0.025504715740680695
Validation loss = 0.02783821150660515
Validation loss = 0.024456359446048737
Validation loss = 0.023936009034514427
Validation loss = 0.026519332081079483
Validation loss = 0.024218956008553505
Validation loss = 0.02690574899315834
Validation loss = 0.024094395339488983
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024061480537056923
Validation loss = 0.025871220976114273
Validation loss = 0.023166390135884285
Validation loss = 0.02306630089879036
Validation loss = 0.024141885340213776
Validation loss = 0.02555394545197487
Validation loss = 0.023079153150320053
Validation loss = 0.02337178774178028
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023085931316018105
Validation loss = 0.023785535246133804
Validation loss = 0.023017805069684982
Validation loss = 0.023814935237169266
Validation loss = 0.029045982286334038
Validation loss = 0.025121966376900673
Validation loss = 0.02732296846807003
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023319128900766373
Validation loss = 0.023823974654078484
Validation loss = 0.023350071161985397
Validation loss = 0.023866023868322372
Validation loss = 0.024867426604032516
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024617239832878113
Validation loss = 0.025728773325681686
Validation loss = 0.02390766143798828
Validation loss = 0.02385029010474682
Validation loss = 0.024768443778157234
Validation loss = 0.023053711280226707
Validation loss = 0.023428700864315033
Validation loss = 0.023624058812856674
Validation loss = 0.023341894149780273
Validation loss = 0.02306271903216839
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07824100513992005
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07819634703196347
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07815173987450086
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07810718358038768
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07806267806267807
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07801822323462415
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07797381900967558
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07792946530147896
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0778851620238772
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07784090909090909
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07779670641680864
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07775255391600454
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07770845150311968
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07766439909297052
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07762039660056658
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07757644394110985
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07753254102999434
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07748868778280543
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07744488411531938
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07740112994350283
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07735742518351214
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.077313769751693
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0772701635645798
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07722660653889515
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0771830985915493
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00101 |
| Iteration     | 69       |
| MaximumReturn | -0.00072 |
| MinimumReturn | -0.00156 |
| TotalSamples  | 118286   |
----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024156350642442703
Validation loss = 0.023634906858205795
Validation loss = 0.023625986650586128
Validation loss = 0.02376304566860199
Validation loss = 0.02387947030365467
Validation loss = 0.025035841390490532
Validation loss = 0.02445603907108307
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023655397817492485
Validation loss = 0.02357424609363079
Validation loss = 0.026494307443499565
Validation loss = 0.02392626367509365
Validation loss = 0.023369422182440758
Validation loss = 0.02442736178636551
Validation loss = 0.024120060727000237
Validation loss = 0.028683599084615707
Validation loss = 0.02366701513528824
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023913567885756493
Validation loss = 0.023754354566335678
Validation loss = 0.02411637268960476
Validation loss = 0.02224845439195633
Validation loss = 0.024484219029545784
Validation loss = 0.022853881120681763
Validation loss = 0.0239337719976902
Validation loss = 0.023595483973622322
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027135945856571198
Validation loss = 0.02310168370604515
Validation loss = 0.022411659359931946
Validation loss = 0.024578271433711052
Validation loss = 0.0241414625197649
Validation loss = 0.023245587944984436
Validation loss = 0.024249251931905746
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02321884036064148
Validation loss = 0.025042619556188583
Validation loss = 0.02387954294681549
Validation loss = 0.023587828502058983
Validation loss = 0.0240031685680151
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07713963963963964
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07709622960045019
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07705286839145106
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07700955593029792
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07696629213483146
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07692307692307693
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07687991021324354
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07683679192372406
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07679372197309417
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07675070028011205
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0767077267637178
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07666480134303301
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07662192393736018
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07657909446618222
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07653631284916201
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07649357900614182
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07645089285714286
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07640825432236475
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07636566332218506
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07632311977715878
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07628062360801782
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07623817473567056
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07619577308120133
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07615341856586992
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07611111111111112
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00458 |
| Iteration     | 70       |
| MaximumReturn | -0.00349 |
| MinimumReturn | -0.00617 |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02398868091404438
Validation loss = 0.02383916638791561
Validation loss = 0.026395253837108612
Validation loss = 0.023096967488527298
Validation loss = 0.024927770718932152
Validation loss = 0.024106193333864212
Validation loss = 0.02648809365928173
Validation loss = 0.02352076768875122
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023786636069417
Validation loss = 0.02489469386637211
Validation loss = 0.024459443986415863
Validation loss = 0.024783235043287277
Validation loss = 0.02547633834183216
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022440532222390175
Validation loss = 0.02427569404244423
Validation loss = 0.028277475386857986
Validation loss = 0.022981176152825356
Validation loss = 0.026585016399621964
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02499542199075222
Validation loss = 0.023255638778209686
Validation loss = 0.023247675970196724
Validation loss = 0.02323555387556553
Validation loss = 0.02337181754410267
Validation loss = 0.02242366597056389
Validation loss = 0.024180520325899124
Validation loss = 0.02473558858036995
Validation loss = 0.022826220840215683
Validation loss = 0.0229756198823452
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02314360812306404
Validation loss = 0.023504110053181648
Validation loss = 0.023133505135774612
Validation loss = 0.023447738960385323
Validation loss = 0.02588995173573494
Validation loss = 0.023918138816952705
Validation loss = 0.03034999407827854
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07606885063853415
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07602663706992231
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07598447032723239
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07594235033259424
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07590027700831024
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07585825027685493
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07581627006087438
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07577433628318585
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07573244886677723
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07569060773480663
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07564881281060187
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07560706401766004
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.075565361279647
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07552370452039692
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07548209366391184
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07544052863436124
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07539900935608146
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07535753575357536
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07531610775151182
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07527472527472527
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07523338824821527
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.075192096597146
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07515085024684585
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07510964912280702
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07506849315068494
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00104 |
| Iteration     | 71       |
| MaximumReturn | -0.00078 |
| MinimumReturn | -0.00152 |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023634476587176323
Validation loss = 0.023271258920431137
Validation loss = 0.02435106411576271
Validation loss = 0.02435765601694584
Validation loss = 0.023367343470454216
Validation loss = 0.022173510864377022
Validation loss = 0.02334439940750599
Validation loss = 0.024265330284833908
Validation loss = 0.02442321553826332
Validation loss = 0.023833507671952248
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0324464850127697
Validation loss = 0.023654887452721596
Validation loss = 0.025419624522328377
Validation loss = 0.02723061479628086
Validation loss = 0.02366156131029129
Validation loss = 0.024325961247086525
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0229163970798254
Validation loss = 0.02395591139793396
Validation loss = 0.024789581075310707
Validation loss = 0.023567980155348778
Validation loss = 0.02237807586789131
Validation loss = 0.02383643575012684
Validation loss = 0.03220969811081886
Validation loss = 0.023321425542235374
Validation loss = 0.023343656212091446
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02420353703200817
Validation loss = 0.023494848981499672
Validation loss = 0.023162465542554855
Validation loss = 0.02339397743344307
Validation loss = 0.024321846663951874
Validation loss = 0.023179421201348305
Validation loss = 0.02874661050736904
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02450631745159626
Validation loss = 0.023648833855986595
Validation loss = 0.025770938023924828
Validation loss = 0.023638317361474037
Validation loss = 0.025497805327177048
Validation loss = 0.02932284213602543
Validation loss = 0.023846635594964027
Validation loss = 0.023271804675459862
Validation loss = 0.022899366915225983
Validation loss = 0.025705939158797264
Validation loss = 0.02529238723218441
Validation loss = 0.02303866483271122
Validation loss = 0.02425789088010788
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07502738225629792
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0749863163656267
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07494529540481401
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07490431930016403
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07486338797814207
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07482250136537412
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07478165938864628
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07474086197490452
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07470010905125408
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07465940054495913
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07461873638344227
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07457811649428416
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07453754080522307
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07449700924415444
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07445652173913044
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07441607821835959
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0743756786102063
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07433532284319046
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07429501084598698
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07425474254742548
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07421451787648971
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07417433676231727
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07413419913419914
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07409410492157924
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07405405405405406
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00175 |
| Iteration     | 72       |
| MaximumReturn | -0.00127 |
| MinimumReturn | -0.00243 |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024419773370027542
Validation loss = 0.026007961481809616
Validation loss = 0.02434530109167099
Validation loss = 0.024223383516073227
Validation loss = 0.023569736629724503
Validation loss = 0.025357957929372787
Validation loss = 0.023638581857085228
Validation loss = 0.026057017967104912
Validation loss = 0.02306869626045227
Validation loss = 0.024459315463900566
Validation loss = 0.026854565367102623
Validation loss = 0.02390786074101925
Validation loss = 0.023749304935336113
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024006275460124016
Validation loss = 0.0238849725574255
Validation loss = 0.025440072640776634
Validation loss = 0.02417364902794361
Validation loss = 0.02412804774940014
Validation loss = 0.0281769260764122
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023398537188768387
Validation loss = 0.022771645337343216
Validation loss = 0.023094967007637024
Validation loss = 0.027729429304599762
Validation loss = 0.022384224459528923
Validation loss = 0.023432498797774315
Validation loss = 0.023150501772761345
Validation loss = 0.02584204636514187
Validation loss = 0.024682654067873955
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025095107033848763
Validation loss = 0.023456376045942307
Validation loss = 0.023532189428806305
Validation loss = 0.02392849512398243
Validation loss = 0.024745509028434753
Validation loss = 0.023460237309336662
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025471458211541176
Validation loss = 0.02384393848478794
Validation loss = 0.02310960367321968
Validation loss = 0.024438057094812393
Validation loss = 0.023396510630846024
Validation loss = 0.023919353261590004
Validation loss = 0.023676948621869087
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07401404646137223
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07397408207343413
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07393416082029142
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07389428263214672
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0738544474393531
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0738146551724138
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07377490576198169
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.073735199138859
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07369553523399677
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07365591397849462
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07361633530360022
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07357679914070892
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07353730542136339
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07349785407725322
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07345844504021448
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07341907824222937
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07337975361542581
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07334047109207709
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07330123060460139
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0732620320855615
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07322287546766434
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07318376068376069
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07314468766684463
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07310565635005337
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07306666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00361 |
| Iteration     | 73       |
| MaximumReturn | -0.0022  |
| MinimumReturn | -0.00575 |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02335570752620697
Validation loss = 0.02447747252881527
Validation loss = 0.027045557275414467
Validation loss = 0.022741982713341713
Validation loss = 0.02990579418838024
Validation loss = 0.02625826746225357
Validation loss = 0.022617187350988388
Validation loss = 0.021773695945739746
Validation loss = 0.023370089009404182
Validation loss = 0.02593892440199852
Validation loss = 0.022557567805051804
Validation loss = 0.022613925859332085
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02511274255812168
Validation loss = 0.02391599491238594
Validation loss = 0.024290096014738083
Validation loss = 0.023807240650057793
Validation loss = 0.023428136482834816
Validation loss = 0.023177925497293472
Validation loss = 0.023711660876870155
Validation loss = 0.024270830675959587
Validation loss = 0.025255519896745682
Validation loss = 0.025181373581290245
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02663712576031685
Validation loss = 0.023189466446638107
Validation loss = 0.02555660717189312
Validation loss = 0.022956866770982742
Validation loss = 0.022810712456703186
Validation loss = 0.02362341433763504
Validation loss = 0.023329170420765877
Validation loss = 0.02581103704869747
Validation loss = 0.023565638810396194
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025044120848178864
Validation loss = 0.02341078408062458
Validation loss = 0.022685127332806587
Validation loss = 0.027088338509202003
Validation loss = 0.0244669858366251
Validation loss = 0.023309731855988503
Validation loss = 0.023407623171806335
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022908613085746765
Validation loss = 0.023323247209191322
Validation loss = 0.023960579186677933
Validation loss = 0.023480284959077835
Validation loss = 0.024696292355656624
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0730277185501066
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07298881193393714
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07294994675186368
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07291112293773283
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07287234042553191
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07283359914938863
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07279489904357067
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07275624004248539
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0727176220806794
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0726790450928382
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07264050901378578
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07260201377848437
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0725635593220339
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07252514557967178
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07248677248677249
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07244843997884717
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07241014799154334
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07237189646064449
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0723336853220697
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07229551451187335
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07225738396624473
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07221929362150764
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07218124341412013
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07214323328067404
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07210526315789474
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00125  |
| Iteration     | 74        |
| MaximumReturn | -0.000934 |
| MinimumReturn | -0.00171  |
| TotalSamples  | 126616    |
-----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02398730255663395
Validation loss = 0.024333791807293892
Validation loss = 0.023115776479244232
Validation loss = 0.023278288543224335
Validation loss = 0.02241559885442257
Validation loss = 0.022186439484357834
Validation loss = 0.02444467507302761
Validation loss = 0.022655636072158813
Validation loss = 0.023245280608534813
Validation loss = 0.024104585871100426
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023242656141519547
Validation loss = 0.02430160529911518
Validation loss = 0.02515748329460621
Validation loss = 0.02293781191110611
Validation loss = 0.022838300094008446
Validation loss = 0.027756117284297943
Validation loss = 0.022910187020897865
Validation loss = 0.024104570969939232
Validation loss = 0.02589552104473114
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02218097820878029
Validation loss = 0.024181459099054337
Validation loss = 0.023410070687532425
Validation loss = 0.023416219279170036
Validation loss = 0.022840702906250954
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022705640643835068
Validation loss = 0.023913096636533737
Validation loss = 0.024778347462415695
Validation loss = 0.023513194173574448
Validation loss = 0.02338000014424324
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023712318390607834
Validation loss = 0.0235072560608387
Validation loss = 0.03285105898976326
Validation loss = 0.023841336369514465
Validation loss = 0.023854941129684448
Validation loss = 0.023507386445999146
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07206733298264072
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07202944269190326
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0719915922228061
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07195378151260504
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07191601049868766
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07187827911857293
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07184058730991086
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07180293501048218
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.071765322158198
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07172774869109948
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0716902145473574
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07165271966527197
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07161526398327235
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07157784743991641
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07154046997389034
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07150313152400835
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0714658320292123
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07142857142857142
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07139134966128191
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07135416666666666
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07131702238417491
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0712799167533819
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07124284971398856
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07120582120582121
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07116883116883117
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00114  |
| Iteration     | 75        |
| MaximumReturn | -0.000772 |
| MinimumReturn | -0.00152  |
| TotalSamples  | 128282    |
-----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025388136506080627
Validation loss = 0.02283748798072338
Validation loss = 0.02265786938369274
Validation loss = 0.02360627055168152
Validation loss = 0.023079529404640198
Validation loss = 0.022424476221203804
Validation loss = 0.02387041598558426
Validation loss = 0.02474765107035637
Validation loss = 0.024118993431329727
Validation loss = 0.023290235549211502
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025488052517175674
Validation loss = 0.023406077176332474
Validation loss = 0.026975326240062714
Validation loss = 0.02338240295648575
Validation loss = 0.02384837344288826
Validation loss = 0.0235077366232872
Validation loss = 0.023659193888306618
Validation loss = 0.023171065375208855
Validation loss = 0.023886553943157196
Validation loss = 0.025545887649059296
Validation loss = 0.024110956117510796
Validation loss = 0.023525314405560493
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023371700197458267
Validation loss = 0.024819359183311462
Validation loss = 0.0222453735768795
Validation loss = 0.0233191829174757
Validation loss = 0.022484447807073593
Validation loss = 0.025316372513771057
Validation loss = 0.022079750895500183
Validation loss = 0.024998582899570465
Validation loss = 0.023458274081349373
Validation loss = 0.02386009320616722
Validation loss = 0.023531772196292877
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023330673575401306
Validation loss = 0.025052286684513092
Validation loss = 0.02329103648662567
Validation loss = 0.023405224084854126
Validation loss = 0.023289602249860764
Validation loss = 0.023457586765289307
Validation loss = 0.024212272837758064
Validation loss = 0.022871481254696846
Validation loss = 0.02287767082452774
Validation loss = 0.023035241290926933
Validation loss = 0.022451505064964294
Validation loss = 0.022640949115157127
Validation loss = 0.023475904017686844
Validation loss = 0.023106617853045464
Validation loss = 0.02310742810368538
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022875238209962845
Validation loss = 0.023296471685171127
Validation loss = 0.0247475765645504
Validation loss = 0.023326415568590164
Validation loss = 0.02294653281569481
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0711318795430945
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07109496626881162
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07105809128630705
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07102125453602903
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07098445595854923
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0709476954945624
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07091097308488613
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07087428867046043
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07083764219234746
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07080103359173126
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07076446280991736
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07072792978833248
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07069143446852426
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07065497679216091
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07061855670103093
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07058217413704276
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0705458290422245
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07050952135872363
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07047325102880658
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07043701799485862
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07040082219938334
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07036466358500257
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07032854209445585
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07029245767060031
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07025641025641026
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00327 |
| Iteration     | 76       |
| MaximumReturn | -0.00229 |
| MinimumReturn | -0.00436 |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023705393075942993
Validation loss = 0.023625843226909637
Validation loss = 0.023845715448260307
Validation loss = 0.023198697715997696
Validation loss = 0.026101943105459213
Validation loss = 0.02251310460269451
Validation loss = 0.023276522755622864
Validation loss = 0.023912139236927032
Validation loss = 0.023789098486304283
Validation loss = 0.023592207580804825
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023220645263791084
Validation loss = 0.02344055101275444
Validation loss = 0.025317776948213577
Validation loss = 0.022505775094032288
Validation loss = 0.0234699584543705
Validation loss = 0.022403087466955185
Validation loss = 0.022846411913633347
Validation loss = 0.022854099050164223
Validation loss = 0.02350243180990219
Validation loss = 0.027180925011634827
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02268875017762184
Validation loss = 0.023108774796128273
Validation loss = 0.02214791439473629
Validation loss = 0.02344939112663269
Validation loss = 0.022308625280857086
Validation loss = 0.021793007850646973
Validation loss = 0.02234603464603424
Validation loss = 0.022708095610141754
Validation loss = 0.02465730346739292
Validation loss = 0.022665301337838173
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02629542350769043
Validation loss = 0.022911695763468742
Validation loss = 0.02450159378349781
Validation loss = 0.026219448074698448
Validation loss = 0.023804903030395508
Validation loss = 0.02331281453371048
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022991307079792023
Validation loss = 0.024136589840054512
Validation loss = 0.022720923647284508
Validation loss = 0.022579608485102654
Validation loss = 0.023479240015149117
Validation loss = 0.02256312221288681
Validation loss = 0.02293897047638893
Validation loss = 0.023421229794621468
Validation loss = 0.023076048120856285
Validation loss = 0.023444633930921555
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07022039979497693
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0701844262295082
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0701484895033282
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07011258955987718
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.070076726342711
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07004089979550102
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.07000510986203373
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06996935648621042
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06993363961204696
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06989795918367347
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06986231514533402
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06982670744138635
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06979113601630157
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06975560081466395
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06972010178117048
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06968463886063073
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06964921199796645
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06961382113821138
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06957846622651091
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06954314720812182
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06950786402841197
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06947261663286004
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06943740496705525
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06940222897669707
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06936708860759494
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00105  |
| Iteration     | 77        |
| MaximumReturn | -0.000742 |
| MinimumReturn | -0.0014   |
| TotalSamples  | 131614    |
-----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02227257937192917
Validation loss = 0.023932723328471184
Validation loss = 0.024442218244075775
Validation loss = 0.024043889716267586
Validation loss = 0.024343810975551605
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0246351957321167
Validation loss = 0.023651540279388428
Validation loss = 0.023277780041098595
Validation loss = 0.023542363196611404
Validation loss = 0.0252887811511755
Validation loss = 0.023341553285717964
Validation loss = 0.023184290155768394
Validation loss = 0.02325027994811535
Validation loss = 0.02389972098171711
Validation loss = 0.02314862608909607
Validation loss = 0.02361750602722168
Validation loss = 0.026990044862031937
Validation loss = 0.023316822946071625
Validation loss = 0.02372089773416519
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023521192371845245
Validation loss = 0.02357160672545433
Validation loss = 0.02289462462067604
Validation loss = 0.02342832088470459
Validation loss = 0.02352345734834671
Validation loss = 0.02265041694045067
Validation loss = 0.025441333651542664
Validation loss = 0.02234748937189579
Validation loss = 0.02457302063703537
Validation loss = 0.02201509103178978
Validation loss = 0.023144476115703583
Validation loss = 0.026035703718662262
Validation loss = 0.022856641560792923
Validation loss = 0.02365884557366371
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023645924404263496
Validation loss = 0.024934181943535805
Validation loss = 0.02769353985786438
Validation loss = 0.02256213128566742
Validation loss = 0.023216286674141884
Validation loss = 0.02402835339307785
Validation loss = 0.022905101999640465
Validation loss = 0.024140559136867523
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023565491661429405
Validation loss = 0.024244945496320724
Validation loss = 0.024922337383031845
Validation loss = 0.023685988038778305
Validation loss = 0.02311304770410061
Validation loss = 0.023632697761058807
Validation loss = 0.025565188378095627
Validation loss = 0.02329806052148342
Validation loss = 0.02262675017118454
Validation loss = 0.023564105853438377
Validation loss = 0.025636013597249985
Validation loss = 0.02336152084171772
Validation loss = 0.023227574303746223
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06933198380566802
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06929691451694486
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0692618806875632
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06922688226376958
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0691919191919192
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06915699141847552
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06912209889001009
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06908724155320221
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0690524193548387
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0690176322418136
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0689828801611279
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06894816305988928
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06891348088531186
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06887883358471594
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06884422110552764
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06880964339527876
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06877510040160642
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06874059207225289
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0687061183550652
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06867167919799498
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0686372745490982
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06860290435653481
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06856856856856856
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06853426713356678
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0685
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00112 |
| Iteration     | 78       |
| MaximumReturn | -0.00077 |
| MinimumReturn | -0.00176 |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023473260924220085
Validation loss = 0.025185033679008484
Validation loss = 0.02705620601773262
Validation loss = 0.02298581786453724
Validation loss = 0.0223239716142416
Validation loss = 0.023779120296239853
Validation loss = 0.023060092702507973
Validation loss = 0.023503918200731277
Validation loss = 0.024682069197297096
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024317622184753418
Validation loss = 0.024148613214492798
Validation loss = 0.02348824217915535
Validation loss = 0.02467912994325161
Validation loss = 0.026503553614020348
Validation loss = 0.023245997726917267
Validation loss = 0.025828927755355835
Validation loss = 0.023125244304537773
Validation loss = 0.023425370454788208
Validation loss = 0.02749786525964737
Validation loss = 0.02264939621090889
Validation loss = 0.022780640050768852
Validation loss = 0.022955631837248802
Validation loss = 0.025538723915815353
Validation loss = 0.023361410945653915
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02495748922228813
Validation loss = 0.022055115550756454
Validation loss = 0.02314038947224617
Validation loss = 0.022781943902373314
Validation loss = 0.025338780134916306
Validation loss = 0.023050792515277863
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024271441623568535
Validation loss = 0.024950973689556122
Validation loss = 0.02413572371006012
Validation loss = 0.022835224866867065
Validation loss = 0.024187810719013214
Validation loss = 0.02286016009747982
Validation loss = 0.023695925250649452
Validation loss = 0.02394033968448639
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023232482373714447
Validation loss = 0.023651907220482826
Validation loss = 0.02350759506225586
Validation loss = 0.022858353331685066
Validation loss = 0.023105623200535774
Validation loss = 0.02301655150949955
Validation loss = 0.023178398609161377
Validation loss = 0.02269558608531952
Validation loss = 0.02330245077610016
Validation loss = 0.023935670033097267
Validation loss = 0.02289869263768196
Validation loss = 0.02322637289762497
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06846576711644178
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06843156843156843
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06839740389415876
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06836327345309381
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06832917705735661
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0682951146560319
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06826108619830593
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06822709163346613
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06819313091090094
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0681592039800995
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06812531079065141
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06809145129224652
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06805762543467461
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06802383316782522
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06799007444168735
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0679563492063492
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06792265741199802
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06788899900891972
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06785537394749876
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06782178217821783
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0677882236516576
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06775469831849654
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06772120612951063
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06768774703557312
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.06765432098765432
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00274 |
| Iteration     | 79       |
| MaximumReturn | -0.00191 |
| MinimumReturn | -0.00379 |
| TotalSamples  | 134946   |
----------------------------
